Dear ROOT experts,
I’ve got a simple Python function that computes the sum of weights of a branch in a tree using RDataFrame.
import ROOT

def MakeWeightedCorrelationMatrixRdf(inputFilePathList, treeName):
    # Build one dataframe over all input files
    df = ROOT.ROOT.RDataFrame(treeName, inputFilePathList)
    # Book both results before triggering the (single) event loop
    eventCountHandle = df.Count()
    sumOfWeightsHandle = df.Sum("weightModified")
    print(f"Event count: {eventCountHandle.GetValue()}")
    print(f"Weight sum: {sumOfWeightsHandle.GetValue()}")
    print()
I’m trying to run it on different combinations of three files: A.root, B.root and C.root.
print("A + B")
testFilePathList = [
    'A.root',
    'B.root',
]
MakeWeightedCorrelationMatrixRdf(testFilePathList, "tree_PFLOW")
print("A")
testFilePathList = [
    'A.root',
]
MakeWeightedCorrelationMatrixRdf(testFilePathList, "tree_PFLOW")
print("B")
testFilePathList = [
    'B.root',
]
MakeWeightedCorrelationMatrixRdf(testFilePathList, "tree_PFLOW")
print("A + C")
testFilePathList = [
    'A.root',
    'C.root',
]
MakeWeightedCorrelationMatrixRdf(testFilePathList, "tree_PFLOW")
print("B + C")
testFilePathList = [
    'B.root',
    'C.root',
]
MakeWeightedCorrelationMatrixRdf(testFilePathList, "tree_PFLOW")
For some reason, the combination of A.root and B.root yields a NaN for the weight sum, even though both the individual files and their combinations with C.root are fine.
A + B
Event count: 1548011
Weight sum: nan
A
Event count: 1416179
Weight sum: 5326.039287298693
B
Event count: 131832
Weight sum: 91.8623456310427
A + C
Event count: 1733179
Weight sum: 5326.039287298693
B + C
Event count: 448832
Weight sum: 465.77165986367976
I’ve tried the same with a simple tree loop in Python, and everything works as expected.
Function:
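def GetWeightSum(inputFilePathList, treeName):
    # Chain all input files into one logical tree
    inputChain = ROOT.TChain(treeName)
    for inputFilePath in inputFilePathList:
        inputChain.Add(inputFilePath)
    weightSum = 0
    count = 0
    # Plain event loop: accumulate the weight and count entries
    for event in inputChain:
        weightSum += event.weightModified
        count += 1
    # Print in the same format as the results below
    print(f"Event count: {count}")
    print(f"Weight sum: {weightSum}")
    print()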
Results:
A + B
Event count: 1548011
Weight sum: 5417.901632929743
A
Event count: 1416179
Weight sum: 5326.039287298693
B
Event count: 131832
Weight sum: 91.8623456310427
A + C
Event count: 1733179
Weight sum: 5699.948601531312
B + C
Event count: 448832
Weight sum: 465.77165986367976
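As a cross-check of the numbers quoted above, here is a small pure-Python sketch (not part of the original reproducer; the dictionary names are only illustrative). It derives the contribution of C.root from the B and B + C sums and compares the expected combined sums with what the tree loop printed. It also flags that the RDataFrame value printed for A + C is identical to the value for A alone, which may be worth a second look.

```python
import math

# Weight sums copied from the RDataFrame output quoted in this post.
rdf = {
    "A": 5326.039287298693,
    "B": 91.8623456310427,
    "A+C": 5326.039287298693,
    "B+C": 465.77165986367976,
}
# Weight sums copied from the TChain loop output quoted in this post.
loop = {
    "A+B": 5417.901632929743,
    "A+C": 5699.948601531312,
}

# Contribution of C.root implied by the B and B + C sums.
c_sum = rdf["B+C"] - rdf["B"]

# Sums one would expect for the combinations if the per-file sums simply add up.
expected_ab = rdf["A"] + rdf["B"]
expected_ac = rdf["A"] + c_sum

# Summation order differs, so compare with a tolerance rather than exactly.
ab_matches_loop = math.isclose(expected_ab, loop["A+B"], rel_tol=1e-9)
ac_matches_loop = math.isclose(expected_ac, loop["A+C"], rel_tol=1e-9)

# The RDataFrame sum printed for A + C is bit-for-bit the sum for A alone.
rdf_ac_equals_a = rdf["A+C"] == rdf["A"]

print(f"A + B expected sum agrees with loop: {ab_matches_loop}")
print(f"A + C expected sum agrees with loop: {ac_matches_loop}")
print(f"RDataFrame A + C equals A alone:     {rdf_ac_equals_a}")
```

All three checks come out true for the quoted numbers, so the loop results are self-consistent, while the RDataFrame results for both A + B and A + C differ from the per-file sums.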
The files are unfortunately private, but I can share them and the reproducers via PM.
What could have caused the issue with the RDataFrame approach? I would like to keep it, as it is much faster and easier to work with.
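One thing that could be checked per file is a sketch like the following. It is only a diagnostic idea, not a confirmed explanation: it assumes `weightModified` is a scalar floating-point branch, and the function name `InspectWeightBranch` is hypothetical. For each file it reports the column type RDataFrame infers for the branch and how many entries are NaN or infinite, so that any difference between A.root, B.root and C.root becomes visible.

```python
def InspectWeightBranch(inputFilePathList, treeName, branchName="weightModified"):
    # Imported here so that merely defining the function does not require ROOT.
    import ROOT

    report = {}
    for inputFilePath in inputFilePathList:
        df = ROOT.RDataFrame(treeName, inputFilePath)
        # Type that RDataFrame infers for the branch in this particular file.
        columnType = str(df.GetColumnType(branchName))
        # Book both counts first; they are computed in a single event loop.
        totalHandle = df.Count()
        nonFiniteHandle = df.Filter(f"!std::isfinite({branchName})").Count()
        report[inputFilePath] = {
            "type": columnType,
            "entries": totalHandle.GetValue(),
            "nonFinite": nonFiniteHandle.GetValue(),
        }
    return report
```

It could be called as `InspectWeightBranch(['A.root', 'B.root', 'C.root'], "tree_PFLOW")` and the resulting dictionary printed for each file.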
Best regards,
Aleksandr
ROOT Version: 6.28.04
Platform: Ubuntu 20.04
Compiler: Precompiled