RDataFrame and CreateProxy/TTreeReaderValue error for selection on string branch in Filter method



ROOT Version: 6.22.06


Dear experts,

I am trying to use RDataFrame to produce histograms from ROOT files,
with the following code:

import ROOT as RT 
"""
# setupATLAS and ROOT 6.22.06 
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS
. /cvmfs/sft.cern.ch/lcg/app/releases/ROOT/6.22.06/x86_64-centos7-gcc48-opt/bin/thisroot.sh

# Execute python script
python2 dataframe_vs_tree.py
"""

RT.gROOT.SetBatch(True) 
RT.EnableImplicitMT()

infileName    = "Zee_Sh221-0.root" 
treeName      = "Nominal"  
cutString     = "sample==\"Zbl\"" 
distName      = "MET" 
#weightName    = "EventWeight" 
binning       = [25,0,500]

histName = "h_{}".format(distName)

print("infileName = {}".format(infileName))
print("treeName   = {}".format(treeName))
print("cutString  = {}".format(cutString)) 

infile = RT.TFile.Open(infileName)
dataframe = RT.RDataFrame(treeName, infile)

histmodel = RT.RDF.TH1DModel(histName, histName, binning[0], binning[1], binning[2])

hist = dataframe.Filter(cutString).Histo1D(histmodel, distName)  
print("-----")
print("hist.GetEntries() = {}".format(hist.GetEntries()))
print("hist.GetSumOfWeights()= {}".format(hist.GetSumOfWeights()))
print("-----")

I attach the test file Zee_Sh221-0.root (14.9 KB), which contains only 3 entries on purpose.

But I get the error
Error in <TTreeReaderValueBase::CreateProxy()>: The branch sample contains data of type string. It cannot be accessed by a TTreeReaderValue<string>

With bigger files I get this error many times, and the histogram hist has 0 entries and a zero integral, which is not what I expect. For the current file, a scan of the tree gives:

root -l Zee_Sh221-0.root 
root [0] 
Attaching file Zee_Sh221-0.root as _file0...
(TFile *) 0x4418460
root [1] Nominal->Scan("sample:MET")
************************************
*    Row   *    sample *       MET *
************************************
*        0 *        Zl * 158.14303 *
*        1 *       Zbl * 180.14964 *
*        2 *       Zbl * 253.89321 *
************************************

This error occurs only when the selection passed to Filter involves a branch of type string.
If the selection uses only double, float and similar branches, without any string branch, everything is fine.

Would you know how to solve this?
I also tried with python3, but the same error occurs.
The equivalent TTree command works, so the problem does not come from the ROOT file:

root -l Zee_Sh221-0.root
root [0] 
Attaching file Zee_Sh221-0.root as _file0...
root [1] TTree *tree = (TTree*) _file0->Get("Nominal") 
root [2] tree->Draw("MET>>h_MET(25,0,500)", "sample==\"Zbl\"")
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
(long long) 2

Many thanks in advance

Hi @rbouquet ,
sorry about that :confused: I can reproduce the issue and I’m looking into it.

It would be great if you could convert this post into a bug report on GitHub (root-project/root) :grinning_face_with_smiling_eyes:

Cheers,
Enrico

Interestingly, removing the histmodel parameter from the call to Histo1D works around the problem – which makes zero sense to me at the moment, but in case it unblocks you…
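
For reference, here is a minimal sketch of the two call shapes discussed in this thread: with and without the histogram model. The helper name `book_hist` is hypothetical (it is not part of ROOT); it only forwards to the existing `Filter` and `Histo1D` methods of whatever RDataFrame-like object it is given. Note that without a model the binning is chosen by ROOT rather than taken from `binning`, so this is a way to get unblocked, not an equivalent replacement.

```python
def book_hist(df, cut, column, model=None, weight=None):
    """Book a filtered 1D histogram, picking which Histo1D overload to call.

    df     : an RDataFrame (or any object exposing Filter and Histo1D)
    cut    : selection string passed to Filter
    column : name of the column to histogram
    model  : optional TH1DModel; omitting it is the workaround described above
    weight : optional name of a weight column
    """
    node = df.Filter(cut)
    # Build the argument list in the order Histo1D expects:
    # [model,] column [, weight]
    args = [column] if weight is None else [column, weight]
    if model is not None:
        args.insert(0, model)
    return node.Histo1D(*args)

# Usage with the names from the original script (requires ROOT):
#   hist = book_hist(dataframe, cutString, distName)             # no model
#   hist = book_hist(dataframe, cutString, distName, histmodel)  # original call
```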

Hi @eguiraud,
Thanks for looking into this. Sure, I will create an issue on GitHub
and tag you.

I also noticed something similar to what you found:
the following works (weighting the events)

hist = dataframe.Filter(cutString).Histo1D(histmodel, distName, "EventWeight")  

But, as with your observation, this does not make sense to me either.

How did you create the input file exactly? (it seems to be a problem with a dictionary mismatch, but only happens in PyROOT)

I created the issue on github

Hmm, the file is produced by a huge codebase; I will have to see whether I can reproduce it with a simpler example.
I think the code does not specify the branch type when saving it and relies on ROOT detecting the string type automatically.
But it is strange that there is no error with TTree, and that just changing

hist = dataframe.Filter(cutString).Histo1D(histmodel, distName)

to

hist = dataframe.Filter(cutString).Histo1D(histmodel, distName, "EventWeight")

the error disappears.
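
Since the question is how ROOT detected the type of the `sample` branch, one thing that may help is asking RDataFrame which C++ type it infers for each column, using its `GetColumnType` method. The sketch below is only an illustration: the helper name `column_types` is hypothetical, and nothing in this thread establishes that the reported type explains the error.

```python
def column_types(df, columns):
    """Return {column name: C++ type name} as reported by RDataFrame.

    df      : an RDataFrame (or any object exposing GetColumnType)
    columns : iterable of column (branch) names to inspect
    """
    return {name: str(df.GetColumnType(name)) for name in columns}

# Usage with the names from the original script (requires ROOT):
#   print(column_types(dataframe, ["sample", "MET"]))
```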

Sorry, we can now reproduce the problem with standalone code, no need to figure out how the file is produced :slight_smile: and thanks for opening the GitHub issue!
