Issues with Triggers not present in all events in RDataFrame

Adam_Kobert · September 10, 2021, 6:13am

Hello everyone, I have been having issues when running RDataFrame on 2018 CMS Data using PyROOT. One of the conditions for an event to be considered is that either the HLT_Photon175 or the HLT_Photon110EB_TightID_TightIso trigger must be active. I use the RDF filter

Rdf = Rdf.Filter("(HLT_Photon110EB_TightID_TightIso > 0.0 || HLT_Photon175 > 0.0)")

to filter based on this requirement, which works as intended when I apply it to Monte Carlo trees. When I apply this filter to Data, however, I get the error

Error in TTreeReaderValueBase::CreateProxy(): The tree does not have a branch called HLT_Photon110EB_TightID_TightIso. You could check with TTree::Print() for available branches.

I can confirm absolutely that HLT_Photon110EB_TightID_TightIso is present in the tree. Strangely, this error does not crash the program, though it does repeat itself an enormous number of times; I suspect once for each event for which the trigger is not present. At first I suspected that the issue was due to HLT_Photon110EB_TightID_TightIso being introduced partway into the 2018 run, so I added an earlier filter requiring the run number to be within the valid range for the trigger, but the error persisted.

Is there a way to have RDF check whether a branch is present before accessing it? Preferably I would be able to just store the value of HLT_Photon110EB_TightID_TightIso as 0 if it is not present, since there are events which pass HLT_Photon175 outside the run range of HLT_Photon110EB_TightID_TightIso.

ROOT Version: 6.18/04
Python 2.7.15

eguiraud · September 10, 2021, 7:33am

Hi @Adam_Kobert ,
and welcome to the ROOT forum!

I see you are using ROOT v6.18/04: that is quite old in RDF terms, and it’s possible that the issue with reading those values has been fixed in subsequent versions. If possible, I would suggest trying ROOT v6.24/06 to see how things are there.

About the feature you ask about: there is currently no way in RDF to provide a placeholder value for branches that do not exist. I suggest you check which trees don’t have the branch and treat them separately.
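As a rough illustration of that check, here is a minimal sketch that sorts a list of files by whether a given branch is present. The helper names (`branch_names`, `split_by_branch`) and the tree name `"Events"` are assumptions made for the example, not something taken from the original setup; adjust them to your files.

```python
def branch_names(path, tree_name="Events"):
    """Return the branch names of one tree in one file (needs PyROOT).

    The default tree name "Events" is an assumption; pass your own.
    """
    import ROOT  # imported lazily so split_by_branch can be used without ROOT

    root_file = ROOT.TFile.Open(path)
    if not root_file or root_file.IsZombie():
        raise IOError("could not open {0}".format(path))
    tree = root_file.Get(tree_name)
    if not tree:
        root_file.Close()
        raise IOError("no tree named {0} in {1}".format(tree_name, path))
    names = [branch.GetName() for branch in tree.GetListOfBranches()]
    root_file.Close()
    return names


def split_by_branch(paths, branch, list_branches=branch_names):
    """Split file paths into (files with the branch, files without it)."""
    with_branch, without_branch = [], []
    for path in paths:
        if branch in list_branches(path):
            with_branch.append(path)
        else:
            without_branch.append(path)
    return with_branch, without_branch
```

Calling `split_by_branch(files, "HLT_Photon110EB_TightID_TightIso")` on your list of Data files should give you the two groups to process separately.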

Cheers,
Enrico

Adam_Kobert · September 13, 2021, 11:46pm

Yes, that worked. I now have two separate directories of files: one with the branch, one where the branch is missing. If I create two separate RDFs for the different directories, is there a way to combine them? If not, I will have to generate the histograms separately and then combine those.

eguiraud · September 27, 2021, 3:43pm

Hi @Adam_Kobert ,
sorry for the late reply, I was off work for a little while.

I think that’s the way to go. There is no general way to combine RDFs, and in general RDF is strict about the dataset schema: if a branch exists, it needs to always exist; if it does not, it can’t suddenly “appear” later in the dataset.
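To make the "separate histograms, then combine" approach concrete, here is a minimal sketch under a few assumptions: each group of files shares one schema, the tree is called `"Events"`, and the helper names (`trigger_filter`, `summed_histogram`) are made up for illustration. The filter only mentions the triggers that actually exist in a given group, so RDF never tries to read a missing branch.

```python
def trigger_filter(triggers, available_columns):
    """OR together the triggers that exist in this group of files.

    Returns None when none of the requested triggers is available.
    """
    present = [t for t in triggers if t in available_columns]
    if not present:
        return None
    return " || ".join("{0} > 0".format(t) for t in present)


def summed_histogram(file_groups, triggers, column, model, tree_name="Events"):
    """Fill one histogram per file group and return their sum (needs PyROOT).

    file_groups: list of lists of file paths, each list with a uniform schema.
    model: histogram model tuple, e.g. ("h", "title", 50, 0.0, 1000.0).
    """
    import ROOT  # imported lazily so trigger_filter can be used without ROOT

    total = None
    for files in file_groups:
        names = ROOT.std.vector("string")()
        for path in files:
            names.push_back(path)
        rdf = ROOT.RDataFrame(tree_name, names)
        columns = [str(c) for c in rdf.GetColumnNames()]
        expression = trigger_filter(triggers, columns)
        if expression is None:
            continue  # no requested trigger here, so no event can pass
        hist = rdf.Filter(expression).Histo1D(model, column).GetValue()
        if total is None:
            total = hist.Clone("total")
        else:
            total.Add(hist)
    return total
```

For example, `summed_histogram([files_with, files_without], ["HLT_Photon110EB_TightID_TightIso", "HLT_Photon175"], "Photon_pt", ("h", "photon pt", 50, 0.0, 1000.0))` would give a single histogram over both groups, where `Photon_pt` is just a placeholder column name.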

You might be interested in following this feature request about supporting missing columns more naturally.

Cheers,
Enrico
