Hi,
I have a tree with a jet branch that is an array. For some events there are no jets, so the jet branch is null for those events. How should I deal with this?
I tried to read and plot it like an ordinary branch, but I get an error. I guess it is because, when there is no jet, RDataFrame cannot perform the following step:
histo[i].Add(rresultptrs[i].GetValue())
Hi,
what is the error, and what exactly do you mean by null? If the array is empty but your logic accesses its elements anyway, you will probably get a segfault. In that case you can add a Filter("array.size() > 0") to your RDF to skip entries in which the array is empty.
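A minimal plain-Python sketch of the same idea (no ROOT involved; the events list and its values are made up for illustration): events whose jet array is empty are dropped before the first element is read, which mirrors what the Filter does in RDF. Python raises IndexError where C++ may simply segfault.

```python
# Toy data (hypothetical): one list of jet pT values per event.
# An event without jets has an EMPTY list, not None.
events = [[45.0, 30.2], [], [120.5], [], [60.1, 22.0, 15.3]]

def leading_pt_unsafe(jet_pt):
    # Reads the first element unconditionally, like using jet_pt[0]
    # in RDF without a Filter: breaks on events with no jets.
    return jet_pt[0]

def leading_pt_filtered(events):
    # Keep only events with at least one jet (the Filter step);
    # after that, reading the first element is safe.
    return [jet_pt[0] for jet_pt in events if len(jet_pt) > 0]

print(leading_pt_filtered(events))  # [45.0, 120.5, 60.1]

try:
    [leading_pt_unsafe(jet_pt) for jet_pt in events]
except IndexError as err:
    # Python raises an exception where C++ may crash instead.
    print("unsafe access failed:", err)
```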
Hi, thanks for your reply. The error is in the attachment. What I mean is: for some events there are jets, so there are numbers stored in the jet branch. But for other events with no jet, there is no number stored in the jet branch, so it is null.
I will try your approach, but I would also like to use Alt$ to solve it:
Alt$(primary,alternate) : return the value of “primary” if it is available for the current iteration otherwise return the value of “alternate”. For example, with arr1[3] and arr2[2]
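Alt$ is part of the TTree::Draw expression syntax. As a rough plain-Python model of the quoted description (the helper names alt and sum_with_alt, and the choice to loop over the longer array, are assumptions for illustration, not ROOT's implementation), the "alternate" value simply fills in wherever the primary array has no element for the current iteration:

```python
def alt(array, i, alternate):
    # Rough model of Alt$(array, alternate) at iteration i:
    # array[i] if that element exists, otherwise the alternate value.
    return array[i] if 0 <= i < len(array) else alternate

def sum_with_alt(arr1, arr2, alternate=0.0):
    # Iterate over arr1 (the longer array here) and substitute the
    # alternate value wherever arr2 has no element at that index.
    return [arr1[i] + alt(arr2, i, alternate) for i in range(len(arr1))]

arr1 = [1.0, 2.0, 3.0]   # plays the role of arr1[3]
arr2 = [10.0, 20.0]      # plays the role of arr2[2]
print(sum_with_alt(arr1, arr2))  # [11.0, 22.0, 3.0]

# Applied to jets: index 0 of an empty jet_pt array falls back to 0.
print(alt([], 0, 0.0))  # 0.0
```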
I tried to define
rdf = rdf.Define("jet_pt", "ROOT.Alt$(jet_pt,0)")
But it didn’t work. The error is:
Traceback (most recent call last):
File "z_make_hlt_xsweight_muonsf_mc_reco_test_V2.py", line 360, in
rdf = rdf.Define("z_pt_1", "ROOT.Alt$(z_pt,0)")
TypeError: can not resolve method template call for 'Define'
there is no number stored in the jet branch, then it is null
An array with no elements is empty, not null. Pointers can be null. The difference is important because RDF can deal with empty arrays (e.g. with a Filter like the one mentioned above), but null pointers are trickier.
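To make the empty/null distinction concrete, here is a plain-Python analogy (not ROOT code; the variable names are hypothetical): an empty list behaves like an empty array, it has size 0 and can be tested for, while None plays the role of a null pointer, on which even asking for the size fails.

```python
def n_jets(jet_pt):
    # len() of an empty list is 0, so "no jets" is easy to test for.
    # len(None) raises TypeError, the Python counterpart of using a
    # null pointer, which in C++ is undefined behaviour.
    return len(jet_pt)

empty_event = []    # no jets: an empty array
null_event = None   # null: no array at all

print(n_jets(empty_event))  # 0

try:
    n_jets(null_event)
except TypeError:
    print("a null event has no size")
```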
I tried to define
rdf = rdf.Define("jet_pt", "ROOT.Alt$(jet_pt,0)")
ROOT.Alt$ is not valid C++, so it’s not something that you can use in a Define. You can write something like this instead (assuming jet_pt is an array of floats):
The error means that, in the expression jet_pt.empty()? ROOT::RVec{0}: jet_pt[0] (where ROOT::RVec{0} should probably be ROOT::RVec<float>{0}), jet_pt[0] and ROOT::RVec{0} have incompatible types. Indeed if you only want the leading jet pt you can do:
Hi Entico,
Thanks for your reply.
Now I use
rdf = rdf.Define("leading_jet_pt", "jet_pt.size()>0 ? jet_pt[0] : 0.f")
It now runs without errors, but the distribution of leading_jet_pt changes.
You can see the difference in the attachment: the blue line is plotted directly from the original ROOT file, the red line is plotted with RDataFrame. Do you know why?
Cheers,
Jen
TTree::Draw uses a special syntax, not pure C++, and it does certain things for you under the hood such as skipping entries for which jet_pt[0] does not exist
RDataFrame’s Define only accepts valid C++ code
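A plain-Python illustration of one way the two plots can disagree, using made-up values and a toy histogram (collections.Counter over 50-unit bins, both assumptions): a Draw-style selection skips events without a leading jet, while a Define with a 0.f default keeps them and piles them up in the first bin.

```python
from collections import Counter

# Hypothetical per-event jet_pt arrays; three events have no jets.
events = [[45.0, 30.2], [], [120.5], [], [60.1, 22.0], []]
BIN_WIDTH = 50.0

def bin_of(x):
    # Index of a fixed-width bin starting at 0.
    return int(x // BIN_WIDTH)

# Like TTree::Draw("jet_pt[0]"): entries without jet_pt[0] are skipped.
draw_like = Counter(bin_of(j[0]) for j in events if len(j) > 0)

# Like Define("leading_jet_pt", "jet_pt.size()>0 ? jet_pt[0] : 0.f"):
# every event is filled, events without jets with the default 0.
define_like = Counter(bin_of(j[0] if len(j) > 0 else 0.0) for j in events)

print(sorted(draw_like.items()))    # [(0, 1), (1, 1), (2, 1)]
print(sorted(define_like.items()))  # [(0, 4), (1, 1), (2, 1)]
```

The extra entries at zero in the second histogram are one possible source of a difference between the RDF and TTree::Draw plots.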
the manual loop (for(int i = 0; i < nentries; i++)) is wrong; it should be:
for (int i = 0; i < nentries; i++)
{
   t1->GetEntry(i);
   // only fill when the event has at least one jet
   // (this requires t1->SetBranchAddress("ngoodjets", ...) beforehand)
   if (ngoodjets > 0)
      h1->Fill(jet_pt[0]);
}
I can’t say why rdf = rdf.Define("leading_jet_pt", "jet_pt.size()>0 ? jet_pt[0] : 0.f") and rdf = rdf.Define("leading_jet_pt", "ngoodjets>0 ? jet_pt[0] : 0") give you the same plot while TTree::Draw gives a different one. Do you expect ngoodjets > 0 == jet_pt.size() > 0 for every entry? You can check whether that’s the case in your file, e.g. with rdf.Filter("ngoodjets > 0 != jet_pt.size() > 0").Count().GetValue(): the result should be zero if jet_pt.size() > 0 and ngoodjets > 0 are really equivalent.
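The suggested cross-check counts the entries where the two "has a jet" conditions disagree. The same logic in plain Python, on hypothetical (ngoodjets, jet_pt) pairs where the last entry is deliberately inconsistent:

```python
# Hypothetical entries: (ngoodjets, jet_pt array) per event.
entries = [
    (2, [45.0, 30.2]),
    (0, []),
    (1, [120.5]),
    (1, []),  # inconsistent on purpose: ngoodjets says 1, array is empty
]

def count_mismatches(entries):
    # Counterpart of Filter("ngoodjets > 0 != jet_pt.size() > 0").Count().
    # The parentheses matter in Python: without them the comparisons
    # would chain instead of comparing the two booleans.
    return sum(1 for n, jet_pt in entries if (n > 0) != (len(jet_pt) > 0))

print(count_mismatches(entries))  # 1
```

A result of zero on the real file would confirm that the two conditions are interchangeable.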
Then I found that Draw("jet_pt.size()") actually draws jet_pt, which means ".size()" doesn’t work there.
And Draw("Length$(jet_pt)") gives the same plot as Draw("ngoodjets").
So I think the reason why rdf = rdf.Define("leading_jet_pt", "jet_pt.size()>0 ? jet_pt[0] : 0.f") and rdf = rdf.Define("leading_jet_pt", "ngoodjets>0 ? jet_pt[0] : 0") give the same plot, while TTree::Draw gives a different one, is that jet_pt.size() doesn’t work in TTree::Draw but does work in RDataFrame’s Define. But I don’t know why.
Yep, that’s it then. As I mentioned, TTree::Draw does not support arbitrary C++ expressions; it uses a special syntax. According to the docs, @jet_pt.size() or similar might work.