How to deal with a null branch in RDataFrame

Hi,
I have a tree with a jet branch that is an array. Some events have no jets, so the jet branch is null for those events. How should I deal with this?
I tried to read and plot it like an ordinary branch, but I get an error. I guess that when there is no jet, RDataFrame cannot do the following step:
histo[i].Add(rresultptrs[i].GetValue())

Is there any solution?

Cheers,
Jenny

Hi,
what is the error, and what exactly do you mean by null? If the array is empty but your logic accesses its elements anyway, you will probably get a segfault. In that case you can add a Filter("array.size() > 0") to your RDF to skip entries in which the array is empty.

Hope this helps,
Enrico

Hi, thanks for your reply. The error is in the attachment. What I mean is that for some events there are jets, so numbers are stored in the jet branch. For other events there are no jets, so no number is stored in the jet branch, and it is null.
I will try your suggestion, but I would also like to use

Alt$(primary,alternate) : return the value of “primary” if it is available for the current iteration otherwise return the value of “alternate”. For example, with arr1[3] and arr2[2]

to solve it.

I tried to define

rdf = rdf.Define("jet_pt","ROOT.Alt$(jet_pt,0)")

But it didn’t work. The error is

Traceback (most recent call last):
File "z_make_hlt_xsweight_muonsf_mc_reco_test_V2.py", line 360, in
rdf = rdf.Define("z_pt_1","ROOT.Alt$(z_pt,0)")
TypeError: can not resolve method template call for 'Define'

Do you know how to use Alt$ in RDataFrame?

Hi,

there is no number store in the jet branch, then it is null

An array with no elements is empty, not null. Pointers can be null. The difference is important because RDF can deal with empty arrays (e.g. with a Filter as mentioned above), but pointers are trickier.

I tried to define

rdf = rdf.Define("jet_pt","ROOT.Alt$(jet_pt,0)")

ROOT.Alt$ is not valid C++, so it’s not something that you can use in a Define. You can write something like this instead (assuming jet_pt is an array of floats):

rdf = rdf.Define("jet_pt_alt", "jet_pt.empty() ? ROOT::RVec<float>{0} : jet_pt")

Or more simply you can use Filter("!jet_pt.empty()") to skip entries with no elements in jet_pt.
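
For readers following along in PyROOT, the two options above can be wrapped in small helpers. This is only a sketch, not code from the thread: the helper names are made up for illustration, rdf is assumed to be an existing ROOT.RDataFrame node, and the column is assumed to be read as an array of floats.

```python
def non_empty_expr(column):
    """C++ expression that is true when the array column has at least one element."""
    return f"!{column}.empty()"


def fallback_array_expr(column, element_type="float", fill="0"):
    """C++ expression that yields a one-element array when the column is empty."""
    return f"{column}.empty() ? ROOT::RVec<{element_type}>{{{fill}}} : {column}"


def skip_empty_entries(rdf, column):
    """Drop entries in which the array column is empty (rdf: a ROOT.RDataFrame node)."""
    return rdf.Filter(non_empty_expr(column))


def with_fallback_array(rdf, new_column, column):
    """Define a new array column that is never empty."""
    return rdf.Define(new_column, fallback_array_expr(column))
```

Both helpers return a new node, so the result has to be assigned, e.g. rdf = skip_empty_entries(rdf, "jet_pt"), as with any RDataFrame transformation.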

Cheers,
Enrico

Hi Enrico,
I tried

rdf = rdf.Define("jet_pt_1", "jet_pt.empty() ? ROOT::RVec{0} : jet_pt[0]")

but it shows

input_line_121:2:22: error: incompatible operand types ('ROOT::RVec' and
'__gnu_cxx::__alloc_traits<ROOT::Detail::VecOps::RAdoptAllocator, float>::value_type' (aka 'float'))
return jet_pt.empty()? ROOT::RVec{0}: jet_pt[0]
^ ~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~
Traceback (most recent call last):
File "z_make_hlt_xsweight_muonsf_mc_reco_test_V2.py", line 362, in
rdf = rdf.Define("leading_jet_pt","jet_pt.empty()? ROOT::RVec{0}: jet_pt[0]")
TypeError: can not resolve method template call for 'Define'

The error means that, in the expression jet_pt.empty()? ROOT::RVec{0}: jet_pt[0] (where ROOT::RVec{0} should probably be ROOT::RVec<float>{0}), jet_pt[0] and ROOT::RVec{0} have incompatible types. Indeed if you only want the leading jet pt you can do:

rdf.Define("leading_jet_pt","jet_pt.empty()? 0.f : jet_pt[0]")

i.e. no need to use RVec for a scalar value.
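
The rule behind this fix is that both branches of a C++ ?: expression need compatible types, so a scalar fallback such as 0.f pairs with the scalar jet_pt[0]. A small sketch of a helper that builds such an expression for any array column follows. The function names are hypothetical, rdf is assumed to be a ROOT.RDataFrame node, and the default 0.f assumes a float column.

```python
def leading_or_default_expr(column, default="0.f"):
    """C++ ternary: first element of the array, or a scalar default if it is empty.

    Both branches are scalars, so the operand types of ?: are compatible.
    """
    return f"{column}.empty() ? {default} : {column}[0]"


def define_leading(rdf, new_column, column, default="0.f"):
    """Attach the expression to an RDataFrame node (rdf: a ROOT.RDataFrame node)."""
    return rdf.Define(new_column, leading_or_default_expr(column, default))
```

With these, rdf = define_leading(rdf, "leading_jet_pt", "jet_pt") would issue the same Define call as the line above.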

Cheers,
Enrico

Hi Enrico,
Thanks for your reply.
Now I use
rdf = rdf.Define("leading_jet_pt","jet_pt.size()>0 ? jet_pt[0] : 0.f")
and it runs without error, but the distribution of leading_jet_pt changes.
In the attachment you can see the difference. The blue line is plotted directly from the original ROOT file; the red line is plotted with RDataFrame. Do you know why?
Cheers,
Jen

This is the difference.

Hi Enrico,
ok, I kind of know the reason.

rdf = rdf.Define("leading_jet_pt","jet_pt.size()>0 ? jet_pt[0] : 0.f") 

is right

The problem is that I cannot read the original ROOT file (i.e. the blue line) in the following way:

TString inputfile1 = "./test.root";

TFile *file1 = new TFile(inputfile1);

TTree *t1 = (TTree*) file1->Get("Events");

Float_t jet_pt[10];

t1->SetBranchAddress("jet_pt",jet_pt);

TH1F *h1   = new TH1F("h1","h1",40,0,200);

int nentries = t1->GetEntries();
   
for(int i = 0; i<nentries;i++)
{
    t1->GetEntry(i);
    h1->Fill(jet_pt[0]);
}
h1->Draw();

Done this way, entries with no jet are filled with the value left over from the previous entry.
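
The effect can be reproduced without ROOT. The sketch below is a plain-Python model, not the real TTree machinery: it assumes that reading an entry overwrites only as many buffer slots as the entry has jets, which is consistent with the behaviour described above. The event values are invented, and the buffer starts at zero here whereas the C++ array would start uninitialized.

```python
def fill_from_reused_buffer(events, buffer_size=10):
    """Mimic a loop that fills buffer[0] for every entry without checking the length."""
    buffer = [0.0] * buffer_size  # stands in for `Float_t jet_pt[10]`
    filled = []
    for jets in events:
        n = min(len(jets), buffer_size)
        buffer[:n] = jets[:n]  # only the first n slots are overwritten
        filled.append(buffer[0])  # stale value when n == 0
    return filled


events = [[50.0, 30.0], [], [80.0], []]  # invented jet pt values per event
print(fill_from_reused_buffer(events))  # [50.0, 50.0, 80.0, 80.0]
```

The two events without jets repeat the leading value of the event before them, which distorts the histogram.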

Another thing: I also tried the following in a new terminal:

root

TFile f("03F12777-8E13-C84E-85BA-3D92F2A24C7E_Skim.root")
Events->Draw("jet_pt[0]")
Events->Draw("jet_pt.size()>0 ? jet_pt[0] : 0")
Events->Draw("ngoodjets>0 ? jet_pt[0] : 0")

then I find

Draw("jet_pt[0]") gives the same plot as Draw("ngoodjets>0 ? jet_pt[0] : 0"), but both differ from Events->Draw("jet_pt.size()>0 ? jet_pt[0] : 0").

The plots are in the attachment.

However, in RDataFrame,

rdf = rdf.Define("leading_jet_pt","jet_pt.size()>0 ? jet_pt[0] : 0.f")

gives the same result as

rdf = rdf.Define("leading_jet_pt","ngoodjets>0 ? jet_pt[0] : 0")

I am wondering why?

Cheers,
Jen

Hi Jen,
there are a bit too many moving pieces :smile:

  • TTree::Draw uses a special syntax, not pure C++, and it does certain things for you under the hood such as skipping entries for which jet_pt[0] does not exist
  • RDataFrame’s Define only accepts valid C++ code
  • the manual loop (for(int i = 0; i < nentries; i++)) is wrong, it should be:
for(int i = 0; i<nentries;i++)
{
  t1->GetEntry(i);
  if (ngoodjets > 0) // and you need to SetBranchAddress("ngoodjets", ...)
    h1->Fill(jet_pt[0]);
}
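
The points above describe two different conventions for entries without jets: skip them, or fill a placeholder value. A plain-Python model with invented values (no ROOT involved, function names made up for illustration) shows why the two conventions give different histograms.

```python
def leading_skipping_empty(events):
    """Leading value per event, skipping events with no jets."""
    return [jets[0] for jets in events if jets]


def leading_with_default(events, default=0.0):
    """Leading value per event, using a default for events with no jets."""
    return [jets[0] if jets else default for jets in events]


events = [[50.0, 30.0], [], [80.0], []]  # invented jet pt values per event
print(leading_skipping_empty(events))  # [50.0, 80.0]
print(leading_with_default(events))    # [50.0, 0.0, 80.0, 0.0]
```

The second convention adds one value at the default for every empty event, so the histogram gains extra counts in the bin that contains 0.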

I can’t say why rdf = rdf.Define("leading_jet_pt","jet_pt.size()>0 ? jet_pt[0] : 0.f") and rdf = rdf.Define("leading_jet_pt","ngoodjets>0 ? jet_pt[0] : 0") give you the same plot while the corresponding TTree::Draw calls do not. Do you expect ngoodjets > 0 == jet_pt.size() > 0 for every entry? You can verify whether that’s the case in your file, e.g. with rdf.Filter("ngoodjets > 0 != jet_pt.size() > 0").Count().GetValue(). This should be zero if jet_pt.size() > 0 and ngoodjets > 0 are really equivalent.
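
A sketch of that consistency check as a reusable helper is below. The function names are hypothetical, and rdf is assumed to be a ROOT.RDataFrame node. In C++ the relational operators bind more tightly than !=, so the unparenthesized expression above already parses as intended; the helper adds parentheses only to make the grouping explicit.

```python
def mismatch_expr(cond_a, cond_b):
    """C++ expression that is true when exactly one of the two conditions holds."""
    return f"({cond_a}) != ({cond_b})"


def count_mismatches(rdf, cond_a, cond_b):
    """Number of entries for which the two conditions disagree."""
    return rdf.Filter(mismatch_expr(cond_a, cond_b)).Count().GetValue()
```

A result of zero from count_mismatches(rdf, "ngoodjets > 0", "jet_pt.size() > 0") would mean the two conditions agree on every entry.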

Hope this helps!
Enrico

Hi Enrico,
Thanks for your help. Sorry for so many questions :sweat_smile:
I do
x = rdf.Filter("ngoodjets > 0 != jet_pt.size() > 0").Count().GetValue()
print(x)

and then it shows
('x= ', 0L)

And if I do

x = rdf.Filter("ngoodjets > 0 == jet_pt.size() > 0").Count().GetValue()
print(x)

and then it shows
('x= ', 1094L)

1094 is the total number of entries, so I think we can conclude that ngoodjets > 0 == jet_pt.size() > 0 for every entry.

I also tried the following in a new terminal:
root

TFile f("03F12777-8E13-C84E-85BA-3D92F2A24C7E_Skim.root")
Events->Draw("jet_pt.size()")
Events->Draw("jet_pt")
Events->Draw("Length$(jet_pt)")
Events->Draw("ngoodjets")

And then I find that Draw("jet_pt.size()") actually draws jet_pt, which means ".size()" doesn’t work here.
And Draw("Length$(jet_pt)") gives the same plot as Draw("ngoodjets").

So I think the reason why rdf = rdf.Define("leading_jet_pt","jet_pt.size()>0 ? jet_pt[0] : 0.f") and rdf = rdf.Define("leading_jet_pt","ngoodjets>0 ? jet_pt[0] : 0") give the same plot, while the corresponding TTree::Draw calls give different plots, is that jet_pt.size() doesn’t work in TTree::Draw but does work in RDataFrame’s Define. But I don’t know why.

What do you think?

Cheers,
Jen

I agree, perfect!

Yep, that’s it then. As I mentioned, TTree::Draw does not support arbitrary C++ expressions but it uses a special syntax. According to the docs @jet_pt.size() or similar might work.

Cheers,
Enrico

Hi Enrico,
OK. @jet_pt.size() also doesn’t work. Are there any docs where I can look this up?


Jen

The docs for TTree::Draw are here

Hi Enrico,
OK, thanks a lot!
Cheers,
Jen
