Accessing PID values with RDataFrame

Hello, I am wondering how to access the PID values of mother particles (M1) for the “Particle” branch within a Delphes tree when using RDataFrame.
In a Delphes macro I have found the following commands (the last one being the relevant one):

TClonesArray *branchGenParticle = treeReader->UseBranch("Particle");
for (Int_t entry = 0; entry < numberOfEntries; ++entry) {
    Long64_t nne = branchGenParticle->GetEntries();
    GenParticle *ppl;
    for (int j = 0; j < nne; j++) {
        ppl = (GenParticle*) branchGenParticle->At(j);
        // M1 is used as an index into the same Particle array
        if (fabs(((GenParticle*) branchGenParticle->At(ppl->M1))->PID) == 23)
            {...}
    }
}

I tried accessing the value of PID with Particle.M1.PID (more specifically, I tried printing the values with df.Display({'Particle.M1.PID'}, 3).Print(), where df is the RDataFrame object created from the Delphes tree), but this is not being recognised. Perhaps with Particle.M1 I already have the PID?

Hi @yburkard ,

could it be that Particle.M1 is a branch, but Particle.M1.PID is not? Does Particle.M1.PID appear in the list of column names returned by df.GetColumnNames()? If not, then you want to access the Particle.M1 branch and then the PID data member of the M1 object. We can take advantage of RDF’s jitted strings to cheat a bit:

df.Define("pid", "Particle.M1.PID").Display({"pid"}, 3).Print();

Cheers,
Enrico

Thanks for the quick reply @eguiraud. Particle.M1 is indeed a branch and Particle.M1.PID is not. After executing the line you suggested, I obtain the following error:

input_line_67:2:64: error: no member named 'PID' in 'ROOT::VecOps::RVec<int>'
  ...var0){return var0.PID
                  ~~~~ ^
input_line_71:2:64: error: no member named 'PID' in 'ROOT::VecOps::RVec<int>'
  ...var0){return var0.PID
                  ~~~~ ^

Meaning that Particle.M1.PID is still not being recognized (although ((GenParticle*)branchGenParticle->At(ppl->M1))->PID works well with ExRootTreeReader). Could it be that when the ROOT file is turned into an RDataFrame object, the branch/column Particle.M1 already contains the PID values?

Uhm, there is a type mismatch: RDF thinks Particle.M1 is an array of integers. Does that make sense? What do df.GetColumnType("Particle.M1") and tree->Print() say about the Particle.M1 branch?

P.S. Feel free to share the input file if you can, so I can take a look.

With df.GetColumnType("Particle.M1") I get that the column type is ROOT::VecOps::RVec<Int_t>, whereas with tree.Print() I obtain:

*Br   33 :Particle.M1 : Int_t M1[Particle_]                                  *
*Entries :    10000 : Total  Size=   52973313 bytes  File Size  =   16981159 *
*Baskets :       82 : Basket Size=    1789952 bytes  Compression=   3.12     *

So RDF indeed thinks that Particle.M1 is an array with integers; but with tree.Print() I see that there are also Int_t type objects in this branch, no?
Here is the type of file I am working with:
ee_to_ll.root (951.0 KB)

Yes, Particle.M1 is an array of integers (that RDataFrame reads as the RVec<int> type), so there is no PID data member of course.

((GenParticle*)branchGenParticle->At(ppl->M1))->PID does something different: it takes the N-th element of branchGenParticle and then accesses its PID data member (where N is equal to ppl->M1).
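
To make that concrete, here is a minimal standalone sketch in plain C++ (no ROOT or Delphes needed) of what the macro line does. The ToyParticle struct, the motherPID helper, the fallback value 0, and the convention that M1 is negative for particles without a mother are all assumptions made for illustration; none of this is Delphes code.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for Delphes' GenParticle: only the two fields discussed here.
struct ToyParticle {
    int PID;  // particle ID of this particle
    int M1;   // index of the first mother in the same array (assumed negative if none)
};

// Mirrors ((GenParticle*)branch->At(p->M1))->PID: M1 is used as an index
// into the particle array, and the PID is read from the element found there.
// Returns 0 (hypothetical "no mother" value) when M1 is not a valid index.
int motherPID(const std::vector<ToyParticle>& particles, std::size_t j) {
    const int m = particles[j].M1;
    if (m < 0 || static_cast<std::size_t>(m) >= particles.size()) return 0;
    return particles[m].PID;
}
```

So M1 never carries a PID itself; it only says where in the particle list the mother sits.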

Ok, I see! I think I misunderstood the C++ code then; I am a bit inexperienced with this language. So essentially I can do the same in RDF with Particle.PID[Particle.M1[N]], right?

Something like that, yes :)
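
A hedged sketch of that column-wise lookup, written with std::vector so it compiles without ROOT. The function name motherPIDs and the fallback value 0 for particles without a valid mother index are assumptions, not something taken from Delphes or RDataFrame.

```cpp
#include <cstddef>
#include <vector>

// For one event: pid[i] is the PID of particle i, m1[i] the index of its mother.
// Returns, for every particle, pid[m1[i]] (the mother's PID), or 0 when m1[i]
// is out of range (assumption: particles without a mother have a negative index).
std::vector<int> motherPIDs(const std::vector<int>& pid, const std::vector<int>& m1) {
    std::vector<int> out(m1.size(), 0);
    for (std::size_t i = 0; i < m1.size(); ++i) {
        const int m = m1[i];
        if (m >= 0 && static_cast<std::size_t>(m) < pid.size()) out[i] = pid[m];
    }
    return out;
}
```

With the ROOT headers available, the same loop should work with ROOT::VecOps::RVec<int> in place of std::vector<int>, and the function could then be passed to something like df.Define("MotherPID", motherPIDs, {"Particle.PID", "Particle.M1"}). That part is untested against the file from this thread, so treat it as a starting point.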

Ok great, thanks a lot!