RDF filter the content of branches

amartell · October 24, 2019, 2:24pm

Hello,

I am looking for a built-in/clever way to filter the content of some branches, when these are vectors (in addition to the standard filter on entries).

My input file (ntuple from nanoAOD CMS) has entries corresponding to different events.
Some branches contain vectors with the info on the available candidates for that event (e.g. RVec cand_pt, RVec cand_eta…)

I am interested in selecting candidates (e.g. keeping only those with cand_pt[i] > 2), thus reducing the branch vector size, while preserving the indexing coherence among the relevant branches (cand_pt, cand_eta…).

I only found Any() and All(), which filter events based on the candidate features,
so I can reject events with 0 interesting candidates, but I also need to get rid of the useless candidates within the events I keep.

I am running a test .C macro on lxplus, with ROOT version 6.18.02 sourced.

thanks for any hint


eguiraud · October 24, 2019, 2:46pm

Hi,
typical strategies are either Define-ing new columns that hold the filtered cand_pt, cand_eta, or Define-ing a new column with the indexes of the elements you are interested in.
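To make the index-column strategy concrete, here is a minimal sketch in plain C++, with std::vector standing in for RVec so that it compiles without ROOT. The helper names goodIndices and takeAt are hypothetical and not part of ROOT; in RDataFrame the same two steps would live inside Define calls, for example using the RVec helper Take that appears later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: indices of the candidates whose pt exceeds ptMin.
std::vector<std::size_t> goodIndices(const std::vector<float>& pt, float ptMin) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < pt.size(); ++i) {
        if (pt[i] > ptMin) idx.push_back(i);
    }
    return idx;
}

// Hypothetical helper: pick the elements of `values` at the given indices,
// in the order the indices are listed.
std::vector<float> takeAt(const std::vector<float>& values,
                          const std::vector<std::size_t>& idx) {
    std::vector<float> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < values.size());  // indices must refer to existing candidates
        out.push_back(values[i]);
    }
    return out;
}
```

For one event with cand_pt = {5, 1, 3} and cand_eta = {0.1, -1.2, 2.0}, goodIndices(cand_pt, 2) gives {0, 2}. Applying takeAt with those same indices to both vectors keeps the pt and eta entries aligned, which is the indexing coherence asked for above.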

I think we need syntactic sugar and/or better paradigms to express these things, but I’m not quite sure what that would be (EDIT: yet).

Hope this helps!
Enrico

amartell · October 24, 2019, 3:06pm

Hi Enrico,
defining a new column with the index of the good candidates is already what I am doing,
I was indeed looking for a way to slim it.

thanks a lot for the cross-check!!
Arabella

eguiraud · October 24, 2019, 3:08pm

Let’s triple-check by pinging @swunsch too

swunsch · October 24, 2019, 3:49pm

Ping received

So here is my favorite: a running NanoAOD example script. Just copy it and run!

ROOT::RDataFrame df("Events", "root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root");
// 1) We actually want only 2 muons, but let's take events with 3 for our example
df.Filter("nMuon == 3")\
// 2) Let's run only on 10 good events
  .Range(10)\
// 3) I suppose that the leading and subleading muon are the good ones
  .Define("goodMuon_pt", "Take(Muon_pt, {0, 1})")\
// 4) Write the 10 events to a file
  .Snapshot("Events", "skim.root", "goodMuon_pt");

We select events with 3 muons (as example events), but our final selection is supposed to have 2 muons. The good muons are picked with ROOT::VecOps::Take, which creates the new collection goodMuon_pt in a Define node.

Is this doing what you want to do? Don’t hesitate to reach out again if this is not solving your question!

Edit:

And of course you can just make a column with something like goodMuon_idx and then create new columns in a loop for goodMuon_pt and the others. That would be the straightforward way to structure your computation graph.
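A rough sketch of the "new columns in a loop" idea, assuming an index column called goodMuon_idx already exists: the hypothetical helper below only builds the column names and the Take expressions as strings, so it has no ROOT dependency. Each pair would then be passed to Define on the dataframe node, which is not shown here. The variable names (pt, eta, ...) and the Muon_ prefix are assumptions taken from the example above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: for each variable (e.g. "pt"), build the pair
// {new column name, expression string} that a Define call would take.
std::vector<std::pair<std::string, std::string>>
goodMuonDefines(const std::vector<std::string>& variables,
                const std::string& idxColumn) {
    assert(!idxColumn.empty());  // an index column name is required
    std::vector<std::pair<std::string, std::string>> defines;
    for (const auto& var : variables) {
        defines.emplace_back("goodMuon_" + var,
                             "Take(Muon_" + var + ", " + idxColumn + ")");
    }
    return defines;
}
```

Called with {"pt", "eta"} and "goodMuon_idx", this yields the pairs ("goodMuon_pt", "Take(Muon_pt, goodMuon_idx)") and ("goodMuon_eta", "Take(Muon_eta, goodMuon_idx)"). Because every new column is built from the same index column, the collections stay aligned.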

Edit 2:

Aaand the corresponding tutorial for index manipulation with RVec here and here.

amartell · October 25, 2019, 8:49am

Ah great, this indeed does what I want.
Just tried it, and it works fine.

Thanks a lot!!

cheers
Arabella

system · November 8, 2019, 8:50am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.