Drawing unique elements from a TChain

Dear experts,

I am trying to figure out a way to loop over “unique” events in a Draw statement. I define “unique” events to have a unique pair of values of “run” and “event” branches, for example. I know I can do it with a manual loop using std::set and constructing a TEventList, but I’m trying to avoid such explicit loops.

I naively expected TTreeIndex to handle this behind the scenes giving major and minor indices to run and event, but it seems to not work. This is probably by design (?). I have more than one root file in my current directory and I know for sure there are duplicate (event,run) pairs amongst them.

Example code with what I expect to be printed out:

import ROOT as r
c1 = r.TChain("t")
c1.Add("*.root")
print(c1.Draw("1", "1"))  # should be the total number of events in the root files
index = r.TTreeIndex(c1, "run", "event")  # major index: run, minor index: event
print(index.GetN())  # should be the "unique" number of events
c1.SetTreeIndex(index)
print(c1.Draw("1", "1"))  # should be the "unique" number of events

Yet, all 3 numbers that get printed out are the same.

Can you recommend a way to quickly draw quantities for only the unique events? Hope I’m not being vague.

Thanks,
Nick

Hi Nick,

Did you have a look at TDataFrame? It is not clear to me what you are actually trying to achieve. Do you want to select from your chain all the events characterised by a certain run and event number?

Cheers,
D

Hi,

If I can naively compare TDataFrame to the dataframe used by pandas, for example, then I’m sure there’s a clean way to do what I want.

Basically, I have events coming from two datasets (and thus, let’s say, two root files) which can have overlapping events. The unique identifiers are (run#,event#). At the end of the day, I’d like to have some kind of TChain that transparently considers only unique events so that I don’t double count when filling histograms in Draw statements.

TDataFrame is currently unavailable to me because the environment I am restricted to requires v6.02 of ROOT. Of course, I could try to make it work, but I’m just wondering if there’s a way to do it without TDataFrame.

Thanks again,
Nick

Hi Nick,

My recommendation would be to upgrade your ROOT version: I understand that there are constraints, but 6.02 is quite an old release.

That said, there are many ways to achieve what we are discussing. I think the key point is to implement a way to discard an event if it has already been “used” to fill a histogram (or perform whatever action). If we think about the pair of values run-event, the simplest data structure that comes to the rescue is the std::set.
More concretely:

std::set<std::pair<unsigned int, unsigned int>> analysedEvents;
// Here we start the event loop, e.g. with TTreeReader
while (myReader.Next()) {
    auto run = *run_readerValue;
    auto evt = *evt_readerValue;
    // Skip the event if it was already studied: insert() returns a pair whose
    // second element is true only if the element was not yet in the set.
    if (!analysedEvents.insert({run, evt}).second) continue;
    // Here do the work, e.g. fill histos...
    ...
}

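Depending on the size of the set, you may want to try out std::unordered_set too, just to check whether there is a sizable performance benefit (std::pair has no default std::hash specialisation, so you would need to supply a hash function for the key).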
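For readers who want to stay in Python, the same set-based skip can be sketched as below. This is only an illustration, not code from the thread: the helper name first_occurrence_entries and the Entry stand-in are hypothetical, and the sketch assumes that each entry exposes run and event attributes, as iterating over a tree in PyROOT normally provides.

```python
from collections import namedtuple


def first_occurrence_entries(entries):
    """Return the entry numbers whose (run, event) pair was not seen earlier."""
    seen = set()      # (run, event) pairs encountered so far
    unique = []       # entry numbers of the first occurrence of each pair
    for i, entry in enumerate(entries):
        key = (entry.run, entry.event)
        if key in seen:
            continue  # duplicate pair: skip this entry
        seen.add(key)
        unique.append(i)
    return unique


# Hypothetical stand-in for tree entries; with PyROOT one would iterate the chain.
Entry = namedtuple("Entry", ["run", "event"])
sample = [Entry(1, 10), Entry(1, 11), Entry(1, 10), Entry(2, 10), Entry(1, 11)]

unique_entries = first_occurrence_entries(sample)
print(unique_entries)  # [0, 1, 3]
```

The returned entry numbers could then be entered into a TEventList, as Nick mentions in his first post, so that later Draw calls see each (run, event) pair only once. Whether that works smoothly on a TChain in ROOT 6.02 is an assumption that would need checking.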
I hope that helps.

Cheers,
D

Hi Danilo,

Thanks. I guess with my ROOT version, I can’t escape doing an explicit loop.

I was just hoping to avoid explicit loops so that I can keep all my code in a Python script with ~compiled speeds via Draw commands :)

Nick

Hi Nick,

I am indeed afraid that TTree::Draw cannot help you here the way something like TDataFrame would :(
Just remember to deactivate the branches you do not use in your Python loop. That will drastically speed up the program by cutting out unnecessary decompression/deserialisation.
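A minimal sketch of that branch deactivation follows, assuming the usual TTree::SetBranchStatus(name, status) call and a "*" wildcard for all branches. The helper keep_only_branches and the FakeTree stub are hypothetical; the stub only records calls so the snippet runs without ROOT, and in real use you would pass the TChain itself.

```python
def keep_only_branches(tree, wanted):
    """Disable every branch, then re-enable only the ones the loop reads."""
    tree.SetBranchStatus("*", 0)        # switch all branches off
    for name in wanted:
        tree.SetBranchStatus(name, 1)   # switch the needed ones back on


class FakeTree(object):
    """Hypothetical stand-in for a TTree/TChain that just records the calls."""

    def __init__(self):
        self.calls = []

    def SetBranchStatus(self, name, status):
        self.calls.append((name, status))


fake = FakeTree()
keep_only_branches(fake, ["run", "event"])
print(fake.calls)  # [('*', 0), ('run', 1), ('event', 1)]
```

With only run and event active, the deduplication loop avoids decompressing the other branches. Any branch used later in a Draw expression would need to be re-enabled first.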
Let us know if there are issues.

Cheers,
D
