How can I separate out the event IDs in a TChain that have the same value

rushabhgala · April 28, 2025, 6:34pm

I have code that chains 1001 ROOT files and applies an analysis cut. An energy histogram is then drawn and saved as a ROOT file, and the corresponding bin and count data are saved in a .txt file that I use for further analysis.

What I want to do: I want to get the event ID ‘ievt’ for the events that survive the analysis cut. But since the files are chained and each file has the same range of ievt, there is no way to get a unique ID for a surviving event. Generally, run numbers are used to separate different files, but in this case I don’t have them. Is there a way I can remove the degeneracy of ievt?

I don’t know what else I should provide since my root file has:

ievt: event_id
energy: total energy from the event
no_of_events: number of events
edep_in_int: edep in a sub volume1
edep_in_ext: edep in a sub volume2
det_id: id of the detector that had non-zero edep

ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided

StephanH · April 29, 2025, 8:07am

Hello @rushabhgala,

if the branches you listed are all that your ROOT files contain, the only unique identifier is the combination of the dataset name (e.g. the file name) and the event number inside the file.

Alternatively, albeit less reproducibly, you can compute a “global” event number by counting all events, but it is only unique if you always read the files in the same order. I would go with option one and assign each file a dataset ID, similar to a run number.
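
A minimal sketch of option one, in plain Python and independent of ROOT. It assumes every file has a distinct base name; the helper names assign_dataset_ids and unique_event_key and the example file names are hypothetical. Each file gets an integer ID from its position in the sorted list of names, so the ID does not depend on the order in which the files were chained, and the pair (dataset ID, ievt) serves as the unique key.

```python
import os


def assign_dataset_ids(file_paths):
    """Return {base name: integer ID}, independent of the input order."""
    names = sorted(os.path.basename(p) for p in file_paths)
    if len(set(names)) != len(names):
        raise ValueError("base names must be distinct to serve as dataset names")
    return {name: i for i, name in enumerate(names)}


def unique_event_key(dataset_ids, file_path, ievt):
    """Combine the file's dataset ID with the per-file event ID."""
    return (dataset_ids[os.path.basename(file_path)], ievt)


# Hypothetical file names, passed in two different orders.
ids = assign_dataset_ids(["/data/sim_b.root", "/data/sim_a.root"])
same = assign_dataset_ids(["/data/sim_a.root", "/data/sim_b.root"])
print(ids == same)                                   # True
print(unique_event_key(ids, "/data/sim_b.root", 7))  # (1, 7)
```

The integer IDs shift if files are added to or removed from the set. Using the file name itself in the key, i.e. (file name, ievt), avoids that.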

rushabhgala · April 29, 2025, 3:36pm

There are a few more branches in my file, but they’re all similar to

edep_in_X: edep in a sub volumeX

so that won’t be of much help to me.
How can I assign each file a dataset ID? Should I write another branch for each file, or is there another way?

I tried using the ‘global event number’ but, as you said, it is not reproducible for me because I can’t guarantee reading the files in the same order every time.

StephanH · April 29, 2025, 4:50pm

Hello,

there are a few options:

Are you using RDataFrame or a plain TChain to analyse the files? In RDataFrame, you can ask for the file name and entry number; the feature for this is called DefinePerSample.
In a TChain, you can list the files first, put them in a stable order (e.g. sorted by name), and then create the chain. By counting every event before the cuts, you then get a “stable” global event number.
If you don’t have a stable order, you can ask the TChain for the current file, see e.g. in this post. From the file name and the entry number you can then derive a unique ID.
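
The bookkeeping behind the two TChain options can be sketched in plain Python, independent of ROOT: sort the file names, turn the per-file entry counts into offsets, and convert between (file name, local entry) and a global entry number. The function names and the example counts are hypothetical; in a real analysis the counts would be read from the files themselves. A TChain keeps similar bookkeeping internally (TChain::GetTreeOffset, and TChain::LoadTree returns the entry number local to the current tree).

```python
from bisect import bisect_right
from itertools import accumulate


def stable_offsets(entries_per_file):
    """Given {file name: number of entries}, return the sorted names and
    the global entry number at which each file starts."""
    names = sorted(entries_per_file)
    counts = [entries_per_file[n] for n in names]
    offsets = [0] + list(accumulate(counts))[:-1]
    return names, offsets


def to_global(names, offsets, name, local_entry):
    """(file name, entry inside that file) -> global entry number."""
    return offsets[names.index(name)] + local_entry


def from_global(names, offsets, global_entry):
    """Global entry number -> (file name, entry inside that file)."""
    i = bisect_right(offsets, global_entry) - 1
    return names[i], global_entry - offsets[i]


# Hypothetical entry counts; the dict order does not matter.
names, offsets = stable_offsets({"sim_b.root": 2, "sim_a.root": 3})
print(to_global(names, offsets, "sim_b.root", 1))  # 4
print(from_global(names, offsets, 4))              # ('sim_b.root', 1)
```

Because the order comes from sorting the names rather than from the order in which the files were added, the global number is reproducible as long as the set of files and their contents do not change.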

There might be more ways, but let’s see if one of the above can work for you.