Sorting to find coincident events

RasmusWallentin · April 29, 2024, 3:09pm

I have a large TTree with three branches:
“Timestamp”: time from start of measurement in picoseconds.
“Channel”: the TTree contains data from 8 different detectors, and the channel specifies which detector an event comes from; it is a number between 0 and 7.
“Energy”: a measurement of energy

I need a way to sort my tree (or create a new one) so that it only contains coincident events between different detectors. Here, two events are coincident when their time difference satisfies abs(Timestamp[1] - Timestamp[2]) <= 400, i.e. they lie within 400 ps of each other. I want to store the energy of both events and which detector pair it belongs to, so that I can later create 2D histograms of energy in detector x vs energy in detector y, containing only data from coincident events.

I do not know how to move forward with sorting my data, and I would appreciate any help.

I am new to ROOT and do not know any C++, so I would appreciate an answer using PyROOT.

Danilo · April 29, 2024, 6:22pm

Dear Rasmus,

Thanks for the post and welcome to the ROOT Community!

If I understand correctly, you have a dataset with 3 columns, the first being Timestamp, which is of an integer type.
You would like to sort your dataset according to the first column.
Is that correct?

Best,
D

RasmusWallentin · May 2, 2024, 6:47am

Dear Danilo

Thank you for asking. Yes that is correct, being able to sort the tree based on the timestamp value would be very helpful.

Best,
Rasmus

vpadulan · May 4, 2024, 10:33am

Dear @RasmusWallentin ,

Thanks for reaching out to the forum. I will try to guide you further with your case.

My understanding is that this is not really a case of sorting, so I will ask you a few more questions.

abs(Timestamp[1] - Timestamp[2]) <= 400

What are [1] and [2] here? Does the Timestamp column actually contain an array for each event? Or do you mean that, for every event, you want to take the difference between the timestamp of that event and the timestamp of the previous event? And would this lag/shift only cover a window of two events, or would you need to consider events that are potentially further apart?

I want to store the energy of both events and which detector pair it belongs to

This means that you are changing the dataset schema with your output tree, because the meaning of a row of the dataset changes from one event to two events at the same time.
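
To make the schema change concrete, here is a small sketch in plain Python of what one row of the output could look like. The `Coincidence` type, its field names and the sample values are hypothetical; in ROOT each field would roughly correspond to one branch of the output tree. Grouping the rows by detector pair then gives the inputs for the 2D energy histograms mentioned in the question.

```python
from collections import defaultdict
from typing import NamedTuple


class Coincidence(NamedTuple):
    """One output row: two single events that were found to be coincident."""
    channel_a: int
    channel_b: int
    energy_a: float
    energy_b: float


# Hypothetical sample rows, for illustration only.
rows = [
    Coincidence(3, 4, 150.0, 3400.0),
    Coincidence(4, 3, 900.0, 120.0),
    Coincidence(0, 7, 55.0, 60.0),
]

# Collect (energy in detector x, energy in detector y) per detector pair,
# treating (3, 4) and (4, 3) as the same pair with x < y.
energies_by_pair = defaultdict(list)
for row in rows:
    if row.channel_a <= row.channel_b:
        key = (row.channel_a, row.channel_b)
        point = (row.energy_a, row.energy_b)
    else:
        key = (row.channel_b, row.channel_a)
        point = (row.energy_b, row.energy_a)
    energies_by_pair[key].append(point)

print(dict(energies_by_pair))
```

Whether (3, 4) and (4, 3) should be merged like this or kept apart depends on how you want to read the histograms.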

In your post I do not see any mention of where or how you would sort the data. Once the difference between timestamps is computed, you don’t care which values are smaller or larger than which other values, or do you?

If you indeed want to store the differences between timestamps of a window of 2 events at a time, then I suggest you take a look at this simple example from the forum: Event mixing with RDataFrame - #5 by eguiraud

Or let us know if you need something else entirely. Also, it would help to know your dataset schema to better understand the use case.

Cheers,
Vincenzo

ferhue · May 6, 2024, 8:41am

Something along these lines could do the job:

  TFile* f = TFile::Open(...);
  TTree* t = (TTree*)f->Get("..."); // This is the tree with the single events that you load from your TFile
  t->SetBranchAddress(...); // to get TimeStamp, Energy, Channel
  TFile* output = new TFile(); // where to store found coincidences
  TTree* c = new TTree(); // This is the output tree with the coincidences
  c->Branch(...); // To fill Energies, Channels, Time ...
  t->BuildIndex("Timestamp");  // Sort TTree single events by time
  TTreeIndex* I = (TTreeIndex*)t->GetTreeIndex();
  Long64_t* index = I->GetIndex(); // entry numbers in order of increasing Timestamp
  Long64_t n = t->GetEntries();
  Long64_t indx = -1;
  Long64_t lastTimeStamp = -1;
  Double_t lastEnergy = 0;
  Short_t lastChannel = -1;
  for (Long64_t i = 0; i < n; i++) {
    indx = index[i];
    t->GetEntry(indx);
    if (lastTimeStamp != -1) {
      // Compare with the previous event in time order
      if (TimeStamp - lastTimeStamp < TimeCoincidenceWindow) {
        ChannelA = lastChannel;
        ChannelB = Channel;
        EnergyA = lastEnergy;
        EnergyB = Energy;
        Time = (lastTimeStamp + TimeStamp)/2;
        c->Fill();
      }
    }
    lastEnergy = Energy;
    lastChannel = Channel;
    lastTimeStamp = TimeStamp;
  }

Just be aware of this longstanding rounding bug in the BuildIndex function: https://its.cern.ch/jira/browse/ROOT-8276
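
Since the original question asked for Python, the same idea can also be sketched without ROOT. This is only an illustration on plain lists, not a translation of the TTree API: `consecutive_coincidences` and the sample values are hypothetical, sorting a list of indices with `sorted` stands in for `BuildIndex`, and the 400 ps window with `<=` is taken from the first post. Like the C++ loop above, it only compares each event with the one right before it in time.

```python
def consecutive_coincidences(timestamps, channels, energies, window=400):
    """Pair each event with its predecessor in time if they are close enough."""
    # Indices that visit the events in increasing time (stand-in for BuildIndex).
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    pairs = []
    for prev, cur in zip(order, order[1:]):
        if timestamps[cur] - timestamps[prev] <= window:
            pairs.append({
                "ChannelA": channels[prev],
                "ChannelB": channels[cur],
                "EnergyA": energies[prev],
                "EnergyB": energies[cur],
                "Time": (timestamps[prev] + timestamps[cur]) / 2,
            })
    return pairs


# Hypothetical, unsorted sample data (timestamps in ps).
timestamps = [32423, 23453, 23600, 90000]
channels = [4, 3, 5, 1]
energies = [3400, 150, 210, 75]

pairs = consecutive_coincidences(timestamps, channels, energies)
print(pairs)
```

Here only the events at 23453 ps and 23600 ps end up paired, since every other gap is larger than the window.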

RasmusWallentin · May 6, 2024, 4:20pm

Thanks for the responses!

Regarding the structure of the data: previously, for some other data analysis, we have read .root files, created a TTree and then converted it to an RDataFrame. However, I thought that for sorting data it is probably more efficient to use the TTree class, so I wanted to stay there. To give more insight into the data structure, here are a code snippet and its output.

tree = create_Tree()  # Self-defined function that returns a tree created from the data
print(f'type of tree: {type(tree)}')
print(f'tree: {tree}')

df = ROOT.RDataFrame(tree) 

print(f'type of dataframe: {type(df)}')
print(f' df: {df}')
print(df.GetColumnNames())
print(df.GetColumnTypeNamesList())

generates the following output

tree: Name: Data_R Title:
type of dataframe: <class cppyy.gbl.ROOT.RDataFrame at 0x82ed998>
 df: A data frame built on top of the Data_R dataset.
{ "Channel", "Timestamp", "Board", "Energy", "Flags" }
Traceback (most recent call last):
  File "3läsadata.py", line 67, in <module>
    print(df.GetColumnTypeNamesList())
AttributeError: 'RDataFrame' object has no attribute 'GetColumnTypeNamesList'

The “Board” and “Flags” data are not of any importance at this stage.

To clarify what each column contains, here is an example of how the data would look in a spreadsheet, without loss of information:

event  timestamp  energy  channel
1      23453      150     3
2      32423      3400    4
...

Data can be obtained using the following snippet of code:

for event in myTree: 
      ch    = event.Channel
      time  = event.Timestamp
      energy= event.Energy

Here ch, time and energy are all plain numbers (floats or ints), not lists.

If I had access to unlimited computing power, I would have written the following code:

list_of_coincident_events = []
for event1 in tree:
      ch1     = event1.Channel
      time1   = event1.Timestamp
      energy1 = event1.Energy
      for event2 in tree:
            time2 = event2.Timestamp
            if abs(time1 - time2) <= 400:
                  ch2     = event2.Channel
                  energy2 = event2.Energy
                  list_of_coincident_events.append([ch1, ch2, time1, time2, energy1, energy2])

# Something that saves list_of_coincident_events in a reasonable format

I currently think that a reasonable way of performing the task is to sort the tree by timestamp and then loop through the data as above. The second for loop would then no longer be necessary (since the data is sorted, only the next event would have to be considered), which would reduce the computation time drastically.
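
A minimal sketch of that idea in plain Python, assuming the events have already been read into a list of (timestamp, channel, energy) tuples. `find_coincidences` and the sample events are hypothetical. One caveat: looking only at the very next event can miss pairs when three or more events fall inside the same 400 ps window, so the sketch scans forward until the window is exceeded. It also keeps only pairs from different detectors, as the first post asks for.

```python
def find_coincidences(events, window=400):
    """Return [ch1, ch2, time1, time2, energy1, energy2] for coincident pairs.

    events: iterable of (timestamp, channel, energy) tuples, timestamps in ps.
    """
    events = sorted(events)  # tuples compare on the timestamp first
    coincidences = []
    for i, (time1, ch1, energy1) in enumerate(events):
        # Scan forward only; stop at the first event outside the window.
        j = i + 1
        while j < len(events) and events[j][0] - time1 <= window:
            time2, ch2, energy2 = events[j]
            if ch2 != ch1:  # keep only pairs from different detectors
                coincidences.append([ch1, ch2, time1, time2, energy1, energy2])
            j += 1
    return coincidences


# Hypothetical sample events: (timestamp in ps, channel, energy).
events = [
    (1000, 0, 150),
    (1300, 1, 3400),
    (1390, 2, 90),
    (1450, 1, 60),
    (5000, 3, 10),
]

coincidences = find_coincidences(events)
for row in coincidences:
    print(row)
```

After the sort, each event is only compared with the few neighbours inside its window, so the inner loop stays short as long as the window rarely holds many events.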

edit: added code tags

RasmusWallentin · May 7, 2024, 8:49am

I have now realised that the raw data is actually already sorted in time, which I did not expect. I am sorry for wasting your time on this problem.