Event mixing with RDataFrame

Hi all,

I am currently discovering the RDataFrame tool, and I was wondering whether event mixing is possible. I couldn’t find anything close to it in the tutorials.
Let’s say I have a tree with a J/psi and its muons in each entry. How can I get something like the difference between the pseudorapidity of the J/psi in entry 1 and the pseudorapidity of the J/psi in entry 2 (or any other uncorrelated entry)?
Maybe it is better to use plain ROOT for this kind of thing, but if anybody knows something, let me know!

I don’t have any idea, maybe @eguiraud can help you.

This might be of some relevance (“TimeFrame”):
Using ROOT in high-frequency finance

Here, correlations between events in time (naturally in separate ‘entries’) need to be analyzed together.

Hi @nmangane, I think I understand why it might be of interest, but it is not clear to me how to use it. Did you ever try it? Is it an extension of RDataFrame or a totally independent class?

Hi @Samuel1 ,
sorry for the high latency, I was off last week.
The trickiest part of processing entries with a sliding window is multi-threading: each thread processes a bunch of entries at a time, and the first and last entries in each bunch do not have a previous/next entry, so you’ll miss some statistics.

If you are happy with single-thread processing, or you don’t mind losing some of the pairs of consecutive entries, you can use a stateful functor + RDataFrame, something like (haven’t tested, it’s just to give you an idea):

// not thread-safe, but can easily be made thread-safe by using a vector of
// previousPseudoRapidity values, one per thread (df.GetNSlots() returns
// the required number of threads/processing slots).
struct PseudoRapidityDiff {
  double previousPseudoRapidity = 1e20; // sentinel: no previous value yet

  double operator()(double pseudoRapidity) {
     const double previous = previousPseudoRapidity;
     previousPseudoRapidity = pseudoRapidity; // always remember the current value
     if (previous > 1e19) // no previous value
        return -999;
     return pseudoRapidity - previous;
  }
};

// only safe in single-thread processing
auto dfWithDiff = df.Define("pseudoRapDiff", PseudoRapidityDiff{}, {"pseudoRapidity"});
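The comment in the functor mentions keeping one previous value per thread. A minimal sketch of that idea follows. It is untested against ROOT, and the class name `PseudoRapidityDiffPerSlot` is made up for illustration. It assumes the functor would be registered with `DefineSlot`, which passes the processing slot as the first argument, and that the number of slots comes from `df.GetNSlots()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel meaning "this slot has not seen any entry yet".
constexpr double kNoPrevious = 1e20;

// Hypothetical per-slot variant of the stateful functor: every processing
// slot owns its own "previous" value, so slots never share mutable state.
class PseudoRapidityDiffPerSlot {
public:
   // nSlots: number of processing slots (assumption: taken from df.GetNSlots()).
   explicit PseudoRapidityDiffPerSlot(std::size_t nSlots) : fPrevious(nSlots, kNoPrevious) {}

   // The leading slot argument matches what a slot-aware Define would pass.
   double operator()(unsigned int slot, double pseudoRapidity)
   {
      assert(slot < fPrevious.size());
      const double previous = fPrevious[slot];
      fPrevious[slot] = pseudoRapidity; // remember the value for the next call on this slot
      if (previous > 1e19)              // first entry seen by this slot
         return -999.;
      return pseudoRapidity - previous;
   }

private:
   std::vector<double> fPrevious; // one entry per slot
};
```

With this layout, "previous" is whichever entry the same slot handled last, which is not necessarily the preceding entry in the file. For mixing uncorrelated events that may be acceptable, but each slot still returns the placeholder -999 once.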

I hope this helps!

I am not familiar with TimeFrame personally, but my understanding is that it is an independent class that reuses some RDF concepts but natively works with sliding windows. @Axel will know more.

Hi @eguiraud,

Thanks a lot for taking the time to reply. OK, I understand. My point was to use multi-threading, as I am processing a lot of data. I am actually pairing the pseudorapidity of my J/psi with all the hadrons of the next event (ideally a selected independent event), so it takes quite long! I’ll try anyway, thanks!
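Pairing a J/psi with all hadrons of another event can follow the same stateful-functor pattern. The sketch below is only an illustration under assumptions: the class name `MixedDeltaEta` and the `depth` parameter are invented here, hadron pseudorapidities are assumed to arrive as a `std::vector<double>`, and the pairing uses previously seen events rather than the next one, since a functor can only remember what it has already processed. Like the functor above, it is only safe in single-thread processing.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical mixing functor: pairs the current J/psi with the hadrons of
// the last `depth` events and returns all pseudorapidity differences.
class MixedDeltaEta {
public:
   explicit MixedDeltaEta(std::size_t depth) : fDepth(depth) { assert(depth > 0); }

   std::vector<double> operator()(double jpsiEta, const std::vector<double> &hadronEtas)
   {
      // Pair the current J/psi with every hadron stored from earlier events.
      std::vector<double> deltas;
      for (const auto &oldEvent : fPool)
         for (double hadronEta : oldEvent)
            deltas.push_back(jpsiEta - hadronEta);

      // Store the current hadrons for later events, dropping the oldest
      // event once the pool holds more than `depth` events.
      fPool.push_back(hadronEtas);
      if (fPool.size() > fDepth)
         fPool.pop_front();
      return deltas;
   }

private:
   std::size_t fDepth;
   std::deque<std::vector<double>> fPool; // hadron etas of the most recent events
};
```

Passed to `Define` with two input columns (for example hypothetical columns named `jpsiEta` and `hadronEta`), this would yield one vector of differences per entry. The first entry yields an empty vector because the pool is still empty.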

I completely understand. In that case you need to decide what to do at the boundaries where RDataFrame splits the dataset for multi-thread processing. I don’t think there is a generally right answer.

Is it possible to get the entry number of the last entry of one bunch of data? Then I could add something like “if last, take first”. That would fix it, I think.

Yes, but only in ROOT master and the upcoming release v6.26: you need DefinePerSample. The callable you pass to DefinePerSample receives the processing slot and an RSampleInfo object; the latter can tell you the entry range that is going to be processed.
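The “if last, take first” rule itself is plain index arithmetic. The helper below is a hypothetical sketch (`PartnerEntry` is not a ROOT function). It assumes the range is half-open, `[begin, end)`, and that `begin` and `end` would come from the RSampleInfo entry range. It only picks which entry to pair with; how that entry’s values are then fetched is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the entry to mix with `entry` inside the
// half-open range [begin, end). The last entry wraps around to the first.
std::uint64_t PartnerEntry(std::uint64_t entry, std::uint64_t begin, std::uint64_t end)
{
   assert(begin < end && entry >= begin && entry < end);
   return (entry + 1 == end) ? begin : entry + 1;
}
```

A range holding a single entry pairs that entry with itself, so such ranges would need separate handling.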

