TDataFrame's Define ignores the result of a Filter

Hi,

I started playing around with the TDataFrame classes today and I am seeing a weird error. I would assume that if I define a new variable using Define for events passing a Filter, it would only be calculated for events passing the Filter. However, I am seeing it calculated for all events. Is this the expected behavior? To me it seems like a waste of CPU.

I’ve attached an example reproducing it. I make a TTree with two branches: na and a. a is a list of floats, while na is the number of elements in a. I then Filter to keep only entries with na!=0. Then I Define a variable called lasta as “a[na-1]”. This crashes. By adding a printf() statement, I see that it is trying to calculate lasta for the na==0 entry (i.e. a[-1]).

rootdftest.py (940 Bytes)
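To make the expected semantics concrete, here is a plain-Python toy (not ROOT code; the entries, the counter and the helper names are made up for illustration) in which the Filter is checked before the Define expression is evaluated for each entry, so a[na-1] is never attempted when na==0:

```python
# Plain-Python illustration only: no ROOT involved.
# Hypothetical toy entries mimicking the two branches "na" and "a".
entries = [
    {"na": 2, "a": [1.0, 2.5]},
    {"na": 0, "a": []},            # a[na-1] would be a[-1] on an empty list
    {"na": 3, "a": [0.5, 4.0, 7.0]},
]

define_calls = 0  # how many times the Define expression is evaluated


def lasta(entry):
    """Toy equivalent of Define("lasta", "a[na-1]")."""
    global define_calls
    define_calls += 1
    return entry["a"][entry["na"] - 1]


def run(entries):
    results = []
    for entry in entries:
        if entry["na"] == 0:       # Filter("na!=0"): rejected before the Define runs
            continue
        results.append(lasta(entry))
    return results


values = run(entries)
print(values)        # [2.5, 7.0]
print(define_calls)  # 2: only the entries passing the Filter
```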

Also, I noticed that Define runs the calculation right away, not only when the value is requested. Is this also expected?

Edit: I forgot to mention that I am using the latest production ROOT version, 6.10/08.

Hi kkrizka,
thank you for trying out TDataFrame. You are right: that is not the expected behavior.
I can reproduce your issue in v6.10, and I can see that it has been resolved in v6.11 and master (thanks for providing a drop-in reproducer, by the way!): the Define is not executed, as no event loop is triggered in your script.
Also, in these newer versions, adding print result_def.Count().GetValue() at the end of the script prints 9 as expected (and never triggers the Define, as its value is never requested).
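The lazy behavior described above can be modelled with a small plain-Python toy. This is a sketch under stated assumptions: LazyFrame, LazyResult, filter, define, count and get_value are hypothetical names loosely mirroring Filter, Define, Count and GetValue, not the ROOT API. Booking a Count does nothing; the loop runs only when the value is requested, and a Define that no action needs is never evaluated.

```python
class LazyResult:
    """Deferred result: the event loop runs only when get_value() is called."""

    def __init__(self, frame):
        self._frame = frame
        self._value = None
        self._ready = False

    def get_value(self):
        if not self._ready:              # run the loop once, then cache
            self._value = self._frame._event_loop()
            self._ready = True
        return self._value


class LazyFrame:
    """Toy lazily evaluated frame (hypothetical, not the ROOT API)."""

    def __init__(self, entries):
        self._entries = entries
        self._filters = []               # predicates, applied in order
        self._defines = {}               # column name -> function(entry)
        self.define_calls = 0            # how often any Define was evaluated
        self.loops_run = 0               # how many event loops were executed

    def filter(self, predicate):
        self._filters.append(predicate)  # recorded, not evaluated
        return self

    def define(self, name, func):
        def counted(entry):
            self.define_calls += 1
            return func(entry)
        self._defines[name] = counted    # recorded, not evaluated
        return self

    def count(self):
        return LazyResult(self)          # booked, not yet computed

    def _event_loop(self):
        self.loops_run += 1
        # Counting needs no defined column, so no Define function is called.
        return sum(1 for e in self._entries if all(f(e) for f in self._filters))


entries = [{"na": n, "a": [float(i) for i in range(n)]} for n in (0, 1, 2, 0, 3)]
df = LazyFrame(entries)
df.filter(lambda e: e["na"] != 0).define("lasta", lambda e: e["a"][e["na"] - 1])
n_pass = df.count()
print(df.loops_run)         # 0: booking the Count does not run the loop
print(n_pass.get_value())   # 3: the loop runs now
print(df.loops_run)         # 1
print(df.define_calls)      # 0: "lasta" was never requested
```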

I will check whether this is something that we can fix in a subsequent v6.10 patch release.
In the meantime, if you want to play with TDataFrame, I suggest you download v6.11 or, soon, v6.12, as we added quite a lot of new features and fixed several issues.
Cheers,
Enrico

Hi Enrico,

Thank you for the prompt reply! I can also confirm that this works in v6.11.


Karol Krizka

I have one more question related to TDataFrame: what is the best forum to request features? (e.g. histograms with per-event weights)

Good. Is switching to a more recent ROOT version a viable solution for you?

This is the right place. Per-event weights are already supported, though: you just need to pass the weights as a column, e.g. Histo1D(“x”,“w”).
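Conceptually, passing a weight column such as “w” means each entry increments its bin by its weight instead of by one. A plain-Python sketch of that fill rule follows. The function name and binning are illustrative assumptions, not ROOT code, and unlike ROOT this toy simply drops entries outside the axis range instead of storing them in underflow/overflow bins.

```python
def weighted_histo1d(xs, ws, nbins, xlow, xup):
    """Fill fixed-width bins, adding the entry's weight w instead of 1.

    Entries outside [xlow, xup) are dropped in this toy.
    """
    counts = [0.0] * nbins
    width = (xup - xlow) / nbins
    for x, w in zip(xs, ws):
        if xlow <= x < xup:
            # min() guards against rounding pushing x just below xup into bin nbins
            idx = min(int((x - xlow) / width), nbins - 1)
            counts[idx] += w
    return counts


counts = weighted_histo1d([0.5, 1.5, 1.7, 3.2], [1.0, 2.0, 0.5, 1.5], 4, 0.0, 4.0)
print(counts)  # [1.0, 2.5, 0.0, 1.5]
```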

Hi,

the right syntax is the one proposed by Enrico. Here you can find the documentation for the method.

Cheers,
D

Hi,

Thank you for the hints. I’ve also managed to find it a bit after I posted this message.

One more thing that I can’t seem to find in the documentation is a “concatenate” or “append” function: something that would allow me to append two or more data frames together. The use case is that I have individual ntuples for the different pT slices of a dijet Monte Carlo sample. I load them into individual TDataFrames and apply a Define to give them a weight (cross-section/nEvents). After that, the different pT slices can be treated as a single sample, so I would like to merge them and apply my selection to the combined data frame. Is that possible with TDataFrame right now?


Karol Krizka

PS: Switching to a more recent ROOT version is a viable solution for me.

Hi,
no, there is no way to append two TDataFrames, at least currently.

I can think of two possible workarounds (both come with their pros and cons):

If you know the number of entries of each Monte Carlo sample, you can open all of them in the same TDataFrame and Define the weight (cross-section/nEvents) as a function of the entry number.
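The entry-number approach boils down to mapping a global entry index back to the slice it came from. A plain-Python sketch, assuming the slices are read back to back in a known order and that nEvents equals the number of entries in each slice; the slice sizes, cross-sections and the weight_for_entry helper are hypothetical:

```python
import bisect
from itertools import accumulate

# Hypothetical pT slices, in the order they are opened: (n_entries, cross_section).
slices = [(1000, 2.0e6), (500, 3.0e4), (200, 1.0e2)]

# Global index of the first entry of each slice: [0, 1000, 1500]
offsets = [0] + list(accumulate(n for n, _ in slices))[:-1]
total_entries = sum(n for n, _ in slices)


def weight_for_entry(entry):
    """Return cross_section / n_events for the slice that contains `entry`."""
    if not 0 <= entry < total_entries:
        raise IndexError(f"entry {entry} is out of range")
    i = bisect.bisect_right(offsets, entry) - 1  # last slice starting at or before entry
    n_events, xsec = slices[i]
    return xsec / n_events


print(weight_for_entry(0))     # 2000.0 (first slice)
print(weight_for_entry(1000))  # 60.0   (second slice)
print(weight_for_entry(1699))  # 0.5    (last entry of the third slice)
```

The binary search keeps the per-entry cost at O(log n_slices); it relies on entries being numbered consecutively across slices, which is the assumption this workaround rests on.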

Alternatively, since you need to do this procedure only once per Monte Carlo sample, you could Snapshot each (filtered?) pT slice to a different file and then open all pT slices together in a third TDataFrame.
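The data flow of the Snapshot workaround can be sketched in plain Python: each slice is (optionally) preselected and written once with its weight attached, then all slices are read back as one sample and the selection is applied. The snapshot helper, the jet_pt column, the cross-sections and the in-memory lists standing in for files are all hypothetical; this is not ROOT code.

```python
from itertools import chain

# Hypothetical pT slices with made-up cross-sections and entries.
slices = {
    "slice1": {"xsec": 2.0e6, "entries": [{"jet_pt": 120.0}, {"jet_pt": 95.0}]},
    "slice2": {"xsec": 3.0e4, "entries": [{"jet_pt": 410.0}, {"jet_pt": 380.0},
                                          {"jet_pt": 55.0}, {"jet_pt": 60.0}]},
}


def snapshot(entries, xsec, preselect=None):
    """Step 1 analogue: write one slice once, with a constant 'weight' column."""
    w = xsec / len(entries)  # nEvents taken as ALL entries, before any preselection
    kept = entries if preselect is None else [e for e in entries if preselect(e)]
    return [dict(e, weight=w) for e in kept]


# One "file" per slice, each produced only once.
files = [snapshot(s["entries"], s["xsec"], preselect=lambda e: e["jet_pt"] > 50.0)
         for s in slices.values()]

# Step 2 analogue: open all slices together and apply the selection once.
selected = [e for e in chain.from_iterable(files) if e["jet_pt"] > 100.0]
print(len(selected))                       # 3
print(sum(e["weight"] for e in selected))  # 1015000.0
```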

I am aware both suggestions are far from optimal, but I can’t think of anything smarter :sweat_smile:

Cheers,
Enrico
