I started playing around with the TDataFrame classes today and I am seeing a weird error. I would assume that if I define a new variable using Define for events passing a Filter, then it would only be calculated for events passing the Filter. However, I am seeing it calculated for all events. Is this the expected behavior? To me it seems like a waste of CPU.
I’ve attached an example reproducing it. I make a TTree with two branches: na and a. The a branch is a list of floats, while na is the number of elements in a. I then Filter to get only entries with na!=0, and Define a variable called lasta as “a[na-1]”. This crashes. By adding a printf() statement, I see that it is trying to evaluate the expression for the na==0 entry (i.e. a[-1]).
Hi kkrizka,
thank you for trying out TDataFrame. You are right, that is not the expected behavior.
I can reproduce your issue in v6.10, and I can see that it has been resolved in v6.11 and master (thanks for providing a drop-in reproducer, by the way!): the Define is not executed, as no event loop is triggered in your script.
Also, in these newer versions, adding print result_def.Count().GetValue() at the end of the script prints 9 as expected (and never triggers the Define, as its value is never requested).
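To make the intended semantics concrete, here is a minimal pure-Python toy model of a lazy Filter/Define chain. It is only a sketch and not TDataFrame code: the names `LazyEntry`, `run`, `last_a` and the three sample entries are made up for illustration. It shows a Define that is evaluated only when its value is requested, and only for entries that pass the Filter.

```python
# Toy model of lazy Filter/Define semantics. This is NOT TDataFrame code;
# every name and all data below are hypothetical.

calls = []  # records the na of every entry for which the Define is evaluated


def last_a(entry):
    """Defined column: last element of a (would misbehave if na == 0)."""
    calls.append(entry["na"])
    return entry["a"][entry["na"] - 1]


class LazyEntry:
    """One entry whose defined columns are computed only on access."""

    def __init__(self, columns, defines):
        self.columns = columns
        self.defines = defines

    def __getitem__(self, name):
        if name in self.defines:
            return self.defines[name](self)
        return self.columns[name]


def run(entries, predicate, defines, requested=()):
    """Event loop: apply the filter, then read only the requested columns."""
    n_pass = 0
    values = []
    for columns in entries:
        entry = LazyEntry(columns, defines)
        if not predicate(entry):
            continue  # rejected entries never reach the Define
        n_pass += 1
        values.append(tuple(entry[name] for name in requested))
    return n_pass, values


# Three made-up entries; the first one is empty (na == 0).
entries = [{"na": len(a), "a": a} for a in ([], [1.0], [2.0, 3.0])]
defines = {"lasta": last_a}


def has_elements(entry):
    return entry["na"] != 0


# Counting does not request lasta, so the Define is never evaluated.
n_pass, _ = run(entries, has_elements, defines)
calls_after_count = list(calls)

# Requesting lasta evaluates it, but only for entries passing the filter.
_, rows = run(entries, has_elements, defines, requested=("lasta",))
lasta_values = [row[0] for row in rows]
```

In this toy, the counting pass leaves `calls` empty, and the second pass evaluates `last_a` only for the two non-empty entries, so the a[-1] case is never reached.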
I will check whether this is something that we can fix in a subsequent v6.10 patch release.
In the meantime, if you want to play with TDataFrame, I suggest you download v6.11 or, soon, v6.12, as we added quite a lot of new features and fixed several issues.
Cheers,
Enrico
Thank you for the hints. I also managed to find it shortly after I posted this message.
One more thing that I can’t seem to find in the documentation is a “concatenate” or “append” function: something that would allow me to append two or more data frames together. The use case is that I have individual ntuples for the different pT slices of a dijet Monte Carlo sample. I load them into individual TDataFrames and apply a Define to give them a weight (cross-section/nEvents). After that, the different pT slices can be treated as a single sample, so I would like to merge them and apply my selection to the combined data frame. Is that something that is possible with TDataFrames right now?
–
Karol Krizka
PS: Switching to a more recent ROOT version is a viable solution for me.
Hi,
No, there is currently no way to append two TDataFrames.
I can think of two possible workarounds (both come with their pros and cons):
If you know the number of entries of each Monte Carlo sample, you can open all of them in the same TDataFrame and Define cross-section/nEvents also as a function of the entry number.
Alternatively, since you need to do this procedure only once per Monte Carlo sample, you could Snapshot each (filtered?) pT slice to a different file and then open all pT slices together in a third TDataFrame.
I am aware both suggestions are far from optimal, but I can’t think of anything smarter.
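As a rough illustration of the first workaround, the mapping from entry number to weight could look like the pure-Python sketch below. The entry counts and cross-sections are made-up placeholders, `weight` is a hypothetical helper, and nEvents is assumed to equal the number of entries in each slice. How the entry number is passed to a Define depends on the ROOT version and is not shown here.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical per-slice inputs, listed in the order the files are opened.
n_entries = [1000, 500, 250]           # entries in each pT slice (made up)
cross_sections = [7.8e4, 2.4e3, 26.0]  # arbitrary units (made up)

# Exclusive upper entry-number boundary of each slice.
boundaries = list(accumulate(n_entries))


def weight(entry_number):
    """Return cross-section / nEvents for the slice containing entry_number."""
    if not 0 <= entry_number < boundaries[-1]:
        raise IndexError("entry number outside the combined samples")
    # Index of the first slice whose upper boundary lies above entry_number.
    i = bisect_right(boundaries, entry_number)
    return cross_sections[i] / n_entries[i]
```

This relies on entries being numbered globally and in the same order as the listed slices, so the order of `n_entries` must match the order in which the files are opened.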