DefinePerSample interaction with other RDF functions

konrad · January 12, 2023, 2:29pm

ROOT Version: 6.26.06

Hello!

I am again having a problem with RDataFrame::DefinePerSample: it behaves like the ordinary Define and uses a single value for all samples.

Here is a simple example to reproduce my problem: test2.C (2.0 KB)
In the example I generate two trees, one containing only 0.5s and the other only 2s. I create an RDataFrame from the files and want to calibrate the data so that the entire column is filled with ones.

I use DefinePerSample to define weights depending on the input file:

df2 = df.DefinePerSample(
    "weightbysample",
    [&fileNames, &weights](unsigned int, const ROOT::RDF::RSampleInfo &id) {
        // Return the weight of the first file whose name appears in the sample ID.
        for (unsigned int i = 0; i < fileNames.size(); i++)
            if (id.Contains(fileNames[i]))
                return weights[i];
        return -1.; // no matching file
    });

std::cout << "Sum of weights: " << df2.Sum<double>("weightbysample").GetValue() << std::endl;

The last line produces the correct checksum, showing that the column contains different weights per sample. However, it does so only once: if I repeat the line right after, it returns the wrong checksum (all entries have the last value from my weights vector).
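
The per-sample lookup done in the lambda can be checked in isolation, without ROOT. The following is a minimal sketch, assuming that `RSampleInfo::Contains` is a plain substring test on the sample ID; the helper name `weightForSample` is hypothetical, and `std::string::find` stands in for `Contains`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper mirroring the lambda from the post: returns the weight
// of the first file name that occurs as a substring of sampleId, or -1.0 if
// no file name matches. std::string::find stands in for RSampleInfo::Contains.
double weightForSample(const std::string &sampleId,
                       const std::vector<std::string> &fileNames,
                       const std::vector<double> &weights)
{
    // Also guard against a weights vector shorter than the file list.
    for (std::size_t i = 0; i < fileNames.size() && i < weights.size(); ++i) {
        if (sampleId.find(fileNames[i]) != std::string::npos)
            return weights[i];
    }
    return -1.0; // sentinel: no matching file
}
```

Because this is a substring match and the first hit wins, a file name that is contained in another one (say `a.root` and `data.root`) would pick up the wrong weight, so distinct, non-overlapping names are the safer choice.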

If I define a new column to calibrate my data:

df2 = df2.Define("calibrated","column1 * weightbysample");

and I produce a histogram with Histo1D, it shows that the same weight was applied to every entry of my column. Similarly, if I plot a histogram of the column containing the weights, all entries have the same value. However, if I plot the weights right after DefinePerSample, I get the correct histogram.

I tried to snapshot the dataframe after using DefinePerSample and to reload it, but that did not work around the problem either.

Does anybody know what is going wrong here and how it can be fixed?

Thanks in advance!

bellenot · January 12, 2023, 3:11pm

Maybe @vpadulan or @axel can take a look

vpadulan · January 17, 2023, 3:19pm

Hi,
I have created a slightly simpler reproducer and opened an issue on GitHub. I’ll follow up on it in the coming days.

The current best workaround is to book all the operations you need before triggering any of them. Based on your attached example, I have created this modified version, which should work:

workaround.cpp (2.3 KB)
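
The attachment is not reproduced here. As a rough sketch of what "book everything first" can look like (not the contents of workaround.cpp): the function name `bookThenTrigger` is hypothetical, the tree name is supplied by the caller, and the input tree is assumed to have a numeric column `column1` as in the original post.

```cpp
#include <ROOT/RDataFrame.hxx>
#include <TH1D.h>

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical function: treeName, fileNames and weights are placeholders
// supplied by the caller; weights[i] belongs to fileNames[i].
void bookThenTrigger(const std::string &treeName,
                     const std::vector<std::string> &fileNames,
                     const std::vector<double> &weights)
{
    ROOT::RDataFrame df(treeName, fileNames);

    // Per-sample weight, same lookup as in the original post.
    auto withWeight = df.DefinePerSample(
        "weightbysample",
        [&fileNames, &weights](unsigned int, const ROOT::RDF::RSampleInfo &id) {
            for (std::size_t i = 0; i < fileNames.size(); ++i)
                if (id.Contains(fileNames[i]))
                    return weights[i];
            return -1.; // no matching file
        });
    auto calibrated = withWeight.Define("calibrated", "column1 * weightbysample");

    // Book every result first. These calls are lazy: no event loop runs yet.
    auto sumOfWeights = calibrated.Sum<double>("weightbysample");
    auto hWeights = calibrated.Histo1D("weightbysample");
    auto hCalibrated = calibrated.Histo1D("calibrated");

    // The first access triggers one event loop that fills all booked results;
    // the later accesses only read values that are already computed.
    std::cout << "Sum of weights: " << sumOfWeights.GetValue() << std::endl;
    std::cout << "Mean weight: " << hWeights->GetMean() << std::endl;
    std::cout << "Mean calibrated value: " << hCalibrated->GetMean() << std::endl;
}
```

With this ordering all results come out of a single event loop, so the second run over the data, which is where the wrong weights showed up in the report above, never happens.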

Cheers,
Vincenzo

konrad · January 26, 2023, 2:01pm

Thank you very much!
I will test your workaround in the coming days to see if I can make it work in my code.

Cheers,
Konrad

P.S.: Yes, it solves the issue!
