Snapshot method of RDataFrame not copying data identically

KAM · November 25, 2021, 3:59am

I have a large collection of ROOT files containing identically formatted TTrees. I want to merge these into a single file that keeps only three branches (call them x, y, z) from the original trees, plus a new branch computed from branches of the original trees (some of which will not be included in the merged file).

I found that the “cleanest” and, by far, the fastest way was to use the .Snapshot() and .Define() methods of RDataFrame:

void merge(const char* in_path, const char* out_path){

    // in_path: text file listing one ROOT file path per line
    ifstream stream(in_path);
    
    vector<string> list;
    string line;

    while(getline(stream, line))
        list.emplace_back(line);

    ROOT::RDataFrame df("tree",list);

    // Define the derived branch, then write it together with x, y, z to out_path
    df.Define("newBranch", [&](UInt_t a, UInt_t b, int x){
        return myMathematicalFunction(a,b,x);
    },{"a","b","x"}).Snapshot("tree",out_path,{"newBranch","x","y","z"});

}

…where in_path gives the path to a text file containing the list of ROOT file paths.

The branch y contains sequential data of type UInt_t (e.g. 1, 7, 13, 24). In the merged file, the first couple of thousand entries match what is in the original files. At some seemingly arbitrary point, however, all remaining entries jump to a single large constant value.

This doesn’t seem to be an overflow or a similar issue with the size of the original values. For example, in one case the data in y were simply the integers starting at 1 (i.e. 1, 2, 3, 4, 5, ...); after the value 2707, all remaining entries became 4590021.
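To pin down exactly where the values get stuck, something like the following could be run on the y column of the merged file. This is only a sketch: the function name `firstStuckIndex` and the `minRun` parameter are made up for illustration, and it assumes the column has already been read into a `std::vector<unsigned int>` (for instance via RDataFrame's `Take<unsigned int>("y")`).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical diagnostic: return the index at which the trailing run of
// identical values begins. If that run is shorter than minRun entries,
// return values.size() to signal "no stuck tail found".
std::size_t firstStuckIndex(const std::vector<unsigned int>& values,
                            std::size_t minRun = 2) {
    if (values.empty()) return values.size();
    std::size_t i = values.size() - 1;
    // Walk backwards while the previous entry equals the last value.
    while (i > 0 && values[i - 1] == values.back()) --i;
    const std::size_t run = values.size() - i;
    return run >= minRun ? i : values.size();
}
```

The returned index can then be compared against the entry counts of the individual input files, to see whether the jump lines up with a file boundary.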

I’ve verified that the issue does not exist in the original files.

What’s going on here?

couet · November 25, 2021, 7:39am

I think @eguiraud can help you.

eguiraud · November 25, 2021, 8:43am

Hi,
are you running a multi-thread event loop (ROOT::EnableImplicitMT())? In that case the output entries will be shuffled in blocks depending on how threads process them.

If not, and you see no warnings or errors at the command line, it looks like a bug, and it would be great if you could share a reproducer so we can take a look at what’s going on.
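For the multi-thread case, a check of input against output should ignore entry order, since only the order is expected to change, not the values. A minimal sketch of such a comparison, assuming both columns have been read into vectors; the function name `sameValuesIgnoringOrder` is hypothetical:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical check: true if both columns hold the same values with the
// same multiplicities, regardless of the order in which they appear.
// The arguments are taken by value so the caller's vectors stay unsorted.
bool sameValuesIgnoringOrder(std::vector<unsigned int> a,
                             std::vector<unsigned int> b) {
    if (a.size() != b.size()) return false;
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

If this returns true while an entry-by-entry comparison fails, the difference is only a reordering; if it returns false, the values themselves differ.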

Cheers,
Enrico

KAM · December 3, 2021, 9:27pm

Thanks for looking at this. I’m not using ROOT::EnableImplicitMT(), and I’m not seeing any warnings or errors.

I’ll try to put together and post a simple reproducer soon.

system · December 17, 2021, 9:27pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.