
Snapshot method of RDataFrame not copying data identically

I have a large collection of ROOT files containing identically formatted TTrees. I wish to merge these into a single file containing only three branches from the original trees (call them x, y, z), plus a new branch computed from branches of the original trees, some of which will not be included in the merged file.

I found that the “cleanest” and, by far, the fastest way was to use the .Snapshot() and .Define() methods of RDataFrame:

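#include <ROOT/RDataFrame.hxx>
#include <fstream>
#include <string>
#include <vector>

void merge(const char* in_path, const char* out_path){

    // Read one ROOT file path per line from the text file at in_path.
    std::ifstream stream(in_path);

    std::vector<std::string> list;
    std::string line;

    while(std::getline(stream, line))
        list.emplace_back(line);

    ROOT::RDataFrame df("tree",list);

    // myMathematicalFunction is not shown here; it combines a, b and x.
    df.Define("newBranch", [&](UInt_t a, UInt_t b, int x){
        return myMathematicalFunction(a,b,x);
    },{"a","b","x"}).Snapshot("tree",out_path,{"newBranch","x","y","z"});

}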

…where in_path gives the path to a text file containing the list of ROOT file paths.


The branch y contains sequential data of type UInt_t (e.g. 1, 7, 13, 24). In the merged file, the first few thousand entries match what is in the original files. At some seemingly arbitrary point, however, all remaining entries jump to a single large constant value.

This doesn’t seem to be an overflow or a similar issue related to the size of the original values. For example, in one case the data in y were simply the integers starting at 1 (i.e. 1, 2, 3, 4, 5, ...), and after 2707 every remaining entry became 4590021.
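
To pin down exactly where the values stop changing, a small check like the one below may help. It is only a sketch that assumes the y values from the merged file have already been copied into a std::vector (for instance via RDataFrame's Take, which is not shown here); the helper name constantTailStart is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: index at which the trailing run of identical values
// begins. For an empty vector it returns 0. If the last value is not
// repeated, the result is the index of the last element.
std::size_t constantTailStart(const std::vector<std::uint32_t>& values) {
    if (values.empty())
        return 0;
    std::size_t i = values.size() - 1;
    // Walk backwards while the previous value equals the final value.
    while (i > 0 && values[i - 1] == values.back())
        --i;
    return i;
}
```

Comparing the returned index with the entry counts of the individual input files would show whether the jump coincides with a file boundary, which is one thing worth ruling out.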

I’ve verified that the issue does not exist in the original files.

What’s going on here?

I think @eguiraud can help you.

Hi,
are you running a multi-threaded event loop (ROOT::EnableImplicitMT())? In that case, the output entries will be shuffled in blocks, depending on how the threads process them.

If not, and you see no warnings or errors at the command line, this looks like a bug, and it would be great if you could share a reproducer so we can take a look at what’s going on.

Cheers,
Enrico
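
For the multi-threaded case mentioned above, where entries may be written in a different order, one way to tell reordering apart from corrupted values is to compare the two sets of values while ignoring order. The sketch below assumes both columns have already been read into vectors; the helper name sameEntriesIgnoringOrder is hypothetical and uses only the standard library.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if both vectors hold the same values with the
// same multiplicities, regardless of order. The arguments are taken by
// value so the copies can be sorted without touching the caller's data.
bool sameEntriesIgnoringOrder(std::vector<std::uint32_t> original,
                              std::vector<std::uint32_t> merged) {
    if (original.size() != merged.size())
        return false;
    std::sort(original.begin(), original.end());
    std::sort(merged.begin(), merged.end());
    return original == merged;
}
```

If this returns true, the data were only reordered; if it returns false, values were actually changed or lost.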

Thanks for looking at this. I’m not using ROOT::EnableImplicitMT(), and I’m not seeing any warnings or errors.

I’ll try to put together and post a simple reproducer soon.