I’m using ROOT version 6.18.00 and RDataFrame to analyse a large number of input files. I found that when the data frame encounters an input file in which the specified tree has zero entries, it does not process the trees in any of the files later in the list. The failure is silent, so it took me a while to notice it. As a check, I hadded all the input files into a single file and found that in this case the correct number of events was processed.
To be specific, if I run on a long list of input files like so:
files_to_use = {[many files with 792620 total events], file with zero events, [many files with 7385541 total events]};
RDataFrame frame(tree_name, files_to_use);
I find that exactly 792620 events were processed by my data frame.
If I hadd all the inputs into a single file and do:
files_to_use = {file_with_8178161_events};
RDataFrame frame(tree_name, files_to_use);
I find that all 8178161 events were processed by the data frame.
This seems like a bug. Can we fix it so that the evaluation can proceed even if there is an empty file in the inputs?
Hi,
I’m afraid I can’t reproduce this.
I’m playing with the code below. I tried with a completely empty file, with a file with a TTree with the right branches but no entries, and with a TTree with no entries and no branches (with ROOT master and v6.18, with and without EnableImplicitMT…). In all cases the program prints 20 as expected.
Does the tentative reproducer below also work for you? If yes, there is something else going on. Can you provide a minimal reproducer of your situation?
Cheers,
Enrico
#include <ROOT/RDataFrame.hxx>
#include <TFile.h>
#include <TTree.h>
#include <iostream>

int main()
{
   // f1.root has 10 entries
   auto ten_entries = ROOT::RDataFrame(10).Define("x", [] { return 42.; });
   ten_entries.Snapshot("t", "f1.root");

   // f2.root has an empty TTree
   TFile f("f2.root", "RECREATE");
   TTree t("t", "t");
   int x = 42;
   t.Branch("x", &x);
   t.Write();
   f.Close();

   // read in full file, empty file, full file
   ROOT::RDataFrame df("t", {"f1.root", "f2.root", "f1.root"});
   std::cout << *df.Count() << std::endl; // should print 20
   return 0;
}