RDataFrame bug with empty input trees?

Hi,

I’m using ROOT 6.18.00 and RDataFrame to analyse a large number of input files. I found that when the data frame encounters an input file in which the specified tree has zero entries, it does not process the trees in any of the files later in the list. The failure is silent, so it took me a while to notice it. As a check, I hadded all the input files into one and found that in that case the correct number of events was processed.

To be specific, if I run on a long list of input files like so:
files_to_use = {[many files with 792620 total events], file with zero events, [many files with 7385541 total events]};
RDataFrame frame(tree_name, files_to_use);
I find that exactly 792620 events were processed by my data frame.

If I hadd all the inputs into a single file and do:
files_to_use = {file_with_8178161_events};
RDataFrame frame(tree_name, files_to_use);
I find that all 8178161 events were processed by the data frame.

This seems like a bug. Can we fix it so that the evaluation can proceed even if there is an empty file in the inputs?

Thanks!
Kate

Hi Kate,
thanks for the report.
I’d like to try and reproduce the problem locally:

  • do the files with zero events contain a TTree with zero events, or no TTree at all?
  • is there no error message or warning displayed on screen?

Cheers,
Enrico

Hi Enrico,

Thanks for taking a look!

  • The files do contain a TTree with the correct name; it just has zero events in it

  • There was this error:
    Error in TTreeReader::SetEntryBase(): There was an error while notifying the proxies.
    But the job completed fine.

Cheers,
Kate

(sorry, I missed this one in the first reply so have added in an edit)

Hi,
I’m afraid I can’t reproduce this.
I’m playing with the code below. I tried with a completely empty file, with a file with a TTree with the right branches but no entries, and with a TTree with no entries and no branches (with ROOT master and v6.18, with and without EnableImplicitMT…). In all cases the program prints 20 as expected.

Does the tentative reproducer below also work for you? If yes, there is something else going on. Can you provide a minimal reproducer of your situation?

Cheers,
Enrico

#include <ROOT/RDataFrame.hxx>
#include <TFile.h>
#include <TTree.h>
#include <iostream>

int main()
{
   // f1.root has 10 entries
   auto ten_entries = ROOT::RDataFrame(10).Define("x", []{return 42.; });
   ten_entries.Snapshot("t", "f1.root");

   // f2.root has an empty TTree
   TFile f("f2.root", "RECREATE");
   TTree t("t", "t");
   int x = 42;
   t.Branch("x", &x);
   t.Write();
   f.Close();

   // read in full file, empty file, full file
   ROOT::RDataFrame df("t", {"f1.root", "f2.root", "f1.root"});
   std::cout << *df.Count() << std::endl; // should print 20

   return 0;
}
