Problem using RDataFrame for trees with 1 event

Hi,

I’ve run into a strange problem using RDataFrame. I’m attaching two trees. One has one event. One has two events. The format of the trees is the same for both. The following code works for the two-event case just fine, but hangs for the one-event case. This happens for all of my trees when there’s only one event. There’s nothing complicated about the trees, just doubles, but there are ~200 branches. The TTree::Draw(branch) method works just fine on both files. Any idea why the RDataFrame method would work for two or more events (typically hundreds of thousands), but not for one?

I’m using ROOT Version: 6.20/00.

Thanks!

TH1F* dataframeMethod(TString fileName){
  ROOT::EnableImplicitMT(0);
  ROOT::RDataFrame DF("tree",fileName.Data());
  auto hDF = DF.Filter("(1==1)")
               .Define("VAR","BeamEnergy")
               .Histo1D({"NAME","TITLE",100,0.0,5.0},"VAR");
  TH1F hist; hDF->Copy(hist);
  return new TH1F(hist);
}

test1Event.root (17.4 KB) test2Events.root (19.2 KB)

Hi again,

Below is simple stand-alone code that shows the same behavior, now with only one branch.

This works:

makeTree(2); skimTree(); dataframeMethod("treeCopy.root")->Draw();

This hangs:

makeTree(1); skimTree(); dataframeMethod("treeCopy.root")->Draw();

Interestingly, this works:

makeTree(1); dataframeMethod("tree.root")->Draw();

So my problem is somehow with copied trees? That’s the method I currently use for skimming (with 1==1 replaced by real cuts, etc.).

Stand-alone code:

void makeTree(int numEntries){
  TTree *tree = new TTree("tree","tree");
  double x = 2.0;
  tree->Branch("BeamEnergy",&x);
  for (int i = 0; i < numEntries; i++){
    tree->Fill();
  }
  TFile file("tree.root","recreate"); file.cd();
  tree->Write();
}

void skimTree(){
  TChain* tree = new TChain("tree");
  tree->Add("tree.root");
  TTree* tree2 = tree->CopyTree("(1==1)");
  TFile file2("treeCopy.root","recreate");  file2.cd();
  tree2->Write();
}

TH1F* dataframeMethod(TString fileName){
  ROOT::EnableImplicitMT(0);
  ROOT::RDataFrame DF("tree",fileName.Data());
  auto hDF = DF.Filter("(1==1)")
               .Define("VAR","BeamEnergy")
               .Histo1D({"NAME","TITLE",100,0.0,5.0},"VAR");
  TH1F hist; hDF->Copy(hist);
  return new TH1F(hist);
}

Hi @remitche,
thank you for the standalone reproducer, that’s very helpful.
I will take a look as soon as possible and get back to you here.

Cheers,
Enrico

Hi,
I can reproduce the problem locally, it seems the issue is that treeCopy.root is not…ok.

This version of skimTree() seems to fix the problem:

void skimTree(){
  TChain* tree = new TChain("tree");
  tree->Add("tree.root");
  TFile file2("treeCopy.root","recreate");  file2.cd();
  TTree* tree2 = tree->CopyTree("(1==1)");
  tree2->Write();
}

Can you confirm that’s the case for you?

Here’s a compilable version of your reproducer that works for me: copied_tree_1evt.cpp (1.1 KB)

Cheers,
Enrico

EDIT: I don’t know why exactly copying the TTree before creating its destination TFile does not work as expected, but since TTree can auto-flush its contents to file as you fill it, in general it’s good practice to set the destination TFile beforehand.

Hi Enrico,

Thanks – opening the file before doing the TTree::CopyTree works, as in your modified skimTree().

Actually, in my original skimming code, I also opened the file before copying, but I was using TTree::AutoSave (for some reason I forget) instead of TTree::Write.

So RDataFrame apparently doesn’t work for trees with one entry produced using this (but does work if there is more than one entry):

void skimTree(){
  TChain* tree = new TChain("tree");
  tree->Add("tree.root");
  TFile file2("treeCopy.root","recreate");  file2.cd();
  TTree* tree2 = tree->CopyTree("(1==1)");
  tree2->AutoSave();
}

But if I include a TFile::Write (which I guess I should have included anyway) all is okay:

void skimTree(){
  TChain* tree = new TChain("tree");
  tree->Add("tree.root");
  TFile file2("treeCopy.root","recreate");  file2.cd();
  TTree* tree2 = tree->CopyTree("(1==1)");
  tree2->AutoSave();
  file2.Write();
}

Does this last skimTree function look okay to you?

Thanks a lot for your help. The new RDataFrame functionality is really helpful…

Best,
Ryan


Uhm, I think @pcanal might be one of the very few people that can explain the difference between my version of skimTree and these two of yours.

In any case, it’s always safe to construct the file before the TTree and to call Write (on either the TTree or the TFile) at the end.

Cheers,
Enrico

OK, thanks, I guess my original problem was just the missing Write() (which anyway worked most of the time, maybe accidentally). All seems good now.

Creating the file after filling the TTree is almost always a mistake, as it often leads to the data either being stored in the wrong file or being held in memory all at once (for example, if the amount of data that was selected was 20 GB, you would need that much free memory to execute the skim).

@eguiraud Even though the file is unusual (it may or may not be in a form that is not intended, and thus might need a fix in ROOT I/O and/or TTree), the hang is indicative of an edge-case calculation error around TTreeProcessorMT.cxx:200.

Another way to solve the problem is likely:

  TTree* tree2 = tree->CopyTree("(1==1)");
  TFile file2("treeCopy.root","recreate");  file2.cd();
  tree2->SetDirectory(&file2);
  tree2->Write();

