Snapshot a friend tree with RDataFrame and ImplicitMT enabled

Hi,
I have a TTree with two branches and I want to add a friend tree with new branches to the same ROOT file.
I am using RDataFrame to define the new columns of the friend tree, but I can’t figure out how to correctly snapshot the friend tree into the same file as the main TTree.

This is the best I can achieve so far:

//some code here
...
//
typedef std::tuple<ROOT::VecOps::RVec<double>,ROOT::VecOps::RVec<double>,ROOT::VecOps::RVec<double>,ROOT::VecOps::RVec<double>> TYPE_TUPLE_WITH_4_RVEC;

void add_friend(TString fileName, TString treeName, TString friendTreeName){
   TFile *f = new TFile(fileName, "UPDATE");
   TTree *myTree = (TTree*)f->Get(treeName);
   myTree->AddFriend(new TTree(friendTreeName,friendTreeName), fileName);
   f->Write();
   f->Close();
}


int main(int argc, char** argv)
{
	//some code here
    ...
    //
 
	//set I/O file names
	string 	outDir  = argv[1],
		    rootIn  = argv[2],
		    rootOut = rootIn; //update same file
	
	// I'd like to use MT
	ROOT::EnableImplicitMT();
	
 	add_friend(rootIn, "amulet", "decoded_signals");	
	ROOT::RDataFrame df_trig("amulet", rootIn.c_str());

	//tell snapshot to update root file
	ROOT::RDF::RSnapshotOptions opt;
	opt.fMode="UPDATE";
	opt.fOverwriteIfExists=true;

	//define new columns with the decoded information and update the root file with the new columns (branches)
	df_trig .Define("ch0_wvf_time"	, [sampfreq]( ROOT::VecOps::RVec<int> ADC_ch ){ return temporalize(ADC_ch, sampfreq); }, {"ch0_wvf_amp"} )
		.Define("ch1_wvf_time"	, [sampfreq]( ROOT::VecOps::RVec<int> ADC_ch ){ return temporalize(ADC_ch, sampfreq); }, {"ch1_wvf_amp"} )
		.Define("ch0Decoded"    , [sampfreq]( ROOT::VecOps::RVec<int> ADC_ch ){ return analyze(ADC_ch, sampfreq);     }, {"ch0_wvf_amp"} )
		.Define("ch1Decoded"    , [sampfreq]( ROOT::VecOps::RVec<int> ADC_ch ){ return analyze(ADC_ch, sampfreq);     }, {"ch1_wvf_amp"} )
		.Define("ch0Nup"        , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return (int)get<0>(WVF_dec).size();  }, {"ch0Decoded" } )
		.Define("ch0Ndwn"       , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return (int)get<1>(WVF_dec).size();  }, {"ch0Decoded" } ) 
		.Define("ch0timeups"    , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return get<0>(WVF_dec);              }, {"ch0Decoded" } )
		.Define("ch0timedwns"   , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return get<1>(WVF_dec);              }, {"ch0Decoded" } )
		.Define("ch1Nup"        , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return (int)get<0>(WVF_dec).size();  }, {"ch1Decoded" } )
		.Define("ch1Ndwn"       , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return (int)get<1>(WVF_dec).size();  }, {"ch1Decoded" } ) 
		.Define("ch1timeups"    , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return get<0>(WVF_dec);              }, {"ch1Decoded" } )
		.Define("ch1timedwns"   , []( TYPE_TUPLE_WITH_4_RVEC     WVF_dec ){ return get<1>(WVF_dec);              }, {"ch1Decoded" } )
		.Snapshot("decoded_signals", rootOut.c_str(), {"ch0_wvf_time","ch0Nup","ch0Ndwn","ch0timeups","ch0timedwns",								 
                                                       "ch1_wvf_time","ch1Nup","ch1Ndwn","ch1timeups","ch1timedwns"}, opt );     
	return 0;
}

If I try to draw a scatter plot with
amulet->Draw("ch0_wvf_amp:ch0_wvf_time")
where one variable comes from the main tree (amulet) and the other from the friend tree, I get the plot that I expect.

The problem is that while the script is running I get this error, repeated many times:

Error in <TChain::LoadTree>: Cannot find tree with name decoded in file ../DAQERM/preprocessed/run2meas1_amulet.root

If I do not use ImplicitMT the error disappears. Still, I am not sure that I am handling the friend tree correctly; if I were, it should work with MT enabled too.

I also tried using another approach:

 	TFile* myFile = new TFile(rootIn.c_str(), "UPDATE");
	TTree* myTree = (TTree*)myFile->Get("amulet");
	TTree* myFT = new TTree("decoded","decoded");
	myFT->CopyEntries(myTree);
	myTree->AddFriend(myFT,"decoded");
	myFT->BuildIndex("Entry$");
	myFT->Write();
	myTree->Write("",TObject::kOverwrite);
	myFile->Close();

	ROOT::RDataFrame df_trig("amulet", rootIn);

	//tell snapshot to update root file
	ROOT::RDF::RSnapshotOptions opt;
	opt.fMode="UPDATE";
	opt.fOverwriteIfExists=true;

    //and here I define the columns and snapshot the df

but I get the exact same error.

Any hint or advice you might have would be useful, thanks in advance.


ROOT Version: 6.24/00
Platform: Manjaro
Compiler: GCC


Hi Massimo,
two things to note:

  • RDataFrame will not update an existing TTree, it will overwrite it. Why do you need the add_friend call to “pre-create” the friend tree?
  • A multi-thread Snapshot writes the entries in shuffled order with respect to the input tree, which makes it impossible to use the resulting TTree as a friend of the original input TTree (and you will get an error if you try).

Does this help? I’m not sure what might be causing the errors you see, but I’d like to understand your use case before diving deeper.

Cheers,
Enrico

Hi,

  1. OK, thanks for pointing that out, so what I was doing did not make much sense. My goal is to evaluate data starting from an existing tree in a ROOT file and to add the new information in a “friend-tree” fashion to the same ROOT file.
  2. Yes, that is why in the second example I posted I created the friend tree beforehand by copying all entries: that was just an (ugly) trick to build the index and so work around the “reshuffle” problem. My first attempt was in fact to first save the new tree, with its own data and its own name, into the ROOT file and then add it as a friend to the main tree already present in the file. From what I understand this approach is correct as long as I do not use MT; otherwise I get the “kEntriesReshuffled bit has been set” error that I guess you were referring to.

Do you have any suggestions on how to proceed? Maybe the easiest thing is to give up on MT and go with point 2, relying on the entries not being reshuffled?

Thanks for your patience!

Alright, thank you for the explanation!
If you have a variable that you can use as an index, it doesn’t matter if the entries in the output tree are shuffled; you can then use BuildIndex and things will work:

#include <ROOT/RDataFrame.hxx>
#include <TTree.h>
#include <TFile.h>
#include <iostream>

int main() {
   // write the main tree
   ROOT::RDF::RSnapshotOptions opt;
   // very inefficient, it causes RDF to spawn a task for each entry read when
   // reading back this file (to exercise shuffling etc.)
   opt.fAutoFlush = 1;
   auto df_t1 = ROOT::RDataFrame(32).Define("idx", "rdfentry_").Snapshot("t1", "f.root", ".*", opt);
   std::cout << "Main tree:\n";
   df_t1->Display(".*", 100)->Print();

   // write the friend with the new branch and the index, with MT on
   ROOT::EnableImplicitMT();
   opt.fAutoFlush = 0; // reset to default value
   opt.fMode="update";
   ROOT::RDataFrame("t1", "f.root")
       .Define("x", "idx*idx")
       .Snapshot("t2", "f.root", {"x", "idx"}, opt);

   // print contents of t2, verify entries are shuffled
   ROOT::DisableImplicitMT(); // need to turn MT off again in order to use Display
   std::cout << "\n\nFriend tree:\n";
   ROOT::RDataFrame("t2", "f.root").Display(".*", 100)->Print();

   // use BuildIndex to read the original tree and its (shuffled but indexed) friend
   TFile f("f.root");
   auto *tmain = f.Get<TTree>("t1");
   auto *tfriend = f.Get<TTree>("t2");
   tfriend->BuildIndex("idx");
   tmain->AddFriend(tfriend);

   std::cout << "\n\nTogether:\n";
   ROOT::RDataFrame(*tmain).Display(".*", 100)->Print(); // verify entries are ordered correctly

   return 0;
}

(Display gets a bit confused there and prints the friend column x twice, once as x and once as t2.x, but that’s the same column, it’s a bug I just discovered, will fix asap)
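
To make the idea behind the index concrete, here is a plain C++ sketch (no ROOT involved) of what an index-based friend lookup does conceptually: the friend rows are stored in shuffled order, and the shared index value, not the entry number, is used to find the matching row. The names FriendRow, buildIndex and readFriendInMainOrder are made up for this illustration, and this is not meant to describe how ROOT implements BuildIndex internally; it is only the same idea in miniature, assuming the index values are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// One row of a hypothetical friend tree: the shared index plus the new value.
struct FriendRow {
   long long idx; // same value as the "idx" column of the main tree
   long long x;   // new column computed from idx
};

// Conceptual stand-in for BuildIndex: map each index value to the position
// of the friend row that carries it. Assumes index values are unique.
std::unordered_map<long long, std::size_t> buildIndex(const std::vector<FriendRow> &friendRows)
{
   std::unordered_map<long long, std::size_t> index;
   for (std::size_t pos = 0; pos < friendRows.size(); ++pos) {
      const bool inserted = index.emplace(friendRows[pos].idx, pos).second;
      assert(inserted && "index values must be unique");
      (void)inserted; // silence unused-variable warnings when asserts are disabled
   }
   return index;
}

// Read the friend values in the order of the main tree, looking rows up
// through the index instead of relying on the entry number.
std::vector<long long> readFriendInMainOrder(const std::vector<long long> &mainIdx,
                                             const std::vector<FriendRow> &friendRows)
{
   const auto index = buildIndex(friendRows);
   std::vector<long long> x;
   x.reserve(mainIdx.size());
   for (long long idx : mainIdx)
      x.push_back(friendRows[index.at(idx)].x); // at() throws if idx is missing
   return x;
}
```

For example, with the main tree holding idx = 0, 1, 2, 3 and the friend rows stored as (2, 4), (0, 0), (3, 9), (1, 1), readFriendInMainOrder returns the x values in main-tree order regardless of how the friend was shuffled.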

Cheers,
Enrico

Thank you so much, this way it works smoothly!

Anyway, I am still wondering: is there a way to perform this operation without relying on an index variable (which I actually do not have)?

Not only would it be more straightforward, but in my case I do not have such a variable. So basically I reproduce what you did: I copy my whole tree into another one in the same file, adding one column with rdfentry_; then I use this copy as my main tree and delete the old one. Finally, I create the new tree and add it as a friend. In total I have to work with 3 trees and, from the brief tests I did, it seems quite a heavy operation.

That said, now it works so I could just make do with it.

You can:

  • produce the friend tree in a single-thread run
  • write a shuffled friend and use a column shared between the main tree and the friend as index
  • read the original tree and write everything out in a new tree (so you don’t need any friends, you just take one tree and from it you produce another one with more columns)

In the future we might provide a switch to make ordered multi-thread Snapshots but it’s not clear at the moment what the cost will be in terms of memory usage.

If you can think of a better solution I’m interested :smiley:
Cheers,
Enrico

Yes, I think that it would be very handy!
I will promptly report any idea I might have :smile:

Thanks for all your help,
Massimo