Dear experts,
When computing the sum of gen weights with RDataFrame for a set of files in a TChain that has a friend chain, I’m getting this message:
Error in TTreeCache::FillBuffer: Inconsistency: fCurrentClusterStart=0 fEntryCurrent=85705 fNextClusterStart=103062 but fEntryCurrent should not be in between the two
Also, if I compare the results with my loop-style computation using SetBranchAddress, I get different numbers. I’m not sure what’s going on. I have tried setting the chain cache size to 0 as per https://cdcvs.fnal.gov/redmine/issues/22583, but it didn’t make any difference.
Here’s the relevant code for both ways:
SetBranchAddress and loop over events:
TChain * data_gen = new TChain("ntuples_gbm/genT");
// Add files to gen TChain
data_gen->SetBranchStatus("*",0);
data_gen->SetBranchStatus("gen_wgt", 1);
Float_t gen_wgt;
data_gen->SetBranchAddress("gen_wgt", &gen_wgt);
data_gen->GetEntries();
Float_t sum_gen_wgt = 0;
for (int i = 0; i < data_gen->GetEntries(); i++) {
    data_gen->GetEntry(i);
    sum_gen_wgt += gen_wgt;
}
cout << "sample: " << sample << ", sum_gen_wgt: " << sum_gen_wgt << endl;
RDataFrame with .Sum:
TChain * data_reco = new TChain("ntuples_gbm/recoT");
TChain * data_gen = new TChain("ntuples_gbm/genT");
// Add files to both
data_reco->AddFriend(data_gen);
ROOT::EnableImplicitMT();
data_reco->SetCacheSize(0);
ROOT::RDataFrame df(*data_reco);
auto df_wgts = df.
    Define("Zwgt", calcZsf, {"gen_ID", "gen_pt"}).
    Define("Wwgt", calcWsf, {"gen_ID", "gen_pt"}).
    Define("Twgt", calcTsf, {"gen_ID", "gen_pt"}).
    Define("PUwgt", calcPUsf, {"gen_pu_true"}).
    Define("wgt", calcTotalWgt, {"Zwgt", "Wwgt", "Twgt", "PUwgt", "gen_wgt"});
auto df_sumgenwgts = df_wgts.Sum("gen_wgt");
cout << "Sum_gen_wgts as inferred from RDF: " << *df_sumgenwgts << endl;
Just as an example, I get sum_gen_wgt: 1.67772e+07 for a particular sample using the first way and 1.00094e+08 using the second.
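One cross-check that may be relevant: 1.67772e+07 coincides with 2^24 = 16777216, the point beyond which a single-precision float can no longer represent every integer. The sketch below is not the analysis code. It assumes, purely for illustration, that every weight is exactly 1, and the helper names (sum_unit_weights_float, sum_unit_weights_double) are made up. It shows how a Float_t-style accumulator stalls at 2^24 while a double accumulator keeps counting.

```cpp
#include <cstdint>

// Hypothetical helper: add n unit weights into a single-precision accumulator,
// mirroring the Float_t sum_gen_wgt used in the event loop.
// Assumption: every weight is exactly 1 (the real gen_wgt values may differ).
float sum_unit_weights_float(std::uint64_t n) {
    float sum = 0.0f;
    for (std::uint64_t i = 0; i < n; ++i) {
        sum += 1.0f;  // once sum reaches 2^24, adding 1 rounds back to the same value
    }
    return sum;
}

// Hypothetical helper: same loop, but with a double-precision accumulator.
double sum_unit_weights_double(std::uint64_t n) {
    double sum = 0.0;
    for (std::uint64_t i = 0; i < n; ++i) {
        sum += 1.0;  // exact for all counts far beyond 2^24
    }
    return sum;
}
```

If the real gen_wgt values are of order 1, switching the accumulator in the loop to Double_t would be a cheap way to test whether this accounts for part of the discrepancy. It would not by itself explain the TTreeCache message.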
Any help in figuring this out will be greatly appreciated!
Thank you,
Andre
ROOT Version: 6.18/00
Platform: x86_64 slc6
Compiler: gcc8