Hi,
I am running the very short macro below to retrieve the number of events passing the .Filter().Define() commands from a large TNtupleD with many columns:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include "TStopwatch.h"
#include "ROOT/RDataFrame.hxx"
int ntuple2pos_rdf_report(){
TStopwatch timer;
timer.Start();
std::ofstream partview;
partview.open( "toto.pos" );
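// Note: with implicit MT enabled, the Foreach() lambda below may be called concurrently
// from several threads, so the writes to the shared partview stream are not synchronized.
ROOT::EnableImplicitMT();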
// Input rootfile (24 GB) and TNtupleD names
auto fileName = "23Mg1p_N100000_R5.0_E0.25_MC10_NBNN_P1_T300.0_Euler_NBodyNN_He_1.000000.root";
auto treeName = "ParticleData";
// RDataFrame from tree with default variable name for the time
ROOT::RDataFrame rdf(treeName, fileName, {"t"});
// Sets cut and new variable definition
auto rdf2 = rdf.Filter("t > -10 && t < 10", "Cut") // <- no column name specified here, "t" taken as default!
.Define("v", "sqrt(vx*vx+vy*vy+vz*vz)");
// Loops over rows and store in ASCII file
rdf2.Foreach( [&partview] (double xi, double yi, double zi, double vi)
{ partview << "SP(" << xi << ", " << yi << ", " << zi << "){" << vi << "};" << std::endl;},
{"x", "y", "z", "v"} );
// auto nevts = rdf2.Count(); std::cout << "Particles left : " << *nevts << std::endl;
// auto allCutsReport = rdf.Report();
// // We can now loop on the cuts
// std::cout << "Name\tAll\tPass\tEfficiency" << std::endl;
// for (auto &&cutInfo : allCutsReport) {
// std::cout << cutInfo.GetName() << "\t" << cutInfo.GetAll() << "\t" << cutInfo.GetPass() << "\t"
// << cutInfo.GetEff() << " %" << std::endl;
// auto nevts = cutInfo.GetPass();
// partview << "Particles left : " << nevts << std::endl;
// }
partview.close();
timer.Stop();
printf("RT = %7.3f s CPU = %7.3f s\n", timer.RealTime(), timer.CpuTime());
return 0;
}
ntuple2pos_rdf_report.C (1.9 KB)
When I uncomment either the .Count() call (line 37 in the attached file) or the .Report() block that retrieves the cut information (lines 39 to 47 in the attached file), the computing time doubles.
I would have expected the .Foreach() call (line 32 in the attached file) to store this information somewhere in the RDataFrame structure, since it has already processed the whole data set and applied the filter/cut.
Perhaps I am giving the commands in a somewhat odd order?
Thanks for any help.
ROOT Version: 6.24
Platform: Linux Ubuntu 20.04.3 LTS
Compiler: gcc 9.3.0