Issues with For each loop

Hi all,
I am having trouble understanding why the code below triggers an error in the manual comparison at the end.
I would like to define each column (eventually an arbitrary number of them) and fill N vectors that are all parallel, entry by entry.
I ended up using Foreach, but apparently the vector for the Min expression and the other extracted vectors are not aligned.
Is there a way I could make this work?
Cheers
Renato

	vector< pair< TString, TString> > variables = {  {"M1_PT", "Muon1PT"},
	                                                 {"M2_PT", "Muon2PT"},
	                                                 {"TMath::Min(M1_PT,M2_PT)" , "MinMuonsPT"} } ;
	auto firstDF = df.Define( variables[0].second.Data() , variables[0].first.Data() );
	using DFType = decltype(firstDF);
	auto latestDF = std::make_unique<DFType>(firstDF);	
	int i=0; 
	for( auto var : variables ){
		if( i==0){ ++i; continue;}
		latestDF = std::make_unique<DFType>( latestDF->Define(var.second.Data(), var.first.Data() ));
		// if (i==0){ i++; continue;}
		//  dataf.emplace_back(  dataf[i-1].Define(var.second.Data(), var.first.Data())  ) ;
	}

	auto h1 = latestDF->Histo1D("Muon1PT");
	auto h2 = latestDF->Histo1D("Muon2PT");
	auto h3 = latestDF->Histo1D("MinMuonsPT");
	std::vector<double> _m1pt; _m1pt.reserve( eventMM.GetTuple()->GetEntries() );
	latestDF->Foreach([&_m1pt](double value) { _m1pt.push_back(value); } , {"Muon1PT"} );
	std::vector<double> _m2pt; _m2pt.reserve( eventMM.GetTuple()->GetEntries() );
	latestDF->Foreach([&_m2pt](double value) { _m2pt.push_back(value); } , {"Muon2PT"} );
	std::vector<double> _Minm2pt; _Minm2pt.reserve( eventMM.GetTuple()->GetEntries() );
	latestDF->Foreach([&_Minm2pt](double value) { _Minm2pt.push_back(value); } , {"MinMuonsPT"} );

	if( _m1pt.size() != eventMM.GetTuple()->GetEntries() ) cout<<"ERROR FILLING "<< endl;
	for(int i =0; i< _m1pt.size(); ++i){
		if( std::min( _m1pt[i], _m2pt[i]) != _Minm2pt[i]){ cout<< "ERROR , scrambling happening " << std::endl;}
	}

ROOT Version: v6/14
Platform: Mac
Compiler: Not Provided


Hi,
are you running this on a single thread or multiple threads?

Hi, at the beginning I have:

	ROOT::EnableImplicitMT();

so I think I am running multi-threaded.

Each Foreach runs a separate event loop.
Multi-thread event loops read entries in a scrambled order (each thread processes a bunch of entries concurrently with the others).

A rewriting of that logic that performs just one event loop, and therefore fills the vectors in sync:

auto df2 = df.Define("MinMuonsPT", "TMath::Min(M1_PT,M2_PT)");

auto h1 = df2.Histo1D("M1_PT");
auto h2 = df2.Histo1D("M2_PT");
auto h3 = df2.Histo1D("MinMuonsPT");

auto _m1ptPtr = df2.Take<double>("M1_PT"); // returns a smart pointer to vector<double>
auto _m2ptPtr = df2.Take<double>("M2_PT");
auto _Minm2ptPtr = df2.Take<double>("MinMuonsPT");
// first usage of one of these smart pointers to std::vector<double> will trigger the event loop

Note that the elements of the vectors will still be scrambled w.r.t. the input tree. This is a required tradeoff to make multi-thread event loops worthwhile performance-wise.
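
To illustrate the point with plain C++ threads: the sketch below is a hypothetical stand-in for a single multi-threaded event loop, not RDataFrame's actual implementation. The names `Columns`, `fill_in_one_pass` and `rows_aligned` are made up for this example. Each thread appends a complete row (both values and their minimum) in one step, so the three vectors stay aligned with each other even though the order of rows relative to the input depends on thread scheduling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Three output columns that are filled together, one row per input entry.
struct Columns {
    std::vector<double> m1, m2, min_pt;
};

// Hypothetical stand-in for ONE multi-threaded event loop.
// Each thread walks a contiguous range of entries and appends a complete
// row while holding the lock, so index i refers to the same input entry
// in all three vectors. The order of rows is not guaranteed.
Columns fill_in_one_pass(const std::vector<double>& m1_in,
                         const std::vector<double>& m2_in,
                         unsigned n_threads) {
    assert(m1_in.size() == m2_in.size() && n_threads > 0);
    const std::size_t n = m1_in.size();
    const std::size_t chunk = (n + n_threads - 1) / n_threads;

    Columns out;
    out.m1.reserve(n);
    out.m2.reserve(n);
    out.min_pt.reserve(n);

    std::mutex mtx;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                std::lock_guard<std::mutex> lock(mtx);
                out.m1.push_back(m1_in[i]);
                out.m2.push_back(m2_in[i]);
                out.min_pt.push_back(std::min(m1_in[i], m2_in[i]));
            }
        });
    }
    for (auto& w : workers) w.join();
    return out;
}

// Same check as in the original post: is min(m1, m2) stored at the same index?
bool rows_aligned(const Columns& c) {
    if (c.m1.size() != c.m2.size() || c.m1.size() != c.min_pt.size()) return false;
    for (std::size_t i = 0; i < c.m1.size(); ++i)
        if (std::min(c.m1[i], c.m2[i]) != c.min_pt[i]) return false;
    return true;
}
```

Filling each vector in its own separate multi-threaded pass would give no such guarantee, because each pass could visit the entries in a different order.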

Hi @eguiraud, thanks. I was thinking the same, although I was under the assumption that multi-threading was executed per slice of the input tuple, so that each thread could collect its own piece of the vector and the pieces could finally be merged in order, thread by thread.

Is this something unreasonable to do ?

Hi @eguiraud,
Thanks a lot for this. I now have it working multi-threaded as well, without entries being scrambled between the vectors.

Here is the snippet:


int main(){

	ROOT::EnableImplicitMT();
	ROOT::EnableThreadSafety();
	ROOT::RDataFrame df(*eventMM.GetTuple());	 
	std::cout<<"start"<< std::endl;	

	// auto  latestDF = std::make_unique<DFType>(df); 
	vector< pair< TString, TString> > variables = {  {"M1_PT", "Muon1PT"},
	                                                 {"M2_PT", "Muon2PT"},
	                                                 {"TMath::Min(M1_PT,M2_PT)" , "MinMuonsPT"} } ;
	auto firstDF = df.Define( variables[0].second.Data() , variables[0].first.Data() );
	using DFType = decltype(firstDF);
	auto latestDF = std::make_unique<DFType>(firstDF);	
	int i=0; 
	for( auto var : variables ){
		if( i==0){ ++i; continue;}
		latestDF = std::make_unique<DFType>( latestDF->Define(var.second.Data(), var.first.Data() ));
	}

	auto _m1pt =  latestDF->Take<double>  ( "Muon1PT" );
	auto _m2pt =   latestDF->Take<double> ( "Muon2PT" );
	auto _Minm2pt =  latestDF->Take<double> ( "MinMuonsPT" );

	std::cout<< _m1pt->size() << std::endl;

	if( _m1pt->size() != eventMM.GetTuple()->GetEntries() ) cout<<"ERROR FILLING "<< endl;
	for(int i =0; i< _m1pt->size(); ++i){
		if( std::min( _m1pt->at(i), _m2pt->at(i)) != _Minm2pt->at(i) ){ cout<< "ERROR , scrambling happening " << std::endl;}
	}
}

I do not get any errors now. I am just wondering: with Take, is the same index in every vector now filled from the same input tuple entry?

Is this something unreasonable to do ?

Yes, it’s doable. For some operations it might require more memory, and for all operations it would be slower, so RDF does not do that. If you absolutely need the output entry order to be the same as the input entry order, you will have to run RDF without EnableImplicitMT.
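
For reference, the "one piece per thread, merged in order" idea could look like the sketch below in plain C++. This is only an illustration under simple assumptions (the input is an in-memory vector split into contiguous slices); `collect_in_input_order` is a made-up helper, not an RDF API. It also shows where the extra cost comes from: every thread needs its own buffer, and there is a serial merge step at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: each thread copies one contiguous slice of the input
// into its own buffer; the buffers are then concatenated in slice order,
// so the result has the same order as the input.
std::vector<double> collect_in_input_order(const std::vector<double>& input,
                                           unsigned n_threads) {
    assert(n_threads > 0);
    const std::size_t n = input.size();
    const std::size_t chunk = (n + n_threads - 1) / n_threads;

    // One private buffer per thread: no locking needed, but extra memory.
    std::vector<std::vector<double>> partial(n_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&input, &partial, t, begin, end] {
            partial[t].assign(input.begin() + static_cast<std::ptrdiff_t>(begin),
                              input.begin() + static_cast<std::ptrdiff_t>(end));
        });
    }
    for (auto& w : workers) w.join();

    // Serial merge in slice order restores the input order.
    std::vector<double> merged;
    merged.reserve(n);
    for (const auto& p : partial)
        merged.insert(merged.end(), p.begin(), p.end());
    return merged;
}
```

The per-thread buffers and the final copy are the kind of memory and time overhead mentioned above.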

I do not get any errors now. I am just wondering: with Take, is the same index in every vector now filled from the same input tuple entry?

Yes, all vectors returned by different multi-thread Takes are filled in sync, when they are filled in the same event loop, like in this case. The vector entries will be shuffled w.r.t. the input ntuple, for the reasons discussed above.

Thanks @eguiraud, I just need the extracted vectors to be aligned with each other entry by entry. So I guess the fact that all vectors are filled in sync ensures that.
