Running RDataFrame's Define In For Loop

Hello,

I am trying to define a few columns that can be parametrized. I want to use a for loop so I don’t have to type out each column manually. However, I get the following compilation error:

Processing looptest.C+...
Info in <TUnixSystem::ACLiC>: creating shared library /home/kkrizka/PlotHelpers/sysvalid/./looptest_C.so
In file included from input_line_12:9:
././looptest.C:36:11: error: object of type 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>' cannot be assigned because its copy assignment operator is implicitly deleted
      pidf=pidf.Define(ss_name.str(), ss_func.str());
          ^
/home/kkrizka/Sources/root/build-v6-14-08/include/ROOT/RDFInterface.hxx:130:16: note: explicitly defaulted function was implicitly deleted here
   RInterface &operator=(const RInterface &) = default;
               ^
/home/kkrizka/Sources/root/build-v6-14-08/include/ROOT/RDFInterface.hxx:110:35: note: copy assignment operator of 'RInterface<ROOT::Detail::RDF::RLoopManager, void>' is implicitly deleted because field 'fProxiedPtr' has no copy assignment operator
   const std::shared_ptr<Proxied> fProxiedPtr;     ///< Smart pointer to the graph node encapsulated by this RInterface.
                                  ^
Error in <ACLiC>: Dictionary generation failed!

I’ve attached a simple example that demonstrates this. looptest.C (1.2 KB)

The actual use case is to define several weights for systematics. They are stored in a std::vector and I want to loop over them to add a few normalization factors.

Is reusing a dataframe variable possible? Any suggestions for how to handle this alternatively?


ROOT Version: 6.14/08
Platform: Linux
Compiler: GCC 8.2.0



Hi,
We really need a tutorial for this: it’s a use case that comes up often. You can’t reassign to an RDF variable. Your best bet is either a recursive function or, with ROOT v6.16, a pointer to ROOT::RDF::RNode that you re-point at a heap-allocated copy of the latest RDF variable after each Define.

I can provide code samples tomorrow/the day after, ping me if I don’t write back soon! (There is a tutorial on RNode usage in the v6.16 tutorials though, that might help).

Cheers,
Enrico

Hi,
here’s one way to do it in v6.16:

auto latestDF = std::make_unique<RNode>(df);                                          
for (auto i = 0u; i < nDefines; ++i)                                                                             
   latestDF = std::make_unique<RNode>(latestDF->Define(names[i], expressions[i]));
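
To see why re-seating a pointer works where plain reassignment does not, here is a minimal sketch that does not use ROOT at all. FakeNode is a hypothetical stand-in that only mimics the one relevant property reported in the compiler note of the original post, a const std::shared_ptr data member; its Define merely records a column name, and DefineInLoop is likewise an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dataframe node; NOT a ROOT class.
// It holds a const shared_ptr, so the compiler cannot generate a
// copy assignment operator for it.
class FakeNode {
   const std::shared_ptr<std::vector<std::string>> fColumns;

public:
   FakeNode() : fColumns(std::make_shared<std::vector<std::string>>()) {}
   explicit FakeNode(std::vector<std::string> cols)
      : fColumns(std::make_shared<std::vector<std::string>>(std::move(cols))) {}

   // Returns a new node with one more column, leaving *this untouched.
   FakeNode Define(const std::string &name) const
   {
      assert(!name.empty());
      auto cols = *fColumns;
      cols.push_back(name);
      return FakeNode(std::move(cols));
   }

   std::size_t NColumns() const { return fColumns->size(); }
};

// Copy construction is fine, assignment is not.
static_assert(std::is_copy_constructible<FakeNode>::value, "copies are allowed");
static_assert(!std::is_copy_assignable<FakeNode>::value, "assignment is deleted");

// Workaround: keep a pointer and re-seat it on a fresh heap-allocated copy.
std::size_t DefineInLoop(const std::vector<std::string> &names)
{
   std::unique_ptr<FakeNode> latest(new FakeNode());
   for (const auto &name : names)
      latest.reset(new FakeNode(latest->Define(name)));
   return latest->NColumns();
}
```

Only the pointer is ever assigned to; each node object is constructed once by copy and never modified, which is why the deleted assignment operator is not a problem.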

The other option is using a recursive function:

auto ApplyDefines(RNode df, const std::vector<std::string> &names, const std::vector<std::string> &exprs,
                  unsigned int i = 0)
{
   if (i == names.size())
      return df;

   return ApplyDefines(df.Define(names[i], exprs[i]), names, exprs, i + 1);
}

A full, working example with method 1:

#include <ROOT/RDataFrame.hxx>
#include <TApplication.h>

using ROOT::RDF::RNode;

int main()
{
   TApplication app("app", nullptr, nullptr);

   ROOT::RDataFrame df(100);

   const auto names = std::vector<std::string>({"a", "b"});
   const auto exprs = std::vector<std::string>({"rdfentry_", "rdfentry_*rdfentry_"});

   auto latestDF = std::make_unique<RNode>(df);                                          
   for (auto i = 0u; i < names.size(); ++i)                                                                             
      latestDF = std::make_unique<RNode>(latestDF->Define(names[i], exprs[i]));                                     

   auto g = latestDF->Graph("a", "b");
   g->Draw();
   app.Run();
   return 0;
}

And with method 2 (I like it better, fewer pointers is always nice):

#include <ROOT/RDataFrame.hxx>
#include <TApplication.h>

using ROOT::RDF::RNode;

auto ApplyDefines(RNode df, const std::vector<std::string> &names, const std::vector<std::string> &exprs,
                  unsigned int i = 0)
{
   if (i == names.size())
      return df;

   return ApplyDefines(df.Define(names[i], exprs[i]), names, exprs, i + 1);
}

int main()
{
   TApplication app("app", nullptr, nullptr);

   ROOT::RDataFrame df(100);

   const auto names = std::vector<std::string>({"a", "b"});
   const auto expressions = std::vector<std::string>({"rdfentry_", "rdfentry_*rdfentry_"});

   auto dfWithDefines = ApplyDefines(df, names, expressions);

   auto g = dfWithDefines.Graph("a", "b");
   g->Draw();
   app.Run();
   return 0;
}

Hope this helps!
Enrico


Hi Enrico,

Thank you for the examples. They do answer my question.

The RNode is a very welcome feature!


Karol Krizka


Hi @eguiraud, is there a suggested way to make this repeated Define work in ROOT 6.14?

I have a “hacky” way to do that:

	std::vector< ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> > d_with_define;  
	d_with_define.push_back(  df.Define(variables[0].second.Data(), variables[0].first.Data() ) ); 
	for( auto var : variables ){
		if(i==0){ i++; continue; }
		names.push_back( var.second);
		expressions.push_back( var.first);
		d_with_define.push_back( d_with_define.back().Define( var.second.Data() , var.first.Data() ) ); // chain on the most recent node
	}

	// auto dfWithDefines = ApplyDefines(df, names, expressions);
	auto h = d_with_define.back().Histo1D( "Muon1PT");

Is that something Ok-ish to do?

Unfortunately v6.14 is missing a nice way to do this.

Yes, that will work as long as df has type RInterface<RLoopManager> (which is the type of the first RDataFrame created, but it will change as soon as you apply a Filter or a Range). The hackish part is that the code relies on the return type of Define being the same as the type of the dataframe variable it was applied to. This is true, and unlikely to change, but not necessarily guaranteed.

A slightly sturdier formulation might be:

using DFType = decltype(df);
auto latestDF = std::make_unique<DFType>(df);                                          
for (auto i = 0u; i < nDefines; ++i)                                                                             
   latestDF = std::make_unique<DFType>(latestDF->Define(names[i], expressions[i]));

auto h = latestDF->Histo1D("Muon1PT");

std::make_unique is C++14, but ROOT offers a C++11 backport in ROOT/RMakeUnique.hxx.
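
For reference, a C++11 stand-in for std::make_unique is only a few lines. The sketch below is a generic illustration under the hypothetical name MakeUniqueCompat, not a copy of ROOT/RMakeUnique.hxx; it handles single objects only, not arrays. MakeXs is an illustrative helper as well.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical C++11 replacement for std::make_unique (single objects only).
template <typename T, typename... Args>
std::unique_ptr<T> MakeUniqueCompat(Args &&... args)
{
   // Perfect-forward the arguments to T's constructor.
   return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Example use: build a string of three 'x' characters on the heap.
inline std::unique_ptr<std::string> MakeXs()
{
   auto p = MakeUniqueCompat<std::string>(3, 'x');
   assert(p != nullptr);
   return p;
}
```

Because the helper forwards its arguments straight to a constructor of T, the argument type must be something T can be constructed from.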

(I haven’t tested the snippet, but it should give you an idea)

Thanks, @eguiraud. My CMake project uses C++11.

When I do:

#include "ROOT/RDataFrame.hxx"
#include "ROOT/RMakeUnique.hxx"
int main(){ 

//get TChain from somewhere
	ROOT::RDataFrame df(*eventMM.GetTuple());	 
	std::cout<<"start"<< std::endl;	
	using DFType = decltype(df);
	auto  latestDF = std::make_unique<DFType>(df); 
	vector< pair< TString, TString> > variables = {  {"M1_PT", "Muon1PT"} , {"M2_PT","Muon2PT"} , {"TMath::Min(M1_PT,M2_PT)" , "MinMuonsPT"} };
	for( auto var : variables ){
		 latestDF = std::make_unique<DFType>(  latestDF->Define(var.second.Data(), var.first.Data()) );
	}                                      


	auto h1 = latestDF->Histo1D("Muon1PT");
	auto h2 = latestDF->Histo1D("Muon2PT");
	auto h3 = latestDF->Histo1D("MinMuonsPT");
} 

When compiling, I get:

/Users/lpnhe/root/build_root/include/ROOT/RMakeUnique.hxx:28:34: error: no matching constructor for initialization of 'ROOT::RDataFrame'
   return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
                                 ^ ~~~~~~~~~~~~~~~~~~~~~~~~
../../targets/testRDataFrame.cpp:126:20: note: in instantiation of function template specialization 'std::make_unique<ROOT::RDataFrame, ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void> >' requested here
                 latestDF = std::make_unique<DFType>(  latestDF->Define(var.second.Data(), var.first.Data()) );
                                 ^
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:42:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>' to 'const ROOT::RDataFrame' for 1st argument
class RDataFrame : public ROOT::RDF::RInterface<RDFDetail::RLoopManager> {
      ^
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:42:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>' to 'ROOT::RDataFrame' for 1st argument
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:51:4: note: candidate constructor not viable: no known conversion from 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>' to 'TTree &' for 1st argument
   RDataFrame(TTree &tree, const ColumnNames_t &defaultBranches = {});
   ^
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:52:4: note: candidate constructor not viable: no known conversion from 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>' to 'ULong64_t' (aka 'unsigned long long') for 1st argument
   RDataFrame(ULong64_t numEntries);
   ^
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:53:4: note: candidate constructor not viable: no known conversion from 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>' to 'std::unique_ptr<RDataSource>' (aka 'unique_ptr<ROOT::RDF::RDataSource>') for 1st argument
   RDataFrame(std::unique_ptr<RDataSource>, const ColumnNames_t &defaultBranches = {});
   ^
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:47:4: note: candidate constructor not viable: requires at least 2 arguments, but 1 was provided
   RDataFrame(std::string_view treeName, std::string_view filenameglob, const ColumnNames_t &defaultBranches = {});
   ^
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:48:4: note: candidate constructor not viable: requires at least 2 arguments, but 1 was provided
   RDataFrame(std::string_view treename, const std::vector<std::string> &filenames,
   ^
/Users/lpnhe/root/build_root/include/ROOT/RDataFrame.hxx:50:4: note: candidate constructor not viable: requires at least 2 arguments, but 1 was provided
   RDataFrame(std::string_view treeName, ::TDirectory *dirPtr, const ColumnNames_t &defaultBranches = {});
   ^
1 error generated.

Do you have any idea what is going wrong?

I guess I solved it:


	ROOT::RDataFrame df(*eventMM.GetTuple());	 
	std::cout<<"start"<< std::endl;	

	// auto  latestDF = std::make_unique<DFType>(df); 
	vector< pair< TString, TString> > variables = {  {"M1_PT", "Muon1PT"},
	                                                 {"M2_PT", "Muon2PT"},
	                                                 {"TMath::Min(M1_PT,M2_PT)" , "MinMuonsPT"} };
	auto firstDF = df.Define( variables[0].second.Data() , variables[0].first.Data() );
	using DFType = decltype(firstDF);
	auto latestDF = std::make_unique<DFType>(firstDF);	
	int i=0; 
	for( auto var : variables ){
		if( i==0){ ++i; continue;}
		latestDF = std::make_unique<DFType>( latestDF->Define(var.second.Data(), var.first.Data() ));
		// if (i==0){ i++; continue;}
		//  dataf.emplace_back(  dataf[i-1].Define(var.second.Data(), var.first.Data())  ) ;
	}
	auto h1 = latestDF->Histo1D("Muon1PT");
	auto h2 = latestDF->Histo1D("Muon2PT");
	auto h3 = latestDF->Histo1D("MinMuonsPT");
	

This works fine.

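The quirks are due to differences in the exact type of the various dataframe variables: decltype(df) is ROOT::RDataFrame, while Define returns the RInterface<RLoopManager> base type, which cannot be used to construct an RDataFrame. v6.16’s RDF::RNode solves precisely this problem: you can cast all dataframe variables to that common type and put them in vectors, declare pointers or references to them, etc.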
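
The "no matching constructor" error earlier in the thread comes down to a derived class versus its base: the error message shows that RDataFrame derives from RInterface<RLoopManager> and that Define returned the base type. A ROOT-free sketch with hypothetical stand-in classes FakeInterface and FakeDataFrame (and an illustrative helper CountDefines) reproduces the situation and the fix of taking decltype of the first Define result:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-ins; NOT ROOT classes.
struct FakeInterface {
   int fNDefines = 0;
   // Returns the base type, never the derived one.
   FakeInterface Define(const std::string & /*name*/) const
   {
      FakeInterface next(*this);
      ++next.fNDefines;
      return next;
   }
};

struct FakeDataFrame : FakeInterface {
   explicit FakeDataFrame(unsigned long long /*nEntries*/) {}
};

// A derived object converts to its base, but not the other way round.
static_assert(std::is_constructible<FakeInterface, FakeDataFrame>::value, "derived -> base works");
static_assert(!std::is_constructible<FakeDataFrame, FakeInterface>::value, "base -> derived does not");

// decltype of the first Define result is the base type, which every
// later Define in the chain also returns.
inline int CountDefines()
{
   FakeDataFrame df(100);
   auto first = df.Define("a");
   assert(first.fNDefines == 1);
   using DFType = decltype(first); // FakeInterface, not FakeDataFrame
   std::unique_ptr<DFType> latest(new DFType(first));
   latest.reset(new DFType(latest->Define("b")));
   return latest->fNDefines;
}
```

In this sketch the base class plays the role of the common type that every node in the chain can be converted to.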
