Dummy question on ```Take``` outcome

Hi,
I have a very basic question. Let’s say I have done a set of DataFrame.Define() calls with custom expressions, and for each of them I do a Take(). Let’s say I produce the vectors _var1 and _var2 with Take.

Do _var1->at(i) and _var2->at(i) always refer to the same entry (i) of the initial tuple?

Thanks
Renato

Hi Renato,
with multi-threading enabled, the entries in the vectors returned by Take will be shuffled with respect to the input data, but will be consistent with respect to Take outputs produced by the same event loop.

Cheers,
Enrico

Hi @eguiraud,
Let’s say I need to do some 2D operation, and my very bad model does something like this:

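#include <ROOT/RDataFrame.hxx>
#include <TChain.h>
#include <TString.h>
#include <vector>

template <typename T>
std::vector<T> giveMeEntries(TChain *chain, const TString &_expr) {
  ROOT::RDataFrame df(*chain);  // a brand-new dataframe on every call
  auto dd = df.Define("MyObservable", _expr.Data());
  auto _output = dd.Take<T>("MyObservable");  // lazy: nothing has run yet
  // Dereferencing the result runs a full event loop over the chain
  std::vector<T> _out = std::move(*_output);
  return _out;
}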

And I call it like this:

auto varX = giveMeEntries<double>(tuple, "obsx");  // T = double is assumed here
auto varY = giveMeEntries<double>(tuple, "obsy");

How can I ensure that varX.at(i) corresponds to the same entry as varY.at(i)? Is it possible to get this behaviour, even if it is very sub-optimal?
Does disabling MT allow this?

Hi,
disabling MT guarantees that the input chain is always processed in the same order, so outputs will follow the same order.

With MT enabled, you can have giveMeEntries return an RResultPtr<vector<T>> instead of a vector<T>, which would allow you to produce all outputs lazily, in a single event loop. This is much more efficient and guarantees that all outputs are ordered the same way (but shuffled w.r.t. the input chain).

Cheers,
Enrico

Thanks,
When you say "always processed in the same order", does it mean that if I create a new dataframe from scratch 3 times and take a different column each time, the 3 resulting vectors are ordered in the same way?
I am fully aware that this is bad practice and does not exploit RDataFrame’s full potential, but I am just trying to figure out the following: if I build n dataframes from the same input TTree and extract one observable vector from each, do I end up with n vectors sorted differently, or is the order preserved (when MT is disabled)?

When MT is disabled, RDataFrame just processes entries from beginning to end in the order they are served by the underlying TChain, so you would always get the same ordering in output, and it would correspond to the ordering of the entries in the input TChain.

When MT is enabled, input entries will be processed in shuffled order (each thread will process a bunch of entries, and the processing is racy), so each event loop will produce outputs as if the input entries were (kind of) randomly shuffled.
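
The behaviour described above can be illustrated with a toy model in plain C++. This is only a sketch and not ROOT's implementation: the names ToyOutputs and toyEventLoop are made up, a seeded shuffle of chunk order stands in for the thread race, and chunkSize is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Toy stand-in for two Take results filled by the same event loop.
struct ToyOutputs {
   std::vector<long> x; // plays the role of Take on "obsx"
   std::vector<long> y; // plays the role of Take on "obsy"
};

// Toy event loop: entries are split into chunks, the chunks are visited in
// an arbitrary (here: seeded, shuffled) order, and each entry appends to
// both outputs at the same time. Assumes chunkSize > 0.
ToyOutputs toyEventLoop(std::size_t nEntries, std::size_t chunkSize, unsigned seed)
{
   const std::size_t nChunks = (nEntries + chunkSize - 1) / chunkSize;
   std::vector<std::size_t> chunkOrder(nChunks);
   std::iota(chunkOrder.begin(), chunkOrder.end(), std::size_t{0});
   std::mt19937 rng(seed);
   std::shuffle(chunkOrder.begin(), chunkOrder.end(), rng);

   ToyOutputs out;
   for (std::size_t c : chunkOrder) {
      const std::size_t begin = c * chunkSize;
      const std::size_t end = std::min(begin + chunkSize, nEntries);
      for (std::size_t entry = begin; entry < end; ++entry) {
         out.x.push_back(static_cast<long>(entry));      // toy "obsx" value
         out.y.push_back(10 * static_cast<long>(entry)); // toy "obsy" value
      }
   }
   return out;
}
```

Within one call, x[i] and y[i] always come from the same entry, whatever the chunk order. Two calls with different seeds can visit the chunks in different orders, so pairing x from one call with y from another can mix up entries. A single chunk covering all entries reproduces the input order, which mirrors the case with MT disabled.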
