Generate ROOT file event by event using RDataFrame for an unknown number of events

brencic · March 25, 2022, 8:31pm

Hey

I have to generate a ROOT tree, and I don’t know at compile time how many events I will have to store. So I’m forced to add to the tree event by event, or every n events.

I know how to do this in the “old ROOT”, but is it possible to generate trees event by event with RDataFrame as well?

Found this tutorial: ROOT: tutorials/dataframe/df002_dataModel.C File Reference

...
 ROOT::RDataFrame d(64);
 d.Define("tracks", genTracks)

But this generates ROOT trees with RDataFrame when one knows in advance how many events there will be.

So how can one do something along these lines:

ROOT::RDataFrame d(file_path + "/" + file_name);

while (flag) {
     d.add_event(event);  // hypothetical method, this is what I would like to be able to do
}

with RDataFrame?

Thanks!

ROOT Version: Latest

couet · March 28, 2022, 6:48am

I am not sure I understand your question because, by definition, the number of events to be stored in a tree is not known in advance. Trees are filled “event by event”. Using TTree or RDataFrame is the same: the number of events (i.e. the number of lines) in a tree is not fixed “at compile time”. Did you mean the number of variables (i.e. the number of columns)?

brencic · March 28, 2022, 7:08am

No, I meant the number of events.

OK, so if adding events to a TTree is the same as in RDataFrame, then:

How would one add an arbitrary number of events (one by one) to a ROOT tree with the RDataFrame formalism?

A bit of backstory: I have a binary file in a non-standard encoding that has to be converted to ROOT trees. In the past we used tree->Fill for that, but we would like to switch to RDataFrame if possible.

couet · March 28, 2022, 7:10am

I think @eguiraud can help you.

eguiraud · March 28, 2022, 9:38am

Hi @brencic ,
I am afraid there is currently no easy way to do that with RDataFrame (feel free to open a feature request at Issues · root-project/root · GitHub).
You can of course produce a batch of entries at a time, and do that in a loop as needed, but I realize it’s not exactly what you are asking for.

Cheers,
Enrico

brencic · March 28, 2022, 9:45am

Aha, thanks.

1.) So what would be the most “optimal” way to produce the batches of entries? Do you have an example in mind?

2.) I saw in the examples that there’s a way to automatically load a CSV file into an RDataFrame. In that case I guess the software reads the CSV file line by line (entry by entry) while doing the conversion, or am I missing something?

eguiraud · March 28, 2022, 9:53am

For 1., something like:

void produce_batch() {
  ROOT::RDataFrame(N).Define(...).Snapshot(...);
}

while (some_condition) {
  produce_batch();
}

where each Snapshot writes a new TTree (all trees can be in the same file if they have different names and you tell Snapshot to open the file with mode “UPDATE”, or you can write each tree in a different file). Then you can use all the trees in a chain or merge them together with hadd as a final step.

For 2.: good catch, that’s what I meant when I said there is no easy way. You can implement an RDataSource that tells RDF to process new (empty) events until you have enough. You can use ROOT: tree/dataframe/inc/ROOT/RTrivialDS.hxx Source File as a starting point. Basically, RDF will keep asking the RDataSource for “entry ranges” via the GetEntryRanges method until it returns an empty vector. So you can implement YourDataSource::GetEntryRanges as something that returns non-empty ranges until some condition is satisfied. This plugs into the usual RDF machinery, including multi-threaded Snapshots.
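To make the “keep asking for entry ranges until the vector is empty” idea concrete, here is a minimal, ROOT-free sketch. It is only an illustration of the protocol: `MockSource` and `RunEventLoop` are hypothetical names, the class does not derive from the real `ROOT::RDF::RDataSource` (which has more methods to implement), and the fixed batch size is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Half-open entry range [first, second).
using EntryRange = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical stand-in for a custom data source; it does NOT derive from ROOT's RDataSource.
class MockSource {
public:
  // hasMore plays the role of "flag" from the first post; batchSize is an assumption.
  MockSource(std::function<bool()> hasMore, std::uint64_t batchSize)
    : fHasMore(std::move(hasMore)), fBatchSize(batchSize)
  {
    assert(fBatchSize > 0 && "a batch must contain at least one entry");
  }

  // Return one new range while the condition holds, and an empty vector once it does not.
  std::vector<EntryRange> GetEntryRanges()
  {
    if (!fHasMore())
      return {};
    const EntryRange range{fNextEntry, fNextEntry + fBatchSize};
    fNextEntry = range.second;
    return {range};
  }

private:
  std::function<bool()> fHasMore;
  std::uint64_t fBatchSize;
  std::uint64_t fNextEntry = 0;
};

// Mimics the consumer side: keep asking for ranges until an empty vector comes back.
std::uint64_t RunEventLoop(MockSource &source)
{
  std::uint64_t nProcessed = 0;
  for (auto ranges = source.GetEntryRanges(); !ranges.empty(); ranges = source.GetEntryRanges()) {
    for (const auto &range : ranges)
      nProcessed += range.second - range.first; // a real loop would process each entry here
  }
  return nProcessed;
}
```

With a predicate that returns true three times and a batch size of 4, `RunEventLoop` visits 12 entries and then stops. In a real data source the predicate would be replaced by whatever tells you the input is exhausted.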

brencic · March 28, 2022, 10:28am

1.) OK, yeah, a slightly weird hack.

2.) Well, that doesn’t sound too bad.

So basically one has to define a “custom” ROOT::RDF::RDataSource type object and create a function along the lines of ROOT::RDF::Make(file_format)DataFrame.
Write the line-by-line parser and call it in the constructor (as here).
Specify custom ::GetEntryRanges() and custom ::SetEntry().
Anything else to watch out for?

I think that I get the basic idea, but if you have any more tips, please let me know.

eguiraud · March 28, 2022, 11:46am

I’m not sure what line parsing you might need: given your example in the first post, the constructor of your RDataSource only needs to take whatever arguments are required to evaluate flag.

For everything else the RTrivialDS and the related tutorials should provide a reasonable starting point.

Cheers,
Enrico

brencic · March 28, 2022, 2:01pm

Alright. Thanks for the help.

Will circle back if I get stuck.