I have to generate a ROOT tree and don’t know at compile time how many events I will have to store. So I’m forced to add to the tree event by event, or every n events.
I know how to do this in the “old ROOT”, but is it possible to generate trees event by event with RDataFrame as well?
I am not sure I understand your question because, by definition, the number of events to be stored in a tree is not known in advance. Trees are filled “event by event”. Using TTree or RDataFrame is the same: the number of events (i.e. the number of rows) in a tree is not fixed “at compile time”. Did you mean the number of variables (i.e. the number of columns)?
OK, so if adding events to a TTree is the same as in RDataFrame, then:
How would one add an arbitrary number of events (one by one) to a ROOT tree with the RDataFrame formalism?
A bit of backstory: I have a binary file in a non-standard encoding that has to be converted to ROOT trees. In the past we used tree->Fill for that, but we would like to switch to RDataFrame if possible.
Hi @brencic ,
I am afraid there is currently no easy way to do that with RDataFrame (feel free to open a feature request on the root-project/root issue tracker on GitHub).
You can of course produce a batch of entries at a time, and do that in a loop as needed, but I realize it’s not exactly what you are asking for.
1.) So what would be the most “optimal” way to produce the batches of entries? Do you have an example in mind?
2.) I saw in the examples that there’s a way to load a CSV file into RDataFrame automatically. In that case I guess the software reads the CSV file line by line (entry by entry) while doing the conversion, or am I missing something?
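For the batches, something along these lines:
// Pseudocode: N and the Define(...)/Snapshot(...) arguments are placeholders.
void produce_batch() {
   // Create N empty entries, fill the columns, write them out as one tree.
   ROOT::RDataFrame(N).Define(...).Snapshot(...);
}
// Produce batches for as long as there is data left to convert.
while (some_condition) {
   produce_batch();
}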
where each Snapshot writes a new TTree (all trees can be in the same file if they have different names and you tell Snapshot to open the file with mode “UPDATE”, or you can write each tree in a different file). Then you can use all the trees in a chain or merge them together with hadd as a final step.
Good catch, that’s what I meant when I said there is no easy way: you can implement an RDataSource that tells RDF to process new (empty) events until you have enough. You can use RTrivialDS (tree/dataframe/inc/ROOT/RTrivialDS.hxx in the ROOT sources) as a starting point. Basically, RDF will keep asking the RDataSource for “entry ranges” via the GetEntryRanges method until it returns an empty vector. So you can implement YourDataSource::GetEntryRanges as something that returns non-empty ranges until some condition is satisfied. This plugs into the usual RDF machinery, including multi-threaded Snapshots.
I’m not sure what line parsing you might need. Given your example in the first post, the constructor of your RDataSource only needs to take whatever arguments are required to evaluate flag.
For everything else the RTrivialDS and the related tutorials should provide a reasonable starting point.
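To make the GetEntryRanges idea a bit more concrete, here is a minimal, ROOT-free sketch of the polling protocol. It does not use the real ROOT::RDF::RDataSource interface: ToySource, ProcessAll, batchSize and moreData are all hypothetical names made up for illustration, and the real class has more methods to implement (see RTrivialDS). It only shows the control flow: the source hands out a new range while a condition holds, and the loop stops at the first empty vector.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a custom data source. This is NOT the real
// ROOT::RDF::RDataSource interface, only a model of the range-polling idea.
using EntryRange = std::pair<std::uint64_t, std::uint64_t>; // [begin, end)

class ToySource {
public:
   // moreData is an assumed user-supplied condition, e.g. "input file not exhausted".
   ToySource(std::uint64_t batchSize, std::function<bool()> moreData)
      : fBatchSize(batchSize), fMoreData(std::move(moreData)) {}

   // Returns one new range while the condition holds,
   // and an empty vector to signal that processing should stop.
   std::vector<EntryRange> GetEntryRanges()
   {
      if (!fMoreData())
         return {};
      const EntryRange range{fNextEntry, fNextEntry + fBatchSize};
      fNextEntry = range.second;
      return {range};
   }

private:
   std::uint64_t fBatchSize;
   std::uint64_t fNextEntry = 0;
   std::function<bool()> fMoreData;
};

// Models the event loop: keep asking for ranges until none are returned.
// Returns the total number of entries that were handed out.
std::uint64_t ProcessAll(ToySource &source)
{
   std::uint64_t nProcessed = 0;
   for (auto ranges = source.GetEntryRanges(); !ranges.empty(); ranges = source.GetEntryRanges()) {
      for (const auto &r : ranges)
         nProcessed += r.second - r.first; // a real loop would process each entry here
   }
   return nProcessed;
}
```

In a real data source the condition would typically be evaluated from whatever the constructor received (for instance a handle to the binary input), and the total number of entries is only known once the loop has finished.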