Appending RDataFrame

Hi experts,

I have just a simple question:
I have 3 RDataFrame objects which I would like to append into 1 big RDF. How can this be done?

Thanks!

ROOT Version: 6.24/06
Platform: Not Provided
Compiler: Not Provided


Hello,

You can use Snapshot to save each of these three RDataFrame objects into a separate ROOT file with the same tree name, then create another RDataFrame object from those three files and that tree name. This assumes the three initial RDFs correspond to compatible datasets (same schema).

Here’s a Snapshot tutorial in case it can be helpful:

https://root.cern/doc/master/df007__snapshot_8C.html

Hi @yosse_andrean ,
depending on your exact use case (what does your data look like? do you want a horizontal or a vertical concatenation of the data? would the concatenation involve Define'd columns? etc.), another alternative could be to simply specify the 3 input datasets as one single input to the RDF constructor:

ROOT::RDataFrame df("t", {"f1.root", "f2.root", "f3.root"});

Cheers,
Enrico

Hi @etejedor,

Thanks for the answer. Yes, this is the solution I am currently using, though I was hoping for a solution that does not require writing cache files. I was thinking of something like pandas.concat(), but maybe such a function does not exist yet (?).

Best,
Yosse

Hi @eguiraud,

Thanks for the reply. Please see my response below:

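Regarding what the data looks like and the kind of concatenation: the three datasets have the same columns, and I want to do a vertical concatenation (stitching the event rows together).

Regarding Define'd columns: yes, the concatenation involves them.

Regarding passing the three files to a single RDF constructor: unfortunately I cannot do this, due to the particularities of the routine I am running. I have to treat f1, f2, and f3 differently and then merge them.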

Best,
Yosse

Maybe DefinePerSample helps with that? It lets you Define columns differently for different input trees.

Otherwise, since you need to concatenate Define'd columns, intermediate Snapshots as @etejedor suggested are the way to go.

Cheers,
Enrico


Thanks for the pointer to DefinePerSample. It might be what I need. Thanks!
