Snapshot the results of each process with different tree names in a single file

Hi,

I am using RDataFrame to analyze several ROOT files, belonging to different processes, at the same time. Once the filtering etc. is done, I would like to store the defined columns in a single file, with a different tree name for each process, but I could not pass the different tree names to Snapshot. I have access to the tree names from the stored metadata via the DefinePerSample method, but how can I systematically do the Snapshot for each tree, in the same per-sample manner in which DefinePerSample works?
Thanks a lot for your help.

ROOT Version: 6.28
Platform: Not Provided
Compiler: Not Provided


Hello,

Thanks for the post.
Bear with me if I do not fully understand what you are trying to achieve: why is specifying the name of the tree as a parameter of Snapshot not enough for your use case?

Cheers,
Danilo

Hello Danilo,

Thanks for your answer.
This is exactly what I am looking for, but I could not pass the names in parallel. Suppose several samples of different processes are being analyzed in parallel, and I want the defined columns of each individual process to be stored in the associated tree, also in parallel. Please let me know if this is still unclear.
Cheers,
Mohsen

Hi Mohsen,

Thanks: now this is clear.
Writing from multiple processes to the same file in a safe manner is not supported: some synchronisation on the user side is required.
In case you do not want to implement such synchronisation, you can always write N files and then merge them together very efficiently into one containing N trees with the hadd tool.
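As a rough illustration of the merge step, here is a minimal Python sketch that assembles and runs the hadd command. The file names and the helper names (build_hadd_command, merge) are made up for the example; the only thing taken from hadd itself is the usual "hadd -f target source1 source2 ..." invocation.

```python
import subprocess


def build_hadd_command(merged_file, partial_files, overwrite=True):
    """Return the argument list that merges partial_files into merged_file."""
    cmd = ["hadd"]
    if overwrite:
        cmd.append("-f")  # -f: overwrite the target file if it already exists
    cmd.append(merged_file)
    cmd.extend(partial_files)
    return cmd


def merge(merged_file, partial_files):
    """Run hadd; assumes hadd is on the PATH and the partial files exist."""
    subprocess.run(build_hadd_command(merged_file, partial_files), check=True)


if __name__ == "__main__":
    # Hypothetical per-process outputs, each holding one differently named tree.
    partial_files = ["ttbar_skim.root", "wjets_skim.root"]
    print(" ".join(build_hadd_command("all_processes.root", partial_files)))
```

Calling merge("all_processes.root", partial_files) after all Snapshots have finished should then leave one file containing one tree per process.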

I hope this helps.

Best,
Danilo

Hello Danilo,

Thanks for your explanation. Is there a simple example of writing N files? I am especially interested in how to pass the file names and tree names to the different Snapshots, depending on the input file that is being processed.

Best
Mohsen

Hi Mohsen,

I do not think we have a ready-to-use example; however, the documentation of the Snapshot method is informative. It’s just a matter of passing the correct string to it to specify the file.
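To make that a bit more concrete, below is a minimal sketch of one Snapshot per input file. It is only a starting point under a few assumptions: the tree and output file names are derived from the input file name (you could equally take them from your metadata), the input tree is called "Events", and the helper names snapshot_targets and run_snapshot are invented for the example.

```python
import os


def snapshot_targets(input_file, output_dir="."):
    """Derive (tree_name, output_file) from the input file name.

    The naming convention is an assumption made for this sketch:
    'data/ttbar.root' gives tree 'ttbar' and file '<output_dir>/ttbar_skim.root'.
    """
    process = os.path.splitext(os.path.basename(input_file))[0]
    return process, os.path.join(output_dir, process + "_skim.root")


def run_snapshot(input_file, input_tree="Events", output_dir="."):
    """Write the columns of one input file to its own tree and file."""
    import ROOT  # imported lazily so the naming helper works without ROOT

    tree_name, output_file = snapshot_targets(input_file, output_dir)
    df = ROOT.RDataFrame(input_tree, input_file)
    # ... the Filter / Define calls of the analysis would go here ...
    # Snapshot takes the tree name first and the file name second.
    df.Snapshot(tree_name, output_file)
    return output_file
```

Each call to run_snapshot then produces one file with one tree named after the process.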

Cheers,
D

Dear @setesami ,

Thanks for reaching out to the forum! I guess what you are looking for is spawning multiple subprocesses, each of which launches one separate Snapshot operation that stores one TTree into one file; then you call hadd at the end, as Danilo suggests.

Just to give you a starting point, in Python this would look something like

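import multiprocessing
# starmap unpacks each entry, so list_of_args must hold one tuple of arguments per Snapshot call
with multiprocessing.Pool(nworkers) as pool:
    pool.starmap(function_that_runs_snapshot, list_of_args)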

Or in C++ you could implement a similar approach through ROOT::TProcessExecutor (see its class reference).
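Expanding slightly on the Python route, a possible shape for the worker and its arguments is sketched below. Treat it as an untested outline rather than a recipe: the input tree name "Events", the "_out.root" suffix and the names run_one_sample, make_tasks and run_all are all assumptions of mine.

```python
import multiprocessing


def run_one_sample(input_file, tree_name, output_file):
    """Hypothetical worker: one Snapshot per call, one tree per output file."""
    import ROOT  # imported inside the worker so each subprocess sets up ROOT itself

    df = ROOT.RDataFrame("Events", input_file)  # "Events" is an assumed input tree name
    df.Snapshot(tree_name, output_file)
    return output_file


def make_tasks(samples):
    """Turn {process_name: input_file} into the argument tuples starmap expects."""
    return [(path, name, name + "_out.root") for name, path in sorted(samples.items())]


def run_all(samples, nworkers=2, worker=run_one_sample):
    """Run one worker call per sample in a pool of subprocesses."""
    with multiprocessing.Pool(nworkers) as pool:
        # starmap unpacks each tuple into the worker's positional arguments
        return pool.starmap(worker, make_tasks(samples))
```

The list returned by run_all holds the output file names, which is what you would hand to hadd afterwards.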

Cheers,
Vincenzo