This is a rather philosophical question, only loosely related to how ROOT physically handles data.
Let’s say we have events from an experiment, and to store them we create a TRun tree and a TEventA tree. TEventA is essentially a copy of the hardware data, and most users will work with a processed version, TEventC. Still, some users may want to work with TEventA or TEventB. How should we approach data storage and distribution? I see two extreme cases:
- For each run we have one big file containing TRun, TEventA…C. If at some point we decide to do some further analysis, we add TEventD to this file, etc. Every user has to download the whole file.
Pros: Easy-to-understand layout of files on the HDD - every run has one file with all the data. Easy to see which tree was derived from which other tree inside the file.
Cons: Everyone has to download a huge file, most of which they may not need. Every time a new TEventX level is created, we modify an existing file, which is against the ROOT philosophy, from what I understand.
- Each tree type is in a separate file. Separate file for TRun, separate for TEventA, separate for TEventC, etc.
Pros: Everyone can download only what they need. A new file is created for each level of processing - no risk of data corruption.
Cons: We get a lot of file “types” (in our real case there are many more tree types). It is difficult to connect the trees together as they are in separate files - perhaps through friends and a central TRun tree and file, but I am not sure how that works in such a case. Users always need at least two files to be able to work - TEventX and TRun. It is more difficult to track where a specific TEventX tree comes from.
As all this is related to how ROOT stores data and works with friends, I wonder if there is a standard way of dealing with such a problem. I would be grateful if you could share your experience.