Documentation for ROOT's file format

Hello,

I am trying to write a minimalistic parser (in Rust) which can read the ALICE open data ROOT files. The files contain Trees where some of the branches are simple types and others are more complicated classes. I don’t aim at reconstructing those more complicated classes into their respective C++ objects. I am happy to just get my hands on the underlying data, similar to what one gets from TTree::MakeClass, where each branch ultimately points to a fundamental type or an array of a fundamental type.

So far, I managed to parse the following:

  • The file header (doc)
  • The logical record header (doc)
  • Decompress object data (doc)
  • Gaps between the entries (doc)
  • Somewhat the StreamerInfo, but I cannot make sense of it

My problem really is that I cannot make sense of the decompressed payload of each TBasket. How do I figure out the structure of the encoded data? Could somebody point me to some documentation?

@sbinet, your Go library has been a great resource for me so far, but I am stuck on this. Which sources did you work from when writing that Go library?

Thanks a lot!

hi @christianb

glad go-hep/rootio was useful :slight_smile:

My main sources of inspiration (besides ROOT, of course) were:

note that the canonical go-hep/rootio repo is now at https://go-hep.org/x/hep/rootio, ie:

see here for the top-level function that decodes the streamers:

and here is the full list of streamers that I can currently decode:

and, finally, the function that, given a StreamerElement, will generate a function that decodes a ROOT TBuffer and fills the data into a user-provided pointer-to-data:
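For readers following along in Rust, the idea of turning a streamer element into a decoding function can be sketched roughly as below. This is not the go-hep code: `ElementKind`, `Value`, `take` and `make_decoder` are hypothetical names invented for this illustration, and a real streamer element carries far more information than a single type tag. The only ROOT-specific fact relied on is that values are stored big-endian.

```rust
use std::convert::TryInto;

/// Hypothetical, heavily simplified stand-in for the type information
/// found in a streamer element (not a real ROOT or go-hep type).
#[derive(Debug, Clone, Copy)]
enum ElementKind {
    Int32,
    Float32,
    Float64,
}

/// A decoded fundamental value.
#[derive(Debug, PartialEq)]
enum Value {
    I32(i32),
    F32(f32),
    F64(f64),
}

/// A decoder consumes bytes from the front of the buffer and returns a value,
/// or `None` if the buffer is too short.
type Decoder = Box<dyn Fn(&mut &[u8]) -> Option<Value>>;

/// Split `N` bytes off the front of `buf`, advancing it on success.
fn take<'a, const N: usize>(buf: &mut &'a [u8]) -> Option<[u8; N]> {
    let data: &'a [u8] = *buf;
    if data.len() < N {
        return None;
    }
    let (head, tail) = data.split_at(N);
    *buf = tail;
    head.try_into().ok()
}

/// Build the decoding function once, from the element description,
/// so that it can be reused for every entry.
fn make_decoder(kind: ElementKind) -> Decoder {
    match kind {
        ElementKind::Int32 => Box::new(|buf: &mut &[u8]| {
            take::<4>(buf).map(|b| Value::I32(i32::from_be_bytes(b)))
        }),
        ElementKind::Float32 => Box::new(|buf: &mut &[u8]| {
            take::<4>(buf).map(|b| Value::F32(f32::from_be_bytes(b)))
        }),
        ElementKind::Float64 => Box::new(|buf: &mut &[u8]| {
            take::<8>(buf).map(|b| Value::F64(f64::from_be_bytes(b)))
        }),
    }
}

fn main() {
    // Assumed layout for the example: one i32 followed by one f32.
    let layout = [ElementKind::Int32, ElementKind::Float32];
    let decoders: Vec<Decoder> = layout.iter().map(|k| make_decoder(*k)).collect();

    let bytes: Vec<u8> = [7i32.to_be_bytes(), 1.5f32.to_be_bytes()].concat();
    let mut cursor: &[u8] = &bytes;
    for decode in &decoders {
        println!("{:?}", decode(&mut cursor));
    }
}
```

Building the closures up front means the per-entry loop does no type dispatch beyond one indirect call per element.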

hth,
-s

BTW, do you have a github repo where your Rust-based parser lives ?

My Rust is rusty (ha!) but I’d like to give it a try :slight_smile:

Thanks for your reply! I always find it quite difficult to extract the bigger picture from other people’s code, but I hope to make some progress with your pointers! So, do I understand correctly that the StreamerInfo is a TList of StreamerElements where each element will tell me something to the effect of:

"The payload of the TBasket (branch) Tracks.Pt is of type TArrayF"?

My Rust parser is really in its early infancy at this point. But as a matter of fact, it is just one piece of the bigger puzzle of recreating some published results with the ALICE Open Data using Rust. At this point, I have managed to drop all ALICE dependencies, but still depend on ROOT for the IO. I think I will publish the project with this configuration in the next week(s), since any pure Rust parser will be very rough around the edges for the time being. The issue with the ROOT-based IO is that I can only use concurrency when reading from multiple files. This causes a large memory footprint and possibly worse IO performance. With Rust, I hope to get concurrency over a single file (e.g. by decompressing one TBasket in a separate thread while the next one is being read from disk).
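A minimal sketch of that read-while-decompressing idea, using only the standard library, could look like the following. The `decompress` function is a toy run-length decoder standing in for a real decompressor, the "reads" are just in-memory blocks, and `pipeline` is a name made up for this example.

```rust
use std::sync::mpsc;
use std::thread;

/// Toy stand-in for a real decompressor: decodes (count, byte) pairs.
/// A real parser would call into zlib or a similar library here.
fn decompress(compressed: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in compressed.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Hand blocks from a "reader" thread to a "worker" thread that decompresses
/// them, and collect the results in their original order.
fn pipeline(blocks: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    // Bounded channel: the reader may run at most two blocks ahead.
    let (raw_tx, raw_rx) = mpsc::sync_channel::<Vec<u8>>(2);
    let (out_tx, out_rx) = mpsc::channel::<Vec<u8>>();

    let reader = thread::spawn(move || {
        for block in blocks {
            // In a real parser this would be a seek + read on the file.
            if raw_tx.send(block).is_err() {
                break;
            }
        }
        // raw_tx is dropped here, which ends the worker's loop.
    });

    let worker = thread::spawn(move || {
        for block in raw_rx {
            if out_tx.send(decompress(&block)).is_err() {
                break;
            }
        }
        // out_tx is dropped here, which ends the collection below.
    });

    let result: Vec<Vec<u8>> = out_rx.iter().collect();
    reader.join().expect("reader thread panicked");
    worker.join().expect("worker thread panicked");
    result
}

fn main() {
    let blocks = vec![vec![3, b'a'], vec![1, b'b', 2, b'c']];
    for block in pipeline(blocks) {
        println!("{}", String::from_utf8_lossy(&block));
    }
}
```

With a single worker the output order matches the input order; using several workers would require tagging each block with its index and reordering afterwards.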

“The payload of the TBasket (branch) Tracks.Pt is of type TArrayF”?

yep.
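To make that a bit more concrete, here is a rough Rust sketch of decoding one such value. It assumes the bytes in question consist of a 4-byte big-endian element count followed by that many big-endian 32-bit floats. A real basket payload may carry additional headers or entry offsets that this sketch ignores, and `decode_f32_array` is a made-up name.

```rust
use std::convert::{TryFrom, TryInto};

/// Decode a count-prefixed array of big-endian f32 values.
/// Assumed layout (for illustration only): i32 count, then `count` floats.
/// Returns `None` if the count is negative or the buffer is too short.
fn decode_f32_array(payload: &[u8]) -> Option<Vec<f32>> {
    if payload.len() < 4 {
        return None;
    }
    let (len_bytes, rest) = payload.split_at(4);
    let count = i32::from_be_bytes(len_bytes.try_into().ok()?);
    let count = usize::try_from(count).ok()?; // rejects negative counts
    let needed = count.checked_mul(4)?;
    if rest.len() < needed {
        return None;
    }
    Some(
        rest[..needed]
            .chunks_exact(4)
            .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

fn main() {
    // Build a small test payload: count = 2, values 0.5 and -2.0.
    let mut payload = 2i32.to_be_bytes().to_vec();
    payload.extend_from_slice(&0.5f32.to_be_bytes());
    payload.extend_from_slice(&(-2.0f32).to_be_bytes());

    println!("{:?}", decode_f32_array(&payload));
}
```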

your ALICE OpenData-with-rust project sounds fun!
I might do the same in Go :stuck_out_tongue:
(The only showstopper for me at the moment, based on some experience with the LHCb OpenData dataset, is the handling of TClonesArray: it’s missing right now in go-hep/rootio… And, having worked a bit with the AliRoot data sets that I tried to use for O2+FairMQ (and my Go-based project: https://github.com/sbinet-alice/fer), I know they use TClonesArray quite extensively…)

If you point me (privately?) to your github or gitlab Rust repo, you might get some PRs :slight_smile:

Thanks again! Just added you as collaborator to my repository. See the PM.
