Varint support and implementation

whit2333 · September 25, 2016, 2:37am

Hello,

This is sort of a follow-up to the thread ProMC in ROOT.

ProMC uses Google’s Protocol Buffers, which is quite useful because it has a “variable integer” (varint) type.

What is the best way to go about implementing this in ROOT? Specifically, we would want it to be used as seamlessly as possible. What is the current status in ROOT 6 of nested containers as tree branches?

Cheers,
Whit

Danilo · September 25, 2016, 8:00am

Hello,

For the containers, I suggest referring to this thread: Nested containers in root tree branch.
I do not understand the use case you are trying to cover. An interface between ROOT and ProMC? Do you need a “varint” in your data model? If yes, why?

Cheers,
Danilo

whit2333 · September 25, 2016, 6:10pm

Hi,

Yes, the goal is an interface that reads ProMC files with little overhead. Maybe it can be done without varint; however, varint seems like a pretty good idea. The idea is to avoid storing lots of zeros. The serialization mechanism is Google’s Protocol Buffers, which uses varints.
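For readers unfamiliar with the encoding: a varint stores an integer in 7-bit groups, one group per byte, with the high bit of each byte flagging that another byte follows, so small values such as 0 or 1 take a single byte instead of a fixed 4 or 8. The C++ sketch below shows the unsigned base-128 layout that Protocol Buffers documents for its varint wire type; the names `AppendVarint` and `VarintSize` are hypothetical helpers for illustration, not part of the protobuf library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` to `out` as an unsigned base-128 varint:
// 7 payload bits per byte, least-significant group first,
// high bit (0x80) set on every byte except the last one.
void AppendVarint(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));  // final byte, high bit clear
}

// Number of bytes AppendVarint would emit for `value` (1 to 10 for 64 bits).
std::size_t VarintSize(std::uint64_t value) {
    std::size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++n;
    }
    return n;
}
```

For example, 300 encodes to the two bytes `0xAC 0x02`, while the largest 64-bit value needs 10 bytes, so the saving only materializes when most stored values are small.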

The basic strategy of ProMC is outlined here atlaswww.hep.anl.gov/asc/wikido … #why_promc

It would be nice to read a ProMC file (and its events as a TTree) without having to blow up the varints (until they are really needed).

The conversion method already exists; however, it is not ideal because the data is duplicated and grows in size.

Anyway, I am still trying to understand things and just wanted to ask.

Cheers,
Whit

Danilo · September 25, 2016, 8:41pm

Hi Whit,

I understand that a set of utilities for reading ProMC files is already available for C++ as well: what is the problem with treating the events with ROOT once they are read in?
I agree that the ProMC2TTree conversion, as any other conversion, would be a less performant procedure.
Even if you are in an exploratory phase, do you know of any easily reproducible C++ benchmark (cut and paste at most) showing that a particular data model can be persisted in a columnar format with ProMC:

With a smaller final file size on disk than ROOT/HEPMC3 (and/or ROOT used to persist HEPMC3)
With more moderate CPU usage than ROOT/HEPMC3 (and/or ROOT used to persist HEPMC3)
The reading part of such a benchmark would also be interesting.

Cheers,
Danilo

Axel · September 26, 2016, 7:34am

Hi,

From the ProMC web page:

Every single sentence is incorrect.

ROOT files are multi-platform. Maybe we have a different definition of “multi-platform”. ROOT’s files can be used (written, read) across any combination of any platform that’s currently in use, with the exception of PPC64be.

That’s about re-implementing the I/O layer in a different language (not multi-platform). FWIW we have a working implementation in JavaScript.

or LZMA.

Sorta. But sorta not really. Read up on this here: root.cern.ch/root/htmldoc/guide … double32_t

Not necessarily. What you could do is:

create different branches representing different precisions (performant, but more management overhead on your side)
create your own wrapper class around floating point with different precision, throw in a virtual function to enable ROOT to determine at runtime which precision you want. That’s likely less work for setting up the branches but more coding to embed this type.
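The wrapper-class option might look roughly like the sketch below. It is a generic illustration under stated assumptions, not ROOT’s `Double32_t` implementation: the class names `QuantizedDouble`, `CoarseMomentum` and `FineMomentum`, the range [-100, 100] and the bit counts are all hypothetical, and the hooks ROOT would need to actually call these virtual functions during I/O (dictionary generation, custom streaming) are not shown. It only demonstrates how a virtual function can choose the precision at runtime, by rounding a value onto a fixed number of bits over a known range.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical base class: stores a double as an unsigned integer with
// Bits() bits over the range [Min(), Max()]. Subclasses choose the
// precision by overriding the virtual functions (Bits() must be <= 32).
class QuantizedDouble {
public:
    virtual ~QuantizedDouble() = default;
    virtual int Bits() const = 0;
    virtual double Min() const = 0;
    virtual double Max() const = 0;

    // Number of quantization steps: 2^Bits - 1.
    double Steps() const { return std::ldexp(1.0, Bits()) - 1.0; }

    // Width of one step in the original units.
    double Step() const { return (Max() - Min()) / Steps(); }

    // Clamp x to [Min, Max], then round it to the nearest integer code.
    std::uint32_t Encode(double x) const {
        if (x < Min()) x = Min();
        if (x > Max()) x = Max();
        return static_cast<std::uint32_t>(
            std::llround((x - Min()) / (Max() - Min()) * Steps()));
    }

    // Inverse map; for in-range input the error is at most half a step.
    double Decode(std::uint32_t code) const {
        return Min() + code / Steps() * (Max() - Min());
    }
};

// Two hypothetical precisions for, e.g., a momentum component in GeV.
class CoarseMomentum : public QuantizedDouble {
public:
    int Bits() const override { return 12; }
    double Min() const override { return -100.0; }
    double Max() const override { return 100.0; }
};

class FineMomentum : public QuantizedDouble {
public:
    int Bits() const override { return 24; }
    double Min() const override { return -100.0; }
    double Max() const override { return 100.0; }
};
```

With 12 bits over 200 GeV one step is about 0.05 GeV, with 24 bits about 1.2e-5 GeV; the cost of the finer precision is the extra bits written per value.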

One of the disadvantages of VarInts (real VarInts; I don’t know how far protobuf goes with the “var” part) is that their size is not known during the data layout phase, while ROOT’s bit flags allow ROOT to compute the size and then jump directly to the bits it needs, without having to deserialize everything in front (because each and every VarInt could have a different size).
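To make the random-access argument concrete, here is a sketch contrasting the two layouts. `ReadVarint`, `NthVarint` and `NthFixed32` are hypothetical helper names, and the fixed-width case assumes little-endian 32-bit values. For what it is worth, protobuf’s varint wire type does use a different number of bytes per value (1 to 10 for a 64-bit integer), so the concern applies to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode one unsigned base-128 varint starting at buf[pos] and advance
// pos past it. The length is only known once the terminating byte
// (high bit clear) has been read.
std::uint64_t ReadVarint(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) throw std::out_of_range("truncated varint");
        std::uint8_t byte = buf[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    throw std::runtime_error("varint longer than 10 bytes");
}

// Element k of a packed varint stream: every earlier element has to be
// decoded just to find where element k starts, so the cost is O(k).
std::uint64_t NthVarint(const std::vector<std::uint8_t>& buf, std::size_t k) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < k; ++i) ReadVarint(buf, pos);  // skip element i
    return ReadVarint(buf, pos);
}

// Element k of a fixed-width little-endian uint32 array: the offset is
// simply 4 * k, so no preceding element is touched. O(1).
std::uint32_t NthFixed32(const std::vector<std::uint8_t>& buf, std::size_t k) {
    std::size_t off = 4 * k;
    if (off + 4 > buf.size()) throw std::out_of_range("index past end");
    return static_cast<std::uint32_t>(buf[off]) |
           (static_cast<std::uint32_t>(buf[off + 1]) << 8) |
           (static_cast<std::uint32_t>(buf[off + 2]) << 16) |
           (static_cast<std::uint32_t>(buf[off + 3]) << 24);
}
```

With the values 1, 300 and 5, the varint stream is 4 bytes against 12 for the fixed-width array, but fetching the third value from the varint stream requires decoding the first two; that is the size-versus-access trade-off in a nutshell.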

Cheers, Axel.

whit2333 · September 26, 2016, 8:44pm

Hi Danilo and Axel,

Thanks for the replies.

Regarding the benchmarking, I do not know. I can ask the developer to follow up.

I am coming more from the nuclear physics community, where we are not always up to date on HEP software, much of which solves our problems.

I did not know about HEPMC3. I am confused because I cannot find a real website for it. Is it the same as HEPMC/HEPMC++?

Axel, you raised a lot of good points and corrections; I had similar thoughts when reading it.

One problem I have come across more than once is the data serialization problem between C++/ROOT and Java. While ROOT plays nicely with most other languages, it would be nice if there were a way to use either Java or C++ on the same data.

For example, ProMC and LCIO seem to be a compromise: persistency frameworks compatible with many languages, but utterly useless with respect to ROOT I/O. For many reasons I would not encourage students to start using Java; however, lots of Java tools exist that would be useful in the future. I am trying to identify the real missing points of contact for Java to use ROOT (the way Python uses ROOT), but it would really be a lot of work.

Anyway, I think there is a real rift between C++ and Java programmers in HEP and nuclear physics that needs to be addressed.

Is a ROOT file really that difficult to read in Java? Do we need to create a SWIG-like tool that runs like rootcling to generate the Java libraries, taking the C++ code itself as the source rather than a SWIG interface file? Can this go both ways?

Anyway, I have diverged and vented a little bit but would love to hear your thoughts.

Cheers,
Whit