Piecewise serialization of TTree to FIFO

Dear all,
I would like to transmit a generic TTree entry by entry over a pipe (or TCP stream) for online data analysis.

Here is what I tried so far:

  • Just have root read from a fifo file. This does not work because root will skip the file as its size is zero. (Also, I suspect that root requires fseek capabilities on files, which fifos are sadly lacking.)
  • An LLM suggested using TBufferFile, but only produced code to serialize either the whole tree (not what I want) or manually transmit the fields of the tree (which I don’t consider “transmitting a tree”).
  • Transmitting a series of one-entry trees will likely have a tremendous overhead and is thus also disfavored.

What I am looking for would be “WriteCurrentEntryToBuffer()” and “SetEntryFromBuffer()” methods which do entry-wise I/O with a TBufferFile, but I cannot find such a thing.

What would be the preferred way to implement this?

Cheers,
Philipp

ROOT Version: 6.30/04
Platform: gnulinux, arm64
Compiler: g++10


First, welcome to the ROOT Forum and Happy New Year!
Then, maybe @pcanal has an idea if (and how) that could be implemented.

Indeed.

In order to find the best/optimal solution, we need to take one step back and understand what you are trying to achieve. Can you describe your complete setup (how the data gets into the TTree in the first place, where it goes next, etc.) and why a pipe/FIFO is considered the right tool for the communication?

At the moment, we have an unpacker which can convert raw recorded data (served over TCP in real time in detector-specific formats which only the unpacker knows how to handle) into a TTree written to a root file.

To do a near line analysis, I could have a separate pyroot script which takes the file with the tree and creates some histograms.

However, this would mean that every time I wanted to look at the latest data, I would need to first run the unpacker, write a new .root file for a bit and then run the pyroot script on that newly written file.

It would be more elegant if I could have the unpacker continuously serialize ttree entries to some pipe as they come in, and have the pyroot process consume them as they are read from the pipe, so I can see the data online.

Our unpacker also provides other output formats besides root files with a ttree, which support pipe communications, but parsing them from python would likely take some work (i.e. a custom C module). By comparison, reading in a TTree from a root file in pyroot is rather effortless.

The reason why I would favor pipes as a means of transmitting the data is that they seem simple and extensible (e.g. if I ever need to put the pyroot process on a different computer, I can route the pipe communication through socat trivially). Another option would be to have the TTree in shared memory, but then I would need some semaphore to indicate whether the tree is currently being written by the unpacker or read by the python script, and to yield execution for every event.

I would also not want to merge the unpacking step with the histogramming script – it would run counter to unix philosophy (do one thing), not reflect our organizational structure (Conway’s law) and require compromises on the correct languages for the respective jobs.

If there is some halfway simple way to drag the ttree through the pipe (e.g. “for every event, loop over the baskets, serialize them, then on the receiving side, update the basket contents with this trick”), I would try it; if it is more complex (“write a custom class inheriting from TFile”), then I would rather try to parse the other format from python.

While searching a bit more, I found TWebFile, but from my understanding this is used to fetch a pre-created file from a web server, which is not what I want.

If the same file system is reachable from both the unpacker and the near line analysis, you can also simply read the file directly while it is being written. This takes a few simple precautions/setup steps described in https://root.cern/doc/master/classTTree.html#a76259576b0094536ad084cde665c13a8 in the How to write a Tree in one process and view it from another process example.
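For illustration, the reader side of that example could look roughly like this (just a sketch; the file name data.root and tree name t are placeholders, and it assumes the writer saves the tree periodically as described in the linked documentation):

    // Hypothetical monitoring loop; "data.root" and "t" are placeholders.
    #include "TFile.h"
    #include "TTree.h"
    #include "TSystem.h"

    void monitor()
    {
       TFile *f = TFile::Open("data.root", "READ"); // error handling elided
       TTree *t = f->Get<TTree>("t");
       Long64_t processed = 0;
       while (true) {
          t->Refresh(); // pick up entries the writer saved since the last poll
          for ( ; processed < t->GetEntries(); ++processed) {
             t->GetEntry(processed);
             // ... fill histograms, etc. ...
          }
          gSystem->Sleep(1000); // poll once per second
       }
    }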

If there is some halfway simple way to drag the ttree through the pipe …

There are. The next step is to pick the right one depending on the relative sizes. How many top-level branches are in the TTree? What kind of data types? What is the size of an entry? Does the transmission granularity ‘have to’ be 1 entry, or can it be more (e.g. one cluster)?

Thanks for the link. I would still prefer something slightly more hacky if it got me around writing a big temporary file.

The tree I am interested in has about 1500 branches. It is an old ntuple-style thing, containing no classes, but just integer leaves and integer array leaves of variable size. (However, if the transmission of containers of custom root classes was also possible this might be interesting for a different project.)

A typical entry is perhaps a few kilobytes (i.e., most of the arrays are empty), but rare atypical entries (e.g. a pulser event where every detector channel fired) might be multiple megabytes.

The granularity does not have to be one. I would be fine with either ‘once per second or per 1k entries, whatever comes first’ or ‘every 50 entries’, if a time-based update is not possible.

writing a big temporary file.

I am confused, I thought the current function of the unpacker was to write a (permanent) large file and that you needed to monitor its content. In my example, you would access this file while it is being created and monitor it (you would of course need to keep track of the number of elements read/processed so far in each iteration).

(However, if the transmission of containers of custom root classes was also possible this might be interesting for a different project.)

Totally doable, anything that can be ROOT I/O streamed can be transmitted. See the examples in $ROOTSYS/tutorials/net and the usage of TMessage (e.g. spy.C and spyserv.C).

TMessage is a TBufferFile with some additional features to help with transmission (easy compression, schema evolution tracking to allow connecting processes with different library versions). Feeding data into it is straightforward. For example:

      TMessage answer(kMESS_OBJECT);
      if (!strcmp(request, "get hpx"))
         answer.WriteObject(fHpx);
      else if (!strcmp(request, "get hpxpy"))
         answer.WriteObject(fHpxpy);
      else if (!strcmp(request, "get hprof"))
         answer.WriteObject(fHprof);

or

    TBufferFile b(TBuffer::kWrite);
    b << mynumericalvalue;
    b << myobjptr;
    // Can transmit b.Buffer() with b.Length() bytes.
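
The receiving side mirrors this. A sketch, assuming buf and len hold the bytes and byte count read from the pipe (when sending a TMessage over an actual TSocket, TSocket::Recv reconstructs the message for you instead):

    TBufferFile r(TBuffer::kRead, len, buf, kFALSE); // kFALSE: the caller keeps ownership of buf
    double mynumericalvalue;
    r >> mynumericalvalue;
    MyClass *myobjptr = nullptr; // MyClass stands for any ROOT I/O streamable class
    r >> myobjptr;               // ROOT I/O recreates the object from the stream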

The tree I am interested in has about 1500 branches.
or ‘every 50 entries’

With those numbers, the overhead is ‘only’ about a factor of 2 when sending the whole tree.

root [0] TMemFile m("tester.root", "RECREATE")
(TMemFile &) Name: tester.root Title: 
root [1] TTree t("t","t");
root [2] double val;
root [3] for(int i = 0; i < 1500; ++i) t.Branch("val", val);
root [4] for(int i = 0; i < 50; ++i) t.Fill()
root [5] m.Write()
(int) 173
root [6] m.GetSize()
(long long) 2097152

The content of the TMemFile can then be sent over the wire, loaded back into another TMemFile, and read.
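
A sketch of that round trip, continuing from the session above (the pipe I/O itself is elided):

    // Sender: copy the in-memory file image into a plain byte buffer.
    Long64_t len = m.GetSize();
    std::vector<char> image(len);
    m.CopyTo(image.data(), len);
    // ... write len, then the image bytes, to the pipe ...

    // Receiver: rebuild the file from the received bytes and read the tree.
    TMemFile in("tester.root", image.data(), len, "READ");
    auto t = in.Get<TTree>("t");
    // ... t->GetEntry(i) etc. as with any tree ...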

Alternatively you can just do (assuming here that all branches contain doubles, as in the above example):

std::vector<double> values;
for (auto leaf : TRangeDynCast<TLeafD>(*mytree->GetListOfLeaves())) {
   if (leaf) // TRangeDynCast yields nullptr for leaves that are not TLeafD
      values.push_back(leaf->GetValue(0));
}
TMessage m(kMESS_OBJECT); // or a TBufferFile directly
m << &values;
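
Reading it back, sketched with a plain TBufferFile (buf and len again stand for the received bytes):

    TBufferFile r(TBuffer::kRead, len, buf, kFALSE);
    std::vector<double> *received = nullptr;
    r >> received; // ROOT I/O recreates the vector; one value per leaf sent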

Okay, thanks for the detailed information!