Serialize and deserialize TTree into coral::Blob

Dear experts,

I am trying to store a TTree in a coral::Blob object. For this I must first serialize the TTree, which works fine in Python 3 using pickle. Now I am a bit lost when it comes to the C++ case. In fact, my setup is such that I can write the blob either via Python or C++ (though preferably Python), whereas the extraction MUST be done in C++. I couldn’t find any instructions on how to deserialize a pickle (or a TTree directly, for that matter) when it is read not from a binary file but from a memory container, as is the case for a blob.

Is there an example of this, or would anybody have some advice on whether it is even possible?

Thank you and regards,
heico

So basically the question is the following: in my understanding, one can serialize and deserialize any object in C++ following the steps described in: https://mobile.codeguru.com/cpp/cpp/algorithms/general/an-introduction-to-object-serialization-in-c.html What is unclear to me, though, is whether or not there are relevant serialization and deserialization methods already present in TTrees or TObjects for that matter. If not, I guess I would need to follow the second non-intrusive example on the site linked above. Unless, of course, there is a much better way to do this…

Any advice would be much appreciated.

Thank you,
heico

Hi @heico ,
we need our I/O expert @pcanal here.

For the new ROOT graphics I know we serialize C++ objects to json, so maybe @linev has some pointers about how we do that.

In general ROOT has its own serialization mechanism which is coupled with the ROOT file format, so I don’t know how well it can play with what coral expects.

Cheers,
Enrico

Hi,

If I understand correctly, coral::Blob is just a binary container in the CORAL conditions database.
The easiest way to store a TTree in such a container is to write a complete binary ROOT file as is.
JSON is not supported for the TTree class.

Regards,
Sergey

What do you mean by "works fine"? A TTree has two parts: the meta-data (list of branches and location of the data in a file) and the data itself. Consequently a TTree has a complex structure, and I am not sure how pickle handles it.

What is unclear to me, though, is whether or not there are relevant serialization and deserialization methods already present in TTrees or TObjects for that matter.

Of course :). And actually the system can be used to serialize almost any C++ object, in a non-intrusive manner and without having to duplicate information. In the example you show, you have to declare the data members (of course) and list them all in a serialization routine (duplicate information; you always need to update both). With ROOT I/O you only need to declare the data members (and set up the generation of a dictionary source file; once set up, it can be automatically refreshed). For the long answer, see the ROOT Users Guide.

I attempt to store a TTree in a Coral::Blob object.

As pointed out by Sergey, the simplest is to use (or generate) a TFile (or TMemFile) that contains the meta-data, the data, and the self-description information. If you have a file, simply copy the whole thing into the blob. Or you can do something like:

TMemFile file("buffer", "RECREATE");  // in-memory file; the name is arbitrary
TTree tree("tree", "title");
... code to fill the tree ...
file.Write();
...
char *buffer = new char[file.GetSize()];
file.CopyTo(buffer, file.GetSize());  // copy the raw file bytes out

and put the buffer into the blob.

Now, a TTree can become very large (many gigabytes), which raises the questions “What is the intent/reason for putting a TTree into a coral::Blob?” and “Can that database sustain blobs of the required size?” (i.e. maybe your TTrees are small enough to fit).

(I think PyROOT plays tricks under the hood so that pickle actually uses ROOT I/O.)

Dear Philippe, Sergey, Enrico,

Thank you all very much for your replies!

I now tried the following setup using a TMemFile, trying to do what Philippe suggested:

char* copyTreeCpp(const char* path, const char* tree){
    std::cout << path << "," << tree << std::endl;
    TFile* fin = TFile::Open(path);
    TTree* t   = (TTree*) fin->Get(tree);
    TMemFile fout("buffer","recreate");
    std::cout << "entries: " << t->GetEntries() << std::endl;
    fout.cd();
    t->CloneTree()->Write();
    fout.Write();
    char *buffer = new char[fout.GetSize()];
    fout.CopyTo(buffer, fout.GetSize());
    std::cout << "returning: " << buffer << std::endl;
    return buffer;
}

So currently, the tree is read from another TFile but that is not how it is going to be eventually (so I do not want to put the TFile fin directly into the blob because in the final setup it wouldn’t exist). Eventually, I will try to construct a TTree from other inputs and then serialize that one. For now, I just wanted to test the serialization with some tree that I have somewhere in a TFile to see if I manage to actually insert it into the blob (and how big the blob is going to be).

Unfortunately, this function only returns 'root' (i.e. the third cout writes returning: root) while the second cout gives me the proper number of entries. So I guess, the tree is not written to the TMemFile. Is the cd() function here not enough to set the internal pointer to the proper dir?

To address Philippe’s final question: my colleague and I are currently setting up a new (part of the) data base and for this we try to explore different formats and ways of how to serialize and store our data. So my goal for now is to try different approaches and then pick the one that makes most sense for the type and amount of data that we have (which is currently, since we are quite at the beginning, also not fully clear to us yet).

Thank you and regards,
heico

The result is a binary blob, not a string. Doing std::cout << buffer simply stops printing at the first \0 in the buffer.

Your function must return both the pointer and the size (fout.GetSize()); otherwise you have no way to know how much binary data to put in the database blob.

To see if things worked, you can look at the value of fout.GetSize() or the output of fout.ls() or fout.Map().
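
The point about needing the size can be illustrated without ROOT. The sketch below is only an illustration: makeFakeBuffer and visibleAsCString are hypothetical helpers, and the bytes are made up to mimic a buffer that starts with the text "root" followed by zero bytes, as the output reported above suggests. Keeping the bytes in a std::vector<char> carries the size along with the data, which is one possible way to return both from a function like copyTreeCpp.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for bytes copied out of an in-memory file:
// the text "root" followed by binary data that contains zero bytes.
// The values are invented for illustration only.
std::vector<char> makeFakeBuffer() {
    const char raw[] = {'r', 'o', 'o', 't', '\0', '\0', '\x01', '\x02'};
    return std::vector<char>(raw, raw + sizeof(raw));
}

// Number of characters that a C-string view of the buffer exposes.
// Printing a char* stops at the first '\0', exactly like strlen does.
// Assumes the buffer contains at least one zero byte.
std::size_t visibleAsCString(const std::vector<char>& buf) {
    return std::strlen(buf.data());
}
```

Here makeFakeBuffer().size() is 8, while visibleAsCString sees only 4 characters, which is why printing the pointer shows just "root" even though more data is present.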

Cheers,
Philippe.

PS. Note that using:

t->CloneTree(-1, "fast")->Write();

will be much faster (it avoids decompression, unstreaming, re-streaming and re-compression of the data).

Ugh… right. Luckily there is a facepalm emoji in this forum… :man_facepalming:

OK I’ll follow your advice and try to insert the blob with the proper size.
I’ll let you know once it worked or I got totally desperate.

Thank you!
heico

Dear all,
so just for completeness: eventually I managed to insert the TTree into the coral::Blob in the way that Sergey and Philippe proposed, both in Python and in C++. There is a little subtlety about the encoding (using base64 solves it, though). I’ll mark the relevant post as solution for future reference.
So thank you once again for the help!
Cheers,
heico
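
The thread does not say how base64 was applied, so the following is only a sketch under the assumption that the raw file bytes are base64-encoded before being written to the blob and decoded again after reading. base64Encode and base64Decode are hypothetical helper names, not part of ROOT or CORAL; the code implements the standard base64 alphabet with '=' padding using only the C++ standard library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Standard base64 alphabet (RFC 4648).
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode arbitrary bytes (zero bytes included) as text.
std::string base64Encode(const std::vector<unsigned char>& in) {
    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full groups of three bytes become four characters.
    while (i + 2 < in.size()) {
        std::uint32_t n = (std::uint32_t(in[i]) << 16) |
                          (std::uint32_t(in[i + 1]) << 8) | in[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
        i += 3;
    }
    // One or two leftover bytes are padded with '='.
    const std::size_t rest = in.size() - i;
    if (rest == 1) {
        std::uint32_t n = std::uint32_t(in[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        std::uint32_t n = (std::uint32_t(in[i]) << 16) |
                          (std::uint32_t(in[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: decode base64 text back into the original bytes.
// Characters outside the alphabet are skipped; decoding stops at '='.
std::vector<unsigned char> base64Decode(const std::string& in) {
    std::vector<unsigned char> out;
    std::uint32_t acc = 0;
    int bits = 0;
    for (char c : in) {
        if (c == '=') break;
        const char* p = (c == '\0') ? nullptr : std::strchr(kAlphabet, c);
        if (p == nullptr) continue;
        acc = (acc << 6) | static_cast<std::uint32_t>(p - kAlphabet);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((acc >> bits) & 0xFF));
        }
    }
    return out;
}
```

On the Python side, the standard library's base64 module (base64.b64encode and base64.b64decode) produces and reads the same encoding, so a blob written from Python could be decoded in C++ with a routine along these lines.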
