Question about how to use TStreamerInfo

eoltman · February 28, 2014, 6:49pm

Hello,
It has become necessary for me to create non-root binary files that have root classes written to the end using code similar to this:

    TBufferFile *b = new TBufferFile(TBuffer::kWrite);
    myObject->Streamer(*b);
    fwrite(b->Buffer(),1,b->Length(),fp);

for each of myObject objects. (myObject is an instance of a TObject derived class and fp is a FILE pointer) I have managed schema evolution for my classes - I can read these binary files (using TBufferFile(TBuffer::kRead,size)) just fine even as my classes have evolved. No problems…yet

It's the future I’m worried about: My classes have root classes as members (e.g. TH1F, TF1, etc) that use automatic schema evolution. Since I have developed this non-root binary file w/ cern release 5.34.1, it occurs to me that in some future version of root, some of these root classes will change and I will not be able to read my binary files if I link with this future root - there are no TStreamerInfo objects in my binary file…

I know that if I open a TFile and write all my objects, a TStreamerInfo TList will get generated and added to the TFile. I could write these TStreamerInfo objects to my binary file… Here are my questions:
[ul]
[li]If I include the TStreamerInfo TList in my binary file will the binary file be “future proof?”[/li]
[li]How do I make this info known to root when it reads those objects w/ automatic streamers?[/li]
[li]Is there another way to get the TStreamerInfo TList (rather than first writing to a TFile and then using TFile::GetStreamerInfoList())?[/li][/ul]

I hope my problem description is clear. Thanks in advance!
Ed

pcanal · March 3, 2014, 9:25pm

Hi Ed,

[quote]It has become necessary for me to create non-root binary files[/quote]Out of curiosity, which format and for what reason/purpose/advantages?

[quote]for each of myObject objects. (myObject is an instance of a TObject derived class and fp is a FILE pointer) I have managed schema evolution for my classes[/quote]Given the next question, I am a bit worried. How is that case different from the ‘regular’ case? Why the difference?

[quote]there are no TStreamer objects in my binary file…[/quote]By definition they are not self-describing and cannot be read without ‘external’ information.

[quote]If I include the TStreamerInfo TList in my binary file will the binary file be “future proof?”[/quote]The good news is that the answer is (of course) yes.

[quote]How do I make this info known to root when it reads those objects w/ automatic streamers?
Is there another way to get the TStreamerInfo TList (rather than first writing to a TFile and then using TFile::GetStreamerInfoList())?[/quote]This is implemented in TBufferFile and TMessage (in slightly different ways); you can do the same:

To deal with the StreamerInfo you will need to implement 3 things
a) gather the list of streamerInfo used
b) store the used streamerInfo
c) at startup retrieve the previous stored streamerInfo.

For a), you can either use the TMessage class instead of TBufferFile
or implement your own class deriving from TBufferFile and implementing
ForceWriteInfo(TVirtualStreamerInfo *info, Bool_t /* force */)
and
TagStreamerInfo(TVirtualStreamerInfo *info)
and possibly
WriteObject(const TObject *obj)

See the respective implementation in TMessage for concrete examples.

For b), see TSocket::SendStreamerInfos(const TMessage &mess)

For c), see TSocket::RecvStreamerInfos(TMessage *mess)
(which shows the few lines of code that must be executed, serially,
to load the StreamerInfo into ROOT core).

Cheers,
Philippe.

eoltman · March 3, 2014, 10:22pm

Hello Philippe,
Thank you for slogging through my post. To answer your question about non root binary files: It's a speed issue: The root file that this binary file replaced consisted of a TTree and a bunch of classes that get written at the beginning and the end of the run. The TTree had 6 independent variable length UInt_t arrays, each was anywhere from 0 to 15 words/record + a bunch of fixed length records. Most of the fixed length records varied slowly - at 20 Hz or less, but the UInt_t arrays were generated at much higher rates - up to about 100 kHz. We tried turning off compression (which helped) and fiddling with buffer sizes (which also helped), but kept running into a wall. We decided to abandon writing the TTree in favor of writing a tagged stream (bits 31-24 are a tag identifying the data channel and bits 23-0 are the data itself) as a simple binary file. Rather than having an extra root file for the objects to go along with our binary “tagged stream” TTree replacement, we decided to serialize the objects directly into this tagged stream file (the first handful of bytes in the file include offsets to where the object data reside at the end of the file - these can be queried quickly w/out having to read the entire file).
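As an illustration of the tagged-stream word layout described above (bits 31-24 carry the channel tag, bits 23-0 carry the data), here is a minimal sketch. The helper names `PackWord`, `TagOf` and `DataOf` are hypothetical and do not come from the original code; only the bit layout is taken from the post.

```cpp
#include <cassert>
#include <cstdint>

// Mask selecting the 24 payload bits (bits 23-0) of a tagged word.
constexpr std::uint32_t kDataMask = 0x00FFFFFFu;

// Pack an 8-bit channel tag (bits 31-24) and a 24-bit payload (bits 23-0)
// into one 32-bit word. The payload is assumed to fit in 24 bits.
inline std::uint32_t PackWord(std::uint8_t tag, std::uint32_t data)
{
    assert((data & ~kDataMask) == 0 && "payload must fit in 24 bits");
    return (static_cast<std::uint32_t>(tag) << 24) | (data & kDataMask);
}

// Extract the channel tag from a tagged word.
inline std::uint8_t TagOf(std::uint32_t word)
{
    return static_cast<std::uint8_t>(word >> 24);
}

// Extract the 24-bit payload from a tagged word.
inline std::uint32_t DataOf(std::uint32_t word)
{
    return word & kDataMask;
}
```

With this layout, `PackWord(0x12, 0x345678)` yields the word `0x12345678`, and a reader can dispatch on `TagOf(word)` without any per-record header.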

I’m a bit unclear about your worry - now I’m worried! You refer to “that case” and “regular case”. To be clear, by “that case” I assume you mean manually managed streamer (use “-” in linkdef file and write my own version-aware Streamer()) and by “regular case” you mean automatic streamer evolution (do NOT use “-” in linkdef and let root generate a TStreamerInfo object for use in the case that the version in the file is < version in the executable that reads it). The manual Streamers are self-describing (e.g. if Version=1 do this, if version =2 do that…) whereas the “regular case” streamers are not - they need this “external information” - the TStreamerInfo… Am I right? Or should I start to worry too??

Anyhow, I will focus on the meat of your reply and see if I can implement a workable solution…

On a somewhat related issue - having to do with speed and TTree writing: Is there any possibility that somehow TTree writing can be multi-threaded? I understand it may be impractical to have multiple threads trying to write to a single file, but some of the stuff might be able to progress in parallel - e.g. filling or compressing baskets? To remind you, my group uses Windows. Thanks!!

Ed

pcanal · March 4, 2014, 1:24am

[quote]You refer to “that case” and “regular case”. To be clear, by “that case” I assume you mean manually managed streamer (use “-” in linkdef file and write my own version-aware Streamer()) [/quote]By ‘that’ I meant whatever you are doing for ‘I have managed schema evolution for my classes’ … I was not sure whether you were using a hand-coded streamer or something else. In general, hand-coded streamers are harder to maintain than relying on the automatic schema evolution and i/o rules.

By self-describing, I mean the ability to read the data based solely on the content of the binary stream and/or file. By that definition, using a hand-coded streamer is not self-describing as it requires the library to read the object. [Another problem is that with a hand-coded streamer there is no good way to know for sure whether the data file was written with a rogue/broken streamer implementation].
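To make the distinction concrete, here is a ROOT-free analogy of a hand-coded, version-aware streamer. It is only a sketch: the `Point` type, the 16-bit version prefix and all function names are assumptions made for illustration, not ROOT's actual on-disk format. The bytes carry a version number but say nothing about the fields, so they can only be decoded by code that already knows what each version contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical class: version 1 stored only x, version 2 added y.
struct Point { std::int32_t x; std::int32_t y; };

constexpr std::uint16_t kCurrentVersion = 2;

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void Append(std::vector<char>& buf, const T& value)
{
    const char* p = reinterpret_cast<const char*>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a trivially copyable value at position pos and advance pos.
template <typename T>
T Extract(const std::vector<char>& buf, std::size_t& pos)
{
    if (pos + sizeof(T) > buf.size()) throw std::runtime_error("truncated buffer");
    T value;
    std::memcpy(&value, buf.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

// Emulates what an old (version 1) writer produced: version tag, then x.
std::vector<char> WritePointV1(std::int32_t x)
{
    std::vector<char> buf;
    Append(buf, std::uint16_t{1});
    Append(buf, x);
    return buf;
}

// Current writer: version tag, then x and y.
std::vector<char> WritePoint(const Point& p)
{
    std::vector<char> buf;
    Append(buf, kCurrentVersion);
    Append(buf, p.x);
    Append(buf, p.y);
    return buf;
}

// Version-aware reader: the layout of each version lives only in this code.
Point ReadPoint(const std::vector<char>& buf)
{
    std::size_t pos = 0;
    const std::uint16_t version = Extract<std::uint16_t>(buf, pos);
    if (version < 1 || version > kCurrentVersion)
        throw std::runtime_error("unknown class version");
    Point p{0, 0};
    p.x = Extract<std::int32_t>(buf, pos);
    if (version >= 2) p.y = Extract<std::int32_t>(buf, pos); // added in version 2
    return p;
}
```

A reader built without `ReadPoint` has no way to tell that the payload after the version tag is one or two 32-bit integers. That is the sense in which such a stream is not self-describing, whereas stored StreamerInfo records the member layout alongside the data.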

[quote]The manual Streamers are self-describing (e.g. if Version=1 do this, if version =2 do that…) whereas the “regular case” streamers are not - they need this “external information” - the TStreamerInfo… Am I right? Or should I start to worry too??[/quote]For me, it is the opposite [i.e. the data stream is either self-describing or not; without StreamerInfo or when using a custom streamer, it is not].

[quote]Is there any possibility that somehow TTree writing can be multi-threaded?[/quote]That can indeed have several meanings. Shortly (v5.34/18) you will be able to store data in distinct TTrees in separate threads (more or less one thread <-> one tree/one file). We also plan to try using additional threads/tasks to compress in parallel.

One thing you could consider in your environment is to use a TMemFile to prepare the data. Rather than writing the data to disk, this version of TFile writes into a memory block. You could then take this memory block and tack it onto your binary data.
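The "tack it onto your binary data" step can be sketched without ROOT as follows. This is a sketch under stated assumptions: the `FileHeader` layout and the function names are hypothetical, and the blob stands in for whatever memory block the TMemFile provides. The header at the start of the file records where the trailing block begins, so a reader can seek straight to it without scanning the tagged stream.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed-size header at the start of the file.
struct FileHeader {
    std::uint64_t blobOffset; // byte offset of the trailing object block
    std::uint64_t blobSize;   // size of that block in bytes
};

// Write header placeholder, tagged words, then the blob; finally patch the header.
void WriteTaggedFile(const std::string& path,
                     const std::vector<std::uint32_t>& words,
                     const std::vector<char>& blob)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open file for writing");

    FileHeader header{0, 0};
    out.write(reinterpret_cast<const char*>(&header), sizeof header);
    out.write(reinterpret_cast<const char*>(words.data()),
              static_cast<std::streamsize>(words.size() * sizeof(std::uint32_t)));

    const std::streamoff blobStart = out.tellp();
    header.blobOffset = static_cast<std::uint64_t>(blobStart);
    header.blobSize = blob.size();
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));

    out.seekp(0, std::ios::beg); // go back and fill in the real offsets
    out.write(reinterpret_cast<const char*>(&header), sizeof header);
    if (!out) throw std::runtime_error("write failed");
}

// Read only the trailing block, using the offsets stored in the header.
std::vector<char> ReadBlob(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open file for reading");

    FileHeader header{0, 0};
    in.read(reinterpret_cast<char*>(&header), sizeof header);
    if (!in) throw std::runtime_error("cannot read header");

    std::vector<char> blob(static_cast<std::size_t>(header.blobSize));
    in.seekg(static_cast<std::streamoff>(header.blobOffset), std::ios::beg);
    in.read(blob.data(), static_cast<std::streamsize>(blob.size()));
    if (!in) throw std::runtime_error("cannot read trailing block");
    return blob;
}
```

The header is written in native byte order, so this sketch assumes the file is read on a machine with the same endianness as the one that wrote it.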

Cheers,
Philippe.

eoltman · March 7, 2014, 9:09pm

Hi Philippe,
I used TMemFile to prepare the data as you suggested in your previous post - It works very nicely and automatically includes the TStreamerInfo objects when I reverse the process. This is great - thanks!!
Ed

Barth · March 10, 2014, 3:33pm

Hi,

Would you mind sharing code snippets for the data preparation and the data read ?

Thank you,
Barth

eoltman · March 10, 2014, 4:54pm

Following are some snippets - apologies if it's not valid C++ - I made some simplifications. The first argument in TMemFile is arbitrary - no file is actually created…

Following is how I create the TMemFile and write to it:

    TObject *pObj;
    TMemFile *pMemFile = new TMemFile("memfile.root","new");
    TIter next(MyListOfObjects); // a TList of objects to write
    while ((pObj = next())) pObj->Write();
    pMemFile->Write();

Here is how I write the TMemFile object to a binary file:

    Long64_t maxSize = 50000000;
    char *pHeaders = new char[maxSize];
    FILE *fpOut = fopen("file.bin","wb");
    Long64_t nBytes = pMemFile->CopyTo(pHeaders,maxSize);
    fwrite(pHeaders,nBytes,1,fpOut);
    fclose(fpOut);

Here is how I read the TMemFile block from the binary file:

    FILE *fpIn = fopen("file.bin","rb");
    _fseeki64(fpIn,0,SEEK_END);
    Long64_t size = _ftelli64(fpIn); // 64-bit counterpart of ftell on Windows
    rewind(fpIn);
    char *pBuffer = new char[size];
    fread(pBuffer,size,1,fpIn);
    TMemFile *f = new TMemFile("memfile.root",pBuffer,size);
    f->ls(); // display objects!

Barth · March 10, 2014, 5:24pm

Excellent, thank you very much

Barth · March 12, 2014, 2:34pm

Hi again,

I can successfully use the method you propose. However, I am surprised by the large amount of memory it uses. Basically, it always uses 2097152 bytes.

When using TMessage, I get a size that is proportional to the object.

See my example below

root [0] TH1 *hpx = new TH1F("hpx","This is the px distribution",100,-4,4);
root [1] TMessage::EnableSchemaEvolutionForAll(true);
root [2]    TMessage mess(kMESS_OBJECT);
root [3] mess.WriteObjectAny(hpx, hpx->IsA());
root [4] cout << mess.Length()
967(class ostream)253525691616
root [5] mess.Length()
(const Int_t)967
root [6] TMemFile *pMemFile = new TMemFile("memfile.root","new");
root [7] hpx->Write();
root [8] pMemFile->Write();
root [9] Long64_t maxSize = 50000000;
root [10]     char *pHeaders = new char [maxSize];
root [11]     Long64_t nBytes = pMemFile->CopyTo(pHeaders,maxSize);
root [12] nBytes
(Long64_t)2097152

Command [5] says 967 bytes for TMessage (no StreamerInfo here) and command [12] says 2MB for TMemFile.

Am I missing something ?
Is it “normal” ?

Thank you,
Barth

pcanal · March 12, 2014, 3:04pm

Hi Barth,

Yes, 2MB is the minimal allocation (block) size of TMemFile. This is not yet customizable but could be.

Cheers,
Philippe.

Barth · March 12, 2014, 3:33pm

Hi,

Ok, thank you for the explanation. I’ll make a request ticket.

Best regards,
Barth