For testing purposes, I would like to check that two files (containing trees) hold the same data. I would like the check to be fast, and to run without loading the data-definition libraries that describe the objects in the files.
I tried the md5sum command on each file, but it always returns a different result, even when the files are expected to contain the same data. I guess this comes from auxiliary data in the files that depends on the generation run, such as process IDs.
Each time I run my test application, which writes the same data into a file toto.root, the command “md5sum toto.root” gives a different result. But perhaps I misunderstood the use of md5sum?
md5sum is useless here because the baskets carry a date/time stamp, so the file bytes change from one run to the next even when the data are identical.
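To illustrate why an embedded time stamp defeats a whole-file checksum, here is a minimal sketch. It does not use the real ROOT file layout: `makeRecord` and its "timestamp|payload" format are hypothetical, and a 64-bit FNV-1a hash stands in for md5sum.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash, used here only as a stand-in for md5sum.
std::uint64_t fnv1a(const std::string& bytes) {
    std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Hypothetical record layout (an assumption, not the ROOT format):
// a time stamp header followed by the payload.
std::string makeRecord(const std::string& timestamp, const std::string& payload) {
    return timestamp + "|" + payload;
}
```

Hashing `makeRecord("10:00:00", "same data")` and `makeRecord("10:00:05", "same data")` gives two different values although the payload is identical, which is the same effect seen with md5sum on toto.root.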
My suggestion is:
- read the Tree headers T1 and T2
- compare the total sizes (compressed and uncompressed) via GetZipBytes and GetTotBytes
- compare the number of entries.
If the 3 tests are OK, there is a very high probability that the data are the same.
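The three checks above can be sketched as follows. This is only an outline in plain C++: `TreeSummary` and `firstDifference` are hypothetical names, and with ROOT the three fields would presumably be filled from GetEntries(), GetZipBytes() and GetTotBytes() on each tree. Storing them as 64-bit integers is an assumption.

```cpp
#include <string>

// Hypothetical summary of one tree header. With ROOT, the fields would be
// filled from GetEntries(), GetZipBytes() and GetTotBytes().
struct TreeSummary {
    long long entries;
    long long zipBytes;   // compressed size
    long long totBytes;   // uncompressed size
};

// Returns an empty string when all three checks pass, otherwise the name
// of the first quantity that differs.
std::string firstDifference(const TreeSummary& a, const TreeSummary& b) {
    if (a.entries != b.entries)   return "entries";
    if (a.zipBytes != b.zipBytes) return "compressed size";
    if (a.totBytes != b.totBytes) return "uncompressed size";
    return "";
}
```

Reporting which quantity differs, rather than a plain yes/no, makes it easier to see whether a mismatch comes from the entries or only from the sizes.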
Actually, I was wondering whether a method such as TTree::GetZipBytes, which TChain inherits, can safely be called on a TChain object. I guess the answer is no.
[quote]The uncompressed size differs as well, which is more surprising.
If the data inside is the same, I should end up with the same
uncompressed sizes, shouldn't I?[/quote]
Not quite, since the number of baskets will be different.
When working with a tuple-like tree (the tree has only one level of branches, and all data types are ROOT ones), the size methods return 0. Any idea what is going on?
TFile * f = TFile::Open("the file name") ;
TTree * t = (TTree*) f->Get("the tree name") ;
cout<<"%INFO: number of entries is "<<t->GetEntries()<<endl ;
cout<<"%INFO: compressed size is "<<t->GetZipBytes()<<endl ;
cout<<"%INFO: uncompressed size is "<<t->GetTotBytes()<<endl ;
f->Close() ;
I tried calling TTree::Print(), Scan(), … but the sizes stubbornly remain 0.
A new “funny” effect with this code: in production, when dealing with the big files expected there, the call to GetZipBytes() generates this error:
…
%INFO: number of entries is 19727939
Error: integer literal too large, add LL or ULL for long long integer /afs/slac.stanford.edu/g/glast/ground/DataServer/v3r5/src/Skimmer.cxx:857:
*** Interpreter error recovered ***
%INFO: compressed size is (class G__CINT_ENDL)153450224
…