Binary files differ when a TFile is generated on different PCs

I am creating exactly the same TTree objects in a TFile on 3 different PCs (2 Athlons and 1 Pentium), all of which run RH 9 (with all current updates), and all 3 resulting ROOT files are different when I check them with diff or md5sum.

The contents are the same, though, when I look at the data in TBrowser.

Why is the binary representation different?

It's not as if I am comparing Windows to Linux, and even then, shouldn't TFile generate files in the same binary format?

Same ROOT version, same file system. Why different? Is there a random number generator in TFile? :wink:

I could ignore this, since the data inside the files are the same, but it is convenient to check data consistency with md5sum occasionally, to make sure files are not corrupted on different nodes.

Khamit

Hi,
The date of the keys (TKey::fDatime) would be my guess.
Axel.

How do I turn off this “feature”?

[b]fDatime is not the only reason the files are different.

I have hacked TKey and TDirectory to set fDatime to a fixed, arbitrary date, then created the same file twice on the same machine, and the two files still differ when checked with diff. What the heck is going on?

I want the files to be lean, mean analysis workhorses, and they seem to contain a whole bunch of unneeded information. I would drop all the gizmos and whistles and keep just the data in the files.

I wonder whether many people would be interested in a hierarchy of TFile parent classes that provide only the minimum set of features needed for data analysis by people concerned about CPU cycles and disk space.

Anyway, how can I make files binary-identical regardless of creation time and any other unknown causes?

Or how do I make sure that files are the same without looking at their contents?
Perhaps some data member could give me a unique ID constructed from the actual data only.

[/b]
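A minimal plain C++ sketch of such a content-only ID, assuming the values of interest have already been read out of the tree into a `std::vector<double>` (`contentId` and `fnv1a64` are hypothetical helpers, not ROOT API). Only the data bytes are hashed, so key dates, UUIDs and other file metadata never enter the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 64-bit FNV-1a over a byte range, continuing from the running hash h.
// (Hypothetical helper; FNV-1a is used only because it is short, it is
// neither MD5 nor a cryptographic checksum.)
std::uint64_t fnv1a64(const unsigned char* data, std::size_t n,
                      std::uint64_t h = 14695981039346656037ULL) {
    for (std::size_t i = 0; i < n; ++i) {
        h ^= data[i];            // mix in one byte
        h *= 1099511628211ULL;   // FNV 64-bit prime
    }
    return h;
}

// Fold every value into the hash in entry order. memcpy copies the host
// byte order, so IDs are comparable only between machines with the same
// endianness.
std::uint64_t contentId(const std::vector<double>& values) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV 64-bit offset basis
    for (double v : values) {
        unsigned char bytes[sizeof v];
        std::memcpy(bytes, &v, sizeof v);
        h = fnv1a64(bytes, sizeof v, h);
    }
    return h;
}
```

Two files holding the same values in the same order give the same ID no matter when or where they were written; reordering the entries changes it.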

[quote=“ardashev”][b]fDatime is not the only reason the files are different.

I have hacked TKey and TDirectory to set fDatime to a fixed, arbitrary date, then created the same file twice on the same machine, and the two files still differ when checked with diff. What the heck is going on?

The only time-dependent variables are the TDatime variables in TKey and TDirectory.

I want the files to be lean, mean analysis workhorses, and they seem to contain a whole bunch of unneeded information. I would drop all the gizmos and whistles and keep just the data in the files.

We are not planning to drop this feature. It is extremely useful in many cases.

I wonder whether many people would be interested in a hierarchy of TFile parent classes that provide only the minimum set of features needed for data analysis by people concerned about CPU cycles and disk space.

This is totally irrelevant. The overhead in space and time is marginal.

Anyway, how can I make files binary-identical regardless of creation time and any other unknown causes?

Modifying TDatime to set the time to 0 should be sufficient. If not, it means that some of your objects contain uninitialized values.

Or how do I make sure that files are the same without looking at their contents?
Perhaps some data member could give me a unique ID constructed from the actual data only.

What data members?
We are planning to eventually add an MD5 checksum. Space has already been reserved in the file header since version 4.00.

Rene

[/b][/quote]
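To illustrate the point about uninitialized values, a minimal plain C++ sketch (`Hit`, `put` and `serialize` are hypothetical names, not ROOT classes; ROOT's streamers do their own member-wise writing). If every data member has a default initializer, two objects built from the same data serialize to the same bytes; a member left uninitialized would instead carry whatever happened to be in memory.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical event record. Every member has a default initializer, so a
// default-constructed object never carries leftover memory contents.
struct Hit {
    std::int32_t channel = 0;
    double energy = 0.0;
    std::uint8_t flags = 0;
};

// Append the raw bytes of one scalar member to the output buffer.
template <typename T>
void put(std::vector<unsigned char>& out, const T& v) {
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, &v, sizeof(T));
    out.insert(out.end(), buf, buf + sizeof(T));
}

// Write the members one by one rather than dumping the whole struct:
// any padding the compiler may insert between members never reaches the
// output, so equal data always yields equal bytes.
std::vector<unsigned char> serialize(const Hit& h) {
    std::vector<unsigned char> out;
    put(out, h.channel);
    put(out, h.energy);
    put(out, h.flags);
    return out;
}
```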

Here are two files with the same data and the same datime (set to zero).

Can somebody tell me what makes them different?

I don't think there are any uninitialized variables, since these files contain copies of the same tree; they are just subsets.

By the way, is there a method in TFile to set the date of the file and of all objects in it to an arbitrary date?

I think adding an md5sum data member would help only partially, since a file may be corrupted in the header itself and still report a proper MD5 checksum. I normally rely on the OS to tell me the creation/modification date of a file. I can imagine people wanting to encode the date into the file itself, but then there should be an option not to use it, so that files come out identical.
subset2.root (820 KB)
subset1.root (820 KB)

The difference is also due to the UUID stored in each TFile and in each TDirectory, which is time- and machine-dependent.
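One way to compare two such files while ignoring the dates and UUIDs, sketched in plain C++ under loud assumptions: compare the two file images byte by byte, skipping ranges known to hold volatile fields. `equalIgnoring`, `masked` and the ranges passed in are hypothetical; the sketch does not know ROOT's actual file layout, and it only helps if both files have exactly the same layout apart from those fields.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;
// Half-open byte range [first, second) to ignore, e.g. where a date or a
// UUID is stored. The caller supplies the offsets; they are NOT taken from
// ROOT's real file format.
using Range = std::pair<std::size_t, std::size_t>;

// True if byte position i falls inside any skipped range.
bool masked(std::size_t i, const std::vector<Range>& skip) {
    for (const auto& r : skip)
        if (i >= r.first && i < r.second) return true;
    return false;
}

// True if a and b have the same length and agree on every byte outside
// the skipped ranges.
bool equalIgnoring(const Bytes& a, const Bytes& b,
                   const std::vector<Range>& skip) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (!masked(i, skip) && a[i] != b[i]) return false;
    return true;
}
```

A content-level comparison is more robust than this, since any change in layout (for example a different number of keys) shifts every later offset.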

– Fons