Strange behaviour of TTree::CloneTree()

smorgenstern · November 15, 2016, 11:07pm

Dear experts,

Following the instructions for copying a tree, I came across some very odd behaviour. When I run the code below, a partial copy of the input tree appears in the output file alongside my cloned output tree. This only happens if more than a certain number of events is copied, which suggests a size issue, although the input file is only about 300 MB, so I can't imagine hitting any limit. I also tested the procedure in the ROOT console and observe the same thing. Is there an explanation for this, and if so, how can it be solved?

import os
import ROOT

# inputfile and outputdir are defined earlier in the script
file_input = ROOT.TFile.Open(inputfile, "READ")
tree_input = file_input.Get("Nominal/BaseSelection_tree_kinematicSelection")
file_output = ROOT.TFile.Open(os.path.join(outputdir, "test.root"), "RECREATE")
# Clone the branch structure and copy the first 4800 entries into file_output
tree_output = tree_input.CloneTree(4800)
tree_output.SetName("egamma")
file_output.Write()
file_output.Close()
file_input.Close()

For completeness: I tested this with ROOT 6.04.14 and 6.06.08 on both macOS and SLC6.

Many thanks,
Stefanie

nding · November 16, 2016, 4:54pm

Hi

I had the same problem before.

If you try tree_output.Write("", ROOT.TObject.kOverwrite), it should replace the tree with fewer events by the normal one. You can also try kWriteDelete if kOverwrite doesn't work.

Having a duplicate of the original tree seems to be harmless: many others who ran into the same issue say the duplicate is a backup in case ROOT crashes. If you run .ls in ROOT, essentially only the original tree should be listed.
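To check in a script exactly which keys ended up in the output file, including their cycle numbers (the same information .ls() prints), the key list can be collected directly. This is a minimal sketch, assuming PyROOT; the helper name list_keys is hypothetical, and it relies only on TFile::GetListOfKeys() and the TKey getters GetName(), GetCycle() and GetClassName().

```python
def list_keys(tfile):
    """Return (name, cycle, class name) for every key in an open ROOT file.

    Hypothetical helper. It only calls methods, so it works with a PyROOT
    TFile (whose key list is iterable) or any object with the same methods.
    """
    return [
        (str(key.GetName()), int(key.GetCycle()), str(key.GetClassName()))
        for key in tfile.GetListOfKeys()
    ]


# Usage (with PyROOT):
#   f = ROOT.TFile.Open("test.root")
#   for name, cycle, cls in list_keys(f):
#       print(name, cycle, cls)
```

A backup TTree key left under the original tree name would show up here next to the renamed tree.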

Hope that helps!
Nelson

pcanal · November 16, 2016, 5:04pm

Hi,

[quote]many others with the same issue say the duplicate is a backup in case root crashes. [/quote]Indeed, but please note that this is a duplicate of only the metadata (name, type and location of the branches, etc.), not of the data.

Cheers,
Philippe.

smorgenstern · November 16, 2016, 8:30pm

Hi Nelson and Philippe,

Thanks for your replies, but the problem is a bit different: the second tree is not a duplicate of the metadata but a partial copy of the input tree, i.e. it contains both metadata and actual data. In fact, I manipulate the cloned tree, e.g. by setting its name and adding branches (not included in the simple example code above). The second tree in the output file still has the initial name and doesn't have the added branch.
Also, .ls() lists both the actual cloned tree and the second tree described above.
Anyhow, I tried your suggestion with the kOverwrite and kWriteDelete options, but the outcome is unchanged.
Any further suggestions?

Cheers,
Stefanie

pcanal · November 16, 2016, 9:46pm

[quote], i.e. containing both meta data and actual data. [/quote]How did you arrive at this conclusion? I.e., are you sure that both TTrees are not loading the data from the same part of the file?
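One way to answer this for a given branch is to compare the file offsets of the baskets each TTree points to: if the backup tree and the final tree reference the same offsets, they read the same data from the file and only the metadata is duplicated. This is a minimal sketch, assuming PyROOT; shared_basket_offsets is a hypothetical helper, and it assumes TBranch::GetWriteBasket() gives the number of baskets already written and TBranch::GetBasketSeek(i) the file offset of basket i.

```python
def shared_basket_offsets(tree_a, tree_b, branch_name):
    """Return the sorted file offsets of baskets referenced by both trees.

    Hypothetical helper. Assumes each tree has a branch called branch_name,
    that GetWriteBasket() counts the baskets already written to the file,
    and that GetBasketSeek(i) is the file offset of basket i (0 if unset).
    """
    offsets = []
    for tree in (tree_a, tree_b):
        branch = tree.GetBranch(branch_name)
        if not branch:  # PyROOT null pointers evaluate as False
            raise KeyError(f"branch {branch_name!r} not found")
        seeks = {int(branch.GetBasketSeek(i)) for i in range(branch.GetWriteBasket())}
        # Keep only baskets that actually have a position in the file.
        offsets.append({s for s in seeks if s > 0})
    return sorted(offsets[0] & offsets[1])


# Usage (with PyROOT), where both trees were read from the output file:
#   shared = shared_basket_offsets(tree_backup, tree_final, "some_branch")
#   A non-empty result means both trees point at the same baskets on disk.
```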

[quote]In fact I manipulate the cloned tree e.g. setting the name and adding branches (not included in the simple example code above). The second tree in the output file still has the initial name and doesn’t have the added branch. [/quote]This is what I expect: in the code snippet you show, the name of the TTree is changed only after all the entries have been cloned, so the TTree metadata has already been saved one or more times under the old name. Changing the name of the TTree does not affect those backup copies. (To change the name beforehand, use CopyEntries.)
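Renaming before copying could look like the following: clone only the structure with CloneTree(0), rename the empty clone, then fill it with CopyEntries so that any backup written during the copy already carries the new name. This is a minimal sketch, assuming PyROOT; copy_renamed_tree is a hypothetical helper, and the final Write with kOverwrite (Nelson's suggestion) is expected, not guaranteed, to replace the backup key now that both share a name.

```python
import os


def copy_renamed_tree(inputfile, outputdir, tree_path, new_name, nentries=-1):
    """Copy nentries entries of tree_path into outputdir/test.root as new_name.

    Hypothetical helper (sketch). nentries=-1 copies all entries.
    """
    import ROOT  # PyROOT, imported here so this module loads without ROOT

    file_input = ROOT.TFile.Open(inputfile, "READ")
    tree_input = file_input.Get(tree_path)
    if not tree_input:
        raise KeyError(f"{tree_path!r} not found in {inputfile!r}")

    # The output file becomes the current directory, so the clone lives there.
    file_output = ROOT.TFile.Open(os.path.join(outputdir, "test.root"), "RECREATE")
    # CloneTree(0): same branches as the input, but no entries copied yet.
    tree_output = tree_input.CloneTree(0)
    # Rename before filling, so backups written during the copy use new_name.
    tree_output.SetName(new_name)
    tree_output.CopyEntries(tree_input, nentries)

    # Same name as the backup key now, so kOverwrite should replace it.
    tree_output.Write("", ROOT.TObject.kOverwrite)
    file_output.Close()
    file_input.Close()
```

For the example in the first post this would be called as copy_renamed_tree(inputfile, outputdir, "Nominal/BaseSelection_tree_kinematicSelection", "egamma", 4800).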

Cheers,
Philippe.

smorgenstern · November 18, 2016, 3:26pm

Dear Philippe,

Thanks a lot for your reply. I see that the partial copy of the original tree has events stored, and the number of events depends on the number of events requested when cloning the tree. Thus, I thought it is not just the metadata. The number of events in this tree is also not predictable. I also observe that whether the partial input tree is stored in the output file depends on the number of active branches, i.e. if I set the branch status to 0 for more branches, the original tree may or may not appear in the output file, depending on how many branches are switched off. Naively, this suggests some memory issue to me. However, when I clone the whole tree and compare the sizes of the input and output files, I see no significant difference. From this I would guess that only links are stored in the original tree remnant. I also tried CopyEntries but do not see any difference. In principle I'm confident that the cloned and renamed tree has the correct entries, but it would be very good to understand why I see the remnant of the original tree, in particular since these files may be shared among analysts, who will certainly wonder why the additional tree is there. Can the observed behaviour be explained?

Many thanks,
Stefanie

pcanal · November 18, 2016, 3:37pm

Hi Stefanie,

[quote] I also tried to use CopyEntries but do not see any difference.[/quote]This is surprising. Did you change the name of the output TTree before calling CopyEntries?

The frequency at which the backup copy of the metadata is taken is based on the size of the output data, so reducing the number of branches does indeed reduce that frequency.
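The setting that drives these backups can be read from the tree with TTree::GetAutoSave(). As we read the TTree documentation, a negative value is a threshold in compressed bytes written (the usual default is negative, which fits the size-based frequency described above) and a positive value is a number of entries. This is a minimal sketch; describe_autosave is a hypothetical helper, and the value zero is deliberately left uninterpreted.

```python
def describe_autosave(tree):
    """Interpret tree.GetAutoSave() as (unit, amount), or None for zero.

    Hypothetical, duck-typed helper: works with a PyROOT TTree or any
    object exposing GetAutoSave().
    """
    autosave = int(tree.GetAutoSave())
    if autosave < 0:
        # Negative: backup roughly every -autosave compressed bytes written.
        return ("compressed bytes", -autosave)
    if autosave > 0:
        # Positive: backup every `autosave` entries filled.
        return ("entries", autosave)
    # Zero is a special value that this sketch does not interpret.
    return None


# Usage (with PyROOT):
#   print(describe_autosave(tree_output))
```

Fewer active branches means fewer compressed bytes per entry, so a byte threshold is reached after more entries, or not at all for a small copy.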

[quote]Naively this indicates to me that there is some memory issue. [/quote]This is not the case here.

[quote]From this I would guess that only links are stored in the original tree remnant.[/quote]Yes, this is a good way of expressing what I mean.

In addition, if you are using CopyEntries, you can disable the backup copying by calling:

Cheers,
Philippe.