Object Definition Libraries and merging/cutting TTrees

chamont · January 26, 2007, 10:30am

Hi,

I maintain an application that selects entries (via a TCut) from a given TTree present in multiple ROOT files and writes the selected entries to an output ROOT file. I have not seen this exact use case in the tutorials or examples; did I miss it? In any case, it is probably a very common task, so I hope to find expert help here.

The way we implement it now is basically:

1) We load the shared libraries that define the classes of the objects stored in the input trees.
2) We build a TChain with all input data files.
3) We use chain->Draw(…, cut) to generate a TEventList.
4) We ask the chain for an empty TTree clone.
5) We get each entry of the TEventList from the TChain and fill the TTree with it.
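
The selection and copy steps listed above can be sketched with a small ROOT-free model in standard C++. This is only an illustration of the bookkeeping, not ROOT code: `Entry`, `File`, `Chain`, `selectEntries`, `getEntry` and `skim` are hypothetical names, a "file" is simply a vector of entries, and the list of global indices plays the role of the event list.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one tree entry; not a ROOT type.
struct Entry {
    int run;
    double energy;
};

// One "file" holds the entries of its tree; a "chain" is an ordered list of files.
using File = std::vector<Entry>;
using Chain = std::vector<File>;

// Scan the whole chain and record the global index of every entry passing the cut.
std::vector<std::size_t> selectEntries(const Chain& chain,
                                       const std::function<bool(const Entry&)>& cut) {
    std::vector<std::size_t> selected;
    std::size_t global = 0;
    for (const File& file : chain) {
        for (const Entry& e : file) {
            if (cut(e)) selected.push_back(global);
            ++global;
        }
    }
    return selected;
}

// Resolve a global index to the entry in the file that owns it.
const Entry& getEntry(const Chain& chain, std::size_t global) {
    for (const File& file : chain) {
        if (global < file.size()) return file[global];
        global -= file.size();
    }
    throw std::out_of_range("entry index beyond end of chain");
}

// Start from an empty output and fill it with the selected entries only.
File skim(const Chain& chain, const std::vector<std::size_t>& selected) {
    File out;
    out.reserve(selected.size());
    for (std::size_t i : selected) out.push_back(getEntry(chain, i));
    assert(out.size() == selected.size());
    return out;
}
```

The point of the model is that the selection pass and the copy pass are decoupled: the index list can be computed once and reused later.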

A key step is 1), which turns out to be complicated because we now have multiple ROOT files generated with multiple code releases. It is becoming more and more difficult to trace back which ROOT file requires which shared library.

Could we avoid loading the shared libraries that define the objects? Isn't all
the necessary information available in the files' streamer infos?
A tool such as hadd apparently does not need to be told which kinds
of objects are stored in the trees. The main difference in our use case
is that we want to apply a TCut to the events at the same time as we merge
the files/trees. Does that make step 1) absolutely necessary?

brun · January 26, 2007, 11:47am

Hi David,

Concerning your point 1: You need to provide your class library only if
your objects' constructors allocate some other objects or dynamic basic types
and your destructors delete them (otherwise you will get a memory leak).

You can use TTree::CopyTree (works also for a TChain) to do what you want.
See also tutorials/tree/copytree3.C

TChain chain("TreeName");
chain.Add(...);
TFile *newfile = TFile::Open("result.root","recreate");
TTree *newtree = chain.CopyTree("your cuts");
newtree->AutoSave();

Rene

chamont · January 26, 2007, 5:13pm

[quote]Hi David,

Concerning your point 1: You need to provide your class library only if
your objects constructors allocate some other objects or dynamic basic
types and your destructors delete them (or you will get a memory leak).[/quote]

Argh. That means that with my entries, which are a kind of event with collections of collections of collections of objects, there is no hope of working without the GLAST class libraries. As an unhappy consequence, since we have many files generated over months/years, we cannot avoid maintaining some metadata saying which ROOT files were made with which release of the class libraries, true?
As another consequence, since each class library has been compiled against a given release of ROOT, I must use this same ROOT release to skim the corresponding ROOT files, or I will be unable to load my class libraries safely, true again?

Currently, we use one hardcoded ROOT 4 release for all the data files made with any ROOT 4 release, and one hardcoded ROOT 5 release for all the data files made with any ROOT 5 release. It seems to work, but I do not feel at ease with this strategy. It is also frustrating that we cannot skim the old files with a recent ROOT. That results in a sort of backward incompatibility.

[quote]You can use TTree::CopyTree (works also for a TChain) to do what
you want. See also tutorials/tree/copytree3.C
Code:
TChain chain("TreeName");
chain.Add(…)
TFile *newfile = TFile::Open("result.root","recreate");
TTree *newtree = chain.CopyTree("your cuts");
newtree->AutoSave();[/quote]

Actually, my real use case is a little more complex. I have several kinds of trees, stored in several kinds of files. I want to select the events of interest through a TCut on a single kind of tree, then extract those events from all the trees of every kind. This is why I proceed in two steps: I first select the events, then I skim all the ROOT files, one data kind after the other.
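
The two-step procedure described here can be modelled in a few lines of standard C++. This is a ROOT-free sketch under a loud assumption: every kind of tree is taken to hold one entry per event in the same order, so that a single list of entry indices is valid for all kinds. `Recon`, `Digi`, `selectIndices` and `extract` are hypothetical names introduced for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-kind payloads; the real trees hold far richer objects.
struct Recon { double energy; };
struct Digi  { int nHits; };

// Step 1: evaluate the cut on ONE kind of tree and keep the passing indices.
template <typename T, typename Cut>
std::vector<std::size_t> selectIndices(const std::vector<T>& tree, Cut cut) {
    std::vector<std::size_t> pass;
    for (std::size_t i = 0; i < tree.size(); ++i) {
        if (cut(tree[i])) pass.push_back(i);
    }
    return pass;
}

// Step 2: apply the SAME index list to a tree of any kind.
template <typename T>
std::vector<T> extract(const std::vector<T>& tree,
                       const std::vector<std::size_t>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (std::size_t i : indices) {
        assert(i < tree.size());  // assumption: trees are aligned entry by entry
        out.push_back(tree[i]);
    }
    return out;
}
```

If the trees were not aligned entry by entry, the index list would have to be replaced by a key that identifies the event in every kind of tree, such as a run and event number pair.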

Perhaps you would suggest that I connect all these different kinds of trees as friends, so that I could call chain.CopyTree once for all of them? Would that be more efficient?

brun · January 26, 2007, 6:37pm

David,

If your Tree is correctly organized with a correct use of the split mode,
you should not see any problem. The ROOT automatic class schema evolution will take care of processing a chain of files produced with your old classes, and using your latest version of the classes in memory.
The class schema is stored in each ROOT file. You do not have to maintain your own schema. This would be a nightmare and it will not work.
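
As a rough illustration of the idea (not of ROOT's actual implementation), schema evolution can be pictured as matching stored members to the members of the class currently in memory by name. The sketch below is a toy model in standard C++: `Record` and `readWithCurrentSchema` are hypothetical names, all members are doubles for brevity, and the default value of 0.0 for members unknown to the old layout is an assumption of the model.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model: a stored record maps member names to values (doubles only).
using Record = std::map<std::string, double>;

// Read a record written with an old layout into the current in-memory layout.
// Members present in both are copied, members missing from the stored record
// get a default value, and stored members no longer in the class are skipped.
Record readWithCurrentSchema(const Record& stored,
                             const std::vector<std::string>& currentMembers) {
    Record inMemory;
    for (const std::string& name : currentMembers) {
        auto it = stored.find(name);
        inMemory[name] = (it != stored.end()) ? it->second : 0.0;
    }
    return inMemory;
}
```

In this picture the reader only needs the latest class definition plus the layout description saved with the data, which is why no per-file bookkeeping of library releases is required.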

[quote]As another consequence, since each class library has been compiled against a given release of ROOT, I must use this same ROOT release to skim the corresponding ROOT files, or I will be unable to load my class libraries safely, true again?
[/quote]

Not true, see above

[quote]Currently, we use one hardcoded ROOT 4 release for all the data files made with any ROOT 4 release, and one hardcoded ROOT 5 release for all the data files made with any ROOT 5 release. It seems to work, but I do not feel at ease with this strategy. It is also frustrating that we cannot skim the old files with a recent ROOT. That results in a sort of backward incompatibility.
[/quote]

I do not understand this. You should be able to read your data written with ROOT4 using the current ROOT version, otherwise your system will not scale over the coming years.

[quote]You can use TTree::CopyTree (works also for a TChain) to do what
you want. See also tutorials/tree/copytree3.C
Code:
TChain chain("TreeName");
chain.Add(…)
TFile *newfile = TFile::Open("result.root","recreate");
TTree *newtree = chain.CopyTree("your cuts");
newtree->AutoSave();

Actually, my real use case is a little more complex. I have several kinds of trees, stored in several kinds of files. I want to select the events of interest through a TCut on a single kind of tree, then extract those events from all the trees of every kind. This is why I proceed in two steps: I first select the events, then I skim all the ROOT files, one data kind after the other.[/quote]

This looks OK. In the future you may want to consider an alternative using the new class TEntryList, but we will need more details about your setup to come up with a more precise recommendation.

see my previous remark.

Rene

pcanal · January 26, 2007, 7:07pm

[quote]we cannot avoid maintaining some meta-data saying which ROOT files have been made with which release of the class libraries, true ? [/quote]
An ‘equivalent’ piece of information is stored with the TStreamerInfo in each file; namely either the class version (if you use ClassDef) or a checksum representing the class layout.

So the schema evolution present in ROOT should be sufficient.
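
To make the notion of a layout identifier concrete, here is a toy model in standard C++. It is not ROOT's checksum algorithm: the FNV-1a hash, the `Layout` and `LayoutId` types, and the rule in `sameLayout` (compare explicit class versions when both are set, otherwise compare checksums) are all assumptions chosen for the illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// A class layout described as ordered (type, member name) pairs.
using Layout = std::vector<std::pair<std::string, std::string>>;

// Toy checksum (32-bit FNV-1a) over the class name and its members.
std::uint32_t layoutChecksum(const std::string& className, const Layout& layout) {
    std::uint32_t h = 2166136261u;
    auto mix = [&h](const std::string& s) {
        for (unsigned char c : s) { h ^= c; h *= 16777619u; }
        h ^= 0xFFu; h *= 16777619u;  // separator between consecutive strings
    };
    mix(className);
    for (const auto& member : layout) { mix(member.first); mix(member.second); }
    return h;
}

// What is stored next to the data: an explicit version (0 = none) and a checksum.
struct LayoutId {
    int classVersion;
    std::uint32_t checksum;
};

// Assumed rule: trust explicit versions when both sides have one,
// otherwise fall back to comparing the layout checksums.
bool sameLayout(const LayoutId& a, const LayoutId& b) {
    if (a.classVersion > 0 && b.classVersion > 0) return a.classVersion == b.classVersion;
    return a.checksum == b.checksum;
}
```

Either identifier lets a reader recognise which layout a file was written with, without any external record of the library release that produced it.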

[quote]make all these different kind of trees connected as friends, and I could call the chain.CopyTree once for all[/quote]
This is currently not supported …

Cheers,
Philippe

chamont · January 29, 2007, 10:25am

Thanks for all the detailed answers. Much material to think about.
Probably I’ll be back after brainstorming with other GLAST brains.