Merge data of multiple .root files into a single one

Hi everyone,

I recently started working with ROOT and I am trying to plot the data of several .root files into a single histogram.
I have managed to do it for a single .root file but I don’t really know how to proceed for multiple ones.

Here is the idea:

I read the .root file containing events and extract the time of all the events for a specific day (here 01/06/2008).

[code]
TFile file("Data/2008/06/2008_06_01.root", "READ");
TTree *tree = (TTree*) file.Get("recData");
RecEvent *recevt = 0;
tree->SetBranchAddress("event.", &recevt);
int N = tree->GetEntries();
int time_of_evt;

for (int i = 0; i < N; i++)
{
tree->GetEntry(i);
SDEvent sdevt = recevt->GetSDEvent();
time_of_evt = sdevt->GetGPSSecond();
}[/code]

Finally, I plot a histogram based on the time of the events for this particular day.

What I’d like to do is to plot the time of events of every day of a specific month into one single histogram.

Being new to this kind of procedure, I don’t know if the best thing to do is to merge all the .root files of a month into a single one or to open each .root file within a loop and plot the data of each file into a single histogram.

Thank you very much :slight_smile:

You could for example use a TChain: https://root.cern.ch/doc/v608/classTChain.html
If your files are small, you could also use the hadd tool to merge the root files before processing.
Or process files one by one and then use hadd to merge the resulting histograms.

And instead of SetBranchAddress, you could have a look at the more modern way: https://root.cern.ch/doc/master/classTTreeReader.html
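
On the choice between filling one histogram over all files and merging per-file histograms (which is what hadd does for histograms): as long as every histogram uses the same binning, both routes should give the same bin contents. Below is a minimal sketch in plain C++ that illustrates this. The `Hist` struct and the two helper functions are hypothetical stand-ins written for this illustration, not ROOT's TH1 API, and each "file" is just a vector of event times.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width histogram; a stand-in for TH1, not ROOT's API.
struct Hist {
    double lo, hi;
    std::vector<long> bins;

    Hist(std::size_t nbins, double lo_, double hi_) : lo(lo_), hi(hi_), bins(nbins, 0) {
        assert(nbins > 0 && hi_ > lo_);
    }

    // Count x in its bin; values outside [lo, hi) are ignored.
    void fill(double x) {
        if (x < lo || x >= hi) return;
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the upper edge
        ++bins[i];
    }

    // Bin-by-bin sum; both histograms must share the same binning.
    void add(const Hist& other) {
        assert(other.bins.size() == bins.size() && other.lo == lo && other.hi == hi);
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

using Files = std::vector<std::vector<double>>;  // one vector of event times per "file"

// Option 1: a single histogram filled while looping over all files.
Hist fillAll(const Files& files, std::size_t nbins, double lo, double hi) {
    Hist h(nbins, lo, hi);
    for (const auto& f : files)
        for (double t : f) h.fill(t);
    return h;
}

// Option 2: one histogram per file, merged afterwards.
Hist fillThenMerge(const Files& files, std::size_t nbins, double lo, double hi) {
    Hist total(nbins, lo, hi);
    for (const auto& f : files) {
        Hist perFile(nbins, lo, hi);
        for (double t : f) perFile.fill(t);
        total.add(perFile);
    }
    return total;
}
```

In ROOT terms, option 1 corresponds to looping over a TChain and option 2 to processing the files one by one and merging the resulting histograms; the practical difference is bookkeeping, not the result.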

What do you call small? Each .root file is around 13 MB; is that small enough?

I will look into this hadd tool and see if I can manage to get anything out of it, thank you :slight_smile:

Hi,

See also the new command line utilities

Cheers, Bertrand.

[quote]If your files are small, you could also use the hadd tool to merge the root files before processing.
Or process files one by one and then use hadd to merge the resulting histograms.[/quote]

Merging the files didn’t work very well, but I’m sure that I can easily merge the histograms. What I am wondering now is: what is the fastest way to process several files? For a month of data, I have 30 or 31 .root files named “2008_06_XX.root”. Is there a way (such as a loop) to create a histogram for each file?

Changing the script in order to treat each day individually is a very long process when you have several years of data…

Thank you :slight_smile:

Try something like this:

   int year = 2008, month = 6;
   TChain chain("h42");
   for (int day = 1; day <= 31; ++day) {
      TString filename = TString::Format("%d_%02d_%02d.root", year, month, day);
      // AccessPathName() returns false when the file IS accessible
      if (!gSystem->AccessPathName(filename.Data(), kFileExists)) {
         chain.Add(filename.Data());
      }
   }

Cheers, Bertrand.

Hi Bertrand,

Thank you very much for your help. The script you sent me is more or less what I am trying to do, but I don’t really understand the last two lines.

Using part of your script, is it possible to replace the TFile in the following script by the name of the file specified in filename?

[code]for (int day = 1; day < 31; day++)
{
TString filename = TString::Format("%d_%02d_%02d.root", year, month, day);
TFile file("Data/2008/06/filename", "READ"); // SOMETHING LIKE THAT?
TTree *tree = (TTree*) file.Get("recData");
RecEvent *recevt = 0;
tree->SetBranchAddress("event.", &recevt);
int N = tree->GetEntries();

	// execute the rest of the script to plot histogram.

[/code]

Thank you again!

Something like this:

   int year = 2008, month = 6;
   TChain chain("2008_06");
   for (int day = 1; day <= 31; ++day) {
      TString filename = TString::Format("Data/%d/%02d/%d_%02d_%02d.root",
                                          year, month, year, month, day);
      // AccessPathName() returns false when the file IS accessible
      if (!gSystem->AccessPathName(filename.Data(), kFileExists)) {
         chain.Add(filename.Data());
      }
   }
   TTree *tree = (TTree*)chain.Get("recData");
   RecEvent *recevt = 0;
   tree->SetBranchAddress("event.", &recevt);
   [...]

See the TChain documentation

[color=#0000FF]EDIT:[/color] And for individual file access:

   TString filename = TString::Format("Data/%d/%02d/%d_%02d_%02d.root",
                                       year, month, year, month, day);
   TFile file(filename.Data(), "READ"); // would be better to use TFile::Open(filename.Data(), "READ");
   [...]

Cheers, Bertrand.

Thank you Bertrand, I tried to run it but ended up with an error because of this line:

   TTree *tree = (TTree*)chain.Get("recData");

It says:
[ul]Error: Can't call TChain::Get("recData") in current scope script.C:23:
Possible candidates are…
(in TChain)
(in TTree)
*** Interpreter error recovered ***
[/ul]

Any idea what went wrong?

TChain::GetTree()

Thank you very much Bertrand. Using this (the code you showed me to access individual files):

   TString filename = TString::Format("Data/%d/%02d/%d_%02d_%02d.root",
                                       year, month, year, month, day);
   TFile file(filename.Data(), "READ"); // would be better to use TFile::Open(filename.Data(), "READ");
   [...]

I was able to plot a histogram for each day. Now I think I can manage to merge them correctly and get one histogram for the whole month.

I will now have a look at the TChain class to understand the other way you showed me. Does the line

   chain.Add(filename.Data());

add the data of each file in a single one?

Thank you for your time, I really appreciate it :slight_smile:

Please read the documentation of TChain and TChain::Add()
For your specific case, the code should look like this:

   Int_t year = 2008, month = 6;
   TChain chain("recData");
   for (int day = 1; day <= 31; ++day) {
      // format the path/file name
      TString filename = TString::Format("Data/%d/%02d/%d_%02d_%02d.root",
                                          year, month, year, month, day);
      // make sure the file exists
      // (AccessPathName() returns false when the file IS accessible)
      if (!gSystem->AccessPathName(filename.Data(), kFileExists)) {
         // if it exists, add it to the chain
         chain.Add(filename.Data());
      }
   }
   // a TChain is itself a TTree, so set the branch address on the chain;
   // chain.GetTree() is a null pointer until an entry has been loaded
   RecEvent *recevt = 0;
   chain.SetBranchAddress("event.", &recevt);
   [...]

Cheers, Bertrand.
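
A side note on the loop bounds: the path pattern above can also be generated with the real number of days in the month, so that no non-existent day is ever tried. This is only a sketch in plain C++; the helper names `daysInMonth` and `dailyFiles` are made up for this example, and the `Data/YYYY/MM/YYYY_MM_DD.root` layout is taken from the posts above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Number of days in a month of the Gregorian calendar (hypothetical helper).
int daysInMonth(int year, int month) {
    assert(month >= 1 && month <= 12);
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    return (month == 2 && leap) ? 29 : days[month - 1];
}

// Build "Data/YYYY/MM/YYYY_MM_DD.root" for every day of the given month.
std::vector<std::string> dailyFiles(int year, int month) {
    std::vector<std::string> names;
    const int ndays = daysInMonth(year, month);
    for (int day = 1; day <= ndays; ++day) {
        char buf[64];
        std::snprintf(buf, sizeof buf, "Data/%d/%02d/%d_%02d_%02d.root",
                      year, month, year, month, day);
        names.emplace_back(buf);
    }
    return names;
}
```

Each returned name could then be passed to `chain.Add(name.c_str())`. Checking that the file exists is still worthwhile, since a day without data has no file.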

Hi bellenot,

We are trying to fill a single histogram with the data from multiple root files as well! Here is the code we have so far:

    DIR *directory;
    struct dirent * file;
    char returnData[2048];
    TChain dataChain("All");
    directory = opendir("/Users/fritschlab/Desktop/Dossier/GREG_detector/3rdGREG_build/data/alpha");
    while ((file=readdir(directory)) != NULL) {
        //printf("%s\n", file->d_name);
        strcpy (returnData, file->d_name);
        //puts (returnData);
         //std::cout << returnData << " " << std::endl;
        if(strstr(returnData, "alpha")) {
            const char *returnDataC;
            returnDataC = returnData;
            dataChain.Add(returnDataC);
            std::cout << returnData << " " << std::endl;
        }
    }
    TTree *tree2 = dataChain.GetTree();
    tree2->Print();

“All” is the name of the tree that contains all of our branches. We have confirmed that, each time it is updated, returnData holds the name of one of the files we wish to chain. However, when we try to use the tree returned by dataChain.GetTree(), we get an error saying that it is NULL. Is passing just the name of the file to dataChain.Add() not enough? Thanks so much for any advice you are able to give :slight_smile:

Hi,

First, check file->d_name: is it correct? And why so many string and pointer copies? You could simply use std::string or TString. Then make sure that all your .root files are valid, and that you only pass .root files to dataChain.Add()
(and please, next time, open a new topic instead of replying to a 4-year-old one)

Cheers, Bertrand.

Hi Bertrand,

I am unsure how to check whether file->d_name is correct, other than that when I print it as a string, I get the file names printed like this:

0.345553_alpha.root
0.345556_alpha.root
0.030903_alpha.root
0.067893_alpha.root
0.026153_alpha.root
0.119433_alpha.root
0.067895_alpha.root
0.119429_alpha.root
0.048533_alpha.root
0.037763_alpha.root

which are the correct files I want to add to the chain. If I have my root macro file in the folder above the files I wish to “chain” could that be causing an issue?

Probably. Try to add the directory name in front of the file names, or move the macro into the same directory as your files…
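
To make the "directory name in front of the file names" point concrete: readdir() only gives the bare file name in d_name, so the directory has to be prepended before the name is handed to the chain. Here is a minimal sketch using C++17 std::filesystem instead of opendir/readdir. The function names `isWantedFile` and `rootFilesIn` are invented for this example, and the filter (name contains a tag and ends in ".root") is an assumption based on the file list shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True if `name` contains `tag` and ends in ".root" (hypothetical filter).
bool isWantedFile(const std::string& name, const std::string& tag) {
    const std::string ext = ".root";
    const bool hasExt = name.size() >= ext.size() &&
                        name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
    return hasExt && name.find(tag) != std::string::npos;
}

// Full paths (directory + file name) of the matching regular files in `dir`.
std::vector<std::string> rootFilesIn(const fs::path& dir, const std::string& tag) {
    assert(fs::is_directory(dir));
    std::vector<std::string> paths;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() &&
            isWantedFile(entry.path().filename().string(), tag)) {
            paths.push_back(entry.path().string());  // includes the directory, unlike d_name
        }
    }
    std::sort(paths.begin(), paths.end());  // directory iteration order is unspecified
    return paths;
}
```

Each returned path could then be passed to `dataChain.Add(path.c_str())`, which no longer depends on the directory the macro is started from.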

Tried that, as well as printing out the current directory and confirmed that was not the issue. Also, changed the “.” to an “_” in the beginning of the file names to ensure this was not messing things up.

And did you try to open a single root file? What is inside?

Using ROOT 6.22/00 I get the same error using root files and trees that I know are good.
What I always do is work directly with the TChain (without needing GetTree), e.g.:

dataChain.Print();  // although this prints each tree in the chain, instead of one printout for the chain
cout << dataChain.GetEntries();  // this gives the correct total of events in the chain

and a loop over the events in dataChain works as expected.

Can you open a Jira issue?