Hello,
This must be a topic that other people have encountered, and this is my first post, so I’m putting it here in the hope that there’s a dead-easy fix out there.
I have a set of over 1000 input files.
Each file contains a rather complicated TTree, and the TTree structure is the same in every file. This is data from an ATLAS Monte Carlo simulation.
I used “MakeClass” with the input files to generate the shell of the code, and I use a TChain to set up the file input, with wildcards to pick up every one of the 1000+ files to be read in.
The code I have compiles and runs fine. It produces no errors and my output file contains filled histograms of every single quantity that I chose to store.
However, I have noticed a very strange behaviour which I think has to do with the way ROOT does memory management vs. writing what is in memory to an output file.
When I run the code on one input file and I ask to histogram the number of B hadrons in the leading jet in each event I get a histogram that has about 10k entries.
Here is the code where that file is identified:
testOutDev::testOutDev(TTree *tree) : fChain(0)
{
   // if parameter tree is not specified (or zero), connect the file
   // used to generate this class and read the Tree.
   if (tree == 0) {
      TChain *f = new TChain("bTag_AntiKt4EMTopoJets");
      //
      // This will add the 12 files I uploaded as a test.
      f->Add("/data/atlas/users/huffman/MCbTagwHITS/user.thuffman.bTagHitsTTree_Akt4EMTo/user.thuffman.18558259.Akt4EMTo._000335.root");
      tree = f;
   }
   Init(tree);
}
Then I put in a wildcard in the filename so that it will run over 10 files.
So the filename is now user.thuffman.18558259.Akt4EMTo._00033*.root
and it works exactly as I would expect. My histogram of the number of B hadrons in every leading jet now has a bit over 100k entries, since it has run over 10 times more files and all the files are approximately the same size.
Next I add another line to the TChain to include another 10 files, so that I am running over 20 files in total. Again, all are about the same size. Here’s the code that adds the next 10 files.
testOutDev::testOutDev(TTree *tree) : fChain(0)
{
   // if parameter tree is not specified (or zero), connect the file
   // used to generate this class and read the Tree.
   if (tree == 0) {
      TChain *f = new TChain("bTag_AntiKt4EMTopoJets");
      //
      // This will add the 12 files I uploaded as a test.
      f->Add("/data/atlas/users/huffman/MCbTagwHITS/user.thuffman.bTagHitsTTree_Akt4EMTo/user.thuffman.18558259.Akt4EMTo._00033*.root");
      f->Add("/data/atlas/users/huffman/MCbTagwHITS/user.thuffman.bTagHitsTTree_Akt4EMTo/user.thuffman.18558259.Akt4EMTo._00053*.root");
      tree = f;
   }
   Init(tree);
}
BUT now I only get about 50k entries when I plot that same histogram!
I would have expected something like 200k entries!
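The expectation above can be sketched without ROOT. This is only a rough cross-check, under stated assumptions: `globMatch` is a hand-written `*`-only matcher and not ROOT's own wildcard code, the directory listing with file numbers 000300 to 000599 is hypothetical, and the figure of 10k entries per file is the approximate number quoted for the single-file run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// File-name prefix taken from the post; everything after it is varied below.
const std::string kPrefix = "user.thuffman.18558259.Akt4EMTo._";

// Minimal matcher supporting only '*' (any run of characters).
// Hand-written stand-in; this is NOT ROOT's wildcard implementation.
bool globMatch(const std::string& pattern, const std::string& name) {
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos, mark = 0;
    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;          // remember the star and try matching zero chars
            mark = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;        // backtrack: let the star absorb one more char
            n = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}

// Hypothetical directory listing: six-digit file numbers 000300..000599.
std::vector<std::string> fakeListing() {
    std::vector<std::string> names;
    for (int i = 300; i < 600; ++i) {
        char num[16];
        std::snprintf(num, sizeof num, "%06d", i);
        names.push_back(kPrefix + num + ".root");
    }
    return names;
}

// Number of names picked up by a set of patterns (one pattern per Add call).
// Each pattern is counted independently; the two patterns used here are disjoint.
std::size_t countMatches(const std::vector<std::string>& patterns,
                         const std::vector<std::string>& names) {
    std::size_t count = 0;
    for (const auto& pat : patterns)
        for (const auto& name : names)
            if (globMatch(pat, name)) ++count;
    return count;
}

// Naive expectation: every file contributes roughly the same number of entries.
long expectedEntries(long nFiles, long entriesPerFile) {
    assert(nFiles >= 0 && entriesPerFile >= 0);
    return nFiles * entriesPerFile;
}
```

With the two patterns `_00033*.root` and `_00053*.root`, `countMatches` finds 20 names in this hypothetical listing, so the naive expectation is 20 files times about 10k entries, far above the roughly 50k entries observed.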
I believe something strange is happening that is related to when I choose to “write” the histogram.
I define the output *.root file in testOutDev.h where it is referred to as ‘fout’.
class testOutDev {
public :
   TTree *fChain; //!pointer to the analyzed TTree or TChain
   Int_t fCurrent; //!current Tree number in a TChain
   //
   // Open a file when you create a new bTagNTkJets1n2 object
   // <BTH> I think you need to change directories to the file directory
   // when you use the "Loop" method by putting in the line "fout->cd();"
   TFile *fout = new TFile("moneyPlots/testBintOTT001.root","RECREATE");
   // Fixed size dimensions of array or collections stored in the TTree if any.
   // Declaration of leaf types
   Int_t runnb;
Then right at the top of the “Loop()” definition in testOutDev.C I make sure I change directory so that the default directory is ‘fout’ using:
testOutDev::fout->cd();
This is also where I define the histogram in question, which I call ‘hist1nb’.
I do not actually call hist1nb->Write(); until ALL of the files have been looped over.
I did this because I found that, if I put the “Write” before my instance of testOutDev completely finishes the Loop() method, I just get multiple copies (cycle numbers) of that histogram…one for every single file…
but I do not WANT that! I want only ONE histogram that contains ALL of the events…not 10 or 20 histograms with the same name that have one file’s worth of events in them.
Is there a way you can tell ROOT that it is time to move what is in memory into the output file and clear the memory such that it can accept more input?
Oh and just so you know, I did read Saving Histograms to Disk in the ROOT manual and it isn’t clear to me what I need to do in order to get my histograms properly filled with only one cycle number.
Actually, does the cycle number even matter? If I have 1000+ cycles of my ‘hist1nb’ histogram and I open the file that contains them in ROOT interactively and type:
root> hist1nb->Draw();
will it actually give me a plot that is the sum of ALL those cycles? Or will I only get one of them? (I’m doing a test right now to see if that works.)
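For anyone comparing the two write patterns, here is a ROOT-free toy model. It assumes the behaviour described in the ROOT manual: each Write() of an object with the same name adds a new cycle, and a lookup by bare name (without `;N`) returns the highest cycle rather than a sum of cycles. `ToyHist` and `ToyFile` are hypothetical stand-ins, not ROOT classes, and the toy never resets the histogram between files, so it is only a sketch of the bookkeeping and not a substitute for the real test.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a histogram: only the name and entry count matter here.
struct ToyHist {
    std::string name;
    long entries;
    explicit ToyHist(std::string n) : name(std::move(n)), entries(0) {}
    void Fill() { ++entries; }
};

// Hypothetical stand-in for an output file that keeps every written cycle.
class ToyFile {
public:
    // Each write stores a snapshot of the in-memory object as a new cycle.
    void Write(const ToyHist& h) { keys_[h.name].push_back(h.entries); }

    // Lookup by bare name returns the highest (latest) cycle, not a sum.
    long Get(const std::string& name) const {
        auto it = keys_.find(name);
        assert(it != keys_.end() && !it->second.empty());
        return it->second.back();
    }

    // Number of cycles stored under this name.
    std::size_t Cycles(const std::string& name) const {
        auto it = keys_.find(name);
        return it == keys_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::vector<long>> keys_;
};

// Pattern A: fill across all input files, write once at the very end.
ToyFile writeOnce(int nFiles, int entriesPerFile) {
    ToyFile out;
    ToyHist h("hist1nb");
    for (int f = 0; f < nFiles; ++f)
        for (int e = 0; e < entriesPerFile; ++e) h.Fill();
    out.Write(h);  // single cycle holding every entry
    return out;
}

// Pattern B: write after every input file (no reset in this toy).
ToyFile writePerFile(int nFiles, int entriesPerFile) {
    ToyFile out;
    ToyHist h("hist1nb");
    for (int f = 0; f < nFiles; ++f) {
        for (int e = 0; e < entriesPerFile; ++e) h.Fill();
        out.Write(h);  // one additional cycle per input file
    }
    return out;
}
```

In this model `writeOnce(20, 100)` leaves one cycle and `writePerFile(20, 100)` leaves 20, and in both cases a lookup by name returns a single cycle, never a sum over cycles.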
Cheers and thanks!