Calculate the average histogram of various histograms

I have a root file containing many pulses in histogram format.
What I am trying to do is calculate the average pulse of those histograms.
The code I am using is the following

[code]#include “Riostream.h”
#include

void mean_Gamma_Flash(){
cout << “-------------------------------------------------------------------” << endl;
gROOT->SetStyle(“Plain”);
gStyle->SetOptStat(0000);
gStyle->SetOptFit(0000);
gStyle->SetOptTitle(0);

Int_t detNum=1;
Char_t dname[4]; sprintf(dname,"FIMG");
TString dir = gSystem->UnixPathName(gInterpreter->GetCurrentMacroName());
dir.ReplaceAll("mean_Gamma_Flash.C","");
dir.ReplaceAll("/./","/");
ifstream in;
Int_t channels = 10000;
Int_t numFiles = 0;
TH1F *hindiv = new TH1F("hindiv","Individual",channels,1,(channels+1));
//TH1F *htotal = (TH1F*)hindiv->Clone("htotal");
//TH1F *htotal = new TH1F("htotal","Total",channels,1,(channels+1));
TH1F *hmean = new TH1F("hmean","Mean",channels,1,(channels+1));
cout << "Opening root file" << endl;
TFile *MyFile = new TFile("FIMG5.root","READ");
if ( MyFile->IsOpen() ) printf("File opened successfully\n");
for (Int_t evntNum=1; evntNum<=305; evntNum++){
	MyFile->GetObject(TString::Format("FIMG5_EV_%d;1",evntNum), hindiv);
	cout << TString::Format("FIMG5_EV_%d;1",evntNum) << endl;
	//hindiv->Draw("same");
            TH1F *htotal = (TH1F*)hindiv->Clone("htotal");
	htotal->Add(hindiv,1);
	numFiles++;
}
cout << "Number of files processed : " << numFiles << endl;
htotal->Draw();

}[/code]

The previous code only adds the two last histograms. I can locate where the mistake is : I am cloning the sum histogram inside the for loop. However if I try to create it outside the for loop(either using Clone() or the default constructor), I cannot add thehistograms due to an error in dimensions

Any idea on how to overcome this?
Thank you very much in advance!

This isn’t a direct answer to your question, but my preferred method of doing averaging is to accumulate the TH1Fs in a TH2F, then you can see the distribution and outliers. You still should be careful about the histogram dimensions.

#include <vector>
#include "TH1F.h"
#include "TH2F.h"

TH2F *Average(std::vector<TH1F*> hists) {
   TH2F *hAvg = new TH2F("hAvg","Avg",100,0,100,100,0,100);
   for (size_t i=0;i < hists.size(); ++i) {
      for (int bin = 0; bin < hists[i]->GetNbinsX(); ++bin) {
         int resultBin = hAvg->FindBin(hists[i]->GetBinLowEdge(bin), hists[i]->GetBinContent(bin));
         int binContent = hAvg->GetBinContent(resultBin) + 1;
         hAvg->SetBinContent(resultBin,binContent);
      }
   }
   return hAvg;
}

You can then get the average by simply doing the following:

TProfile *avg_pfx = hAvg->ProfieX("avg_pfx")

Dear kSmith,

Thank you very much for your answer!
Indeed it doesn’t seem pretty straightforward to calculate the mean histogram.

The thing is that I don’t seem to understand how to use your code.
As far as I understand, I have to include your code into mine and then call your method.
I would expect to call it using something like

This means that I have to create a vector and fill it with the desired histograms, but I don’t know how to do that, since the desired histograms are part of a root file.

Any advice on how to use your method?

Thank you very much in advance!

There are a number of ways to adapt the code snippet. Maybe the easiest is the following:

//Your code for getting the file here
std::vector<TH1F*> hists; //Declare the vector of histograms
for (Int_t evntNum=1; evntNum<=305; evntNum++){
      MyFile->GetObject(TString::Format("FIMG5_EV_%d;1",evntNum), hindiv);
      hists.push_back(hindiv); //Push a new histogram into the vector
}

TH2F *hAvg = Avergage(hists); //Compute the avg.

PS I had made a mistake in the above Average function. I corrected it, but you should be sure you understand it to make sure it does what you expect.

Thank you very much for your post.
My code at the moment is the following

[code]#include “Riostream.h”
#include
#include
#include “TH1F.h”
#include “TH2F.h”

void mean_Gamma_Flash(){
cout << “-------------------------------------------------------------------” << endl;
gROOT->SetStyle(“Plain”);
gStyle->SetOptStat(0000);
gStyle->SetOptFit(0000);
gStyle->SetOptTitle(0);

Int_t detNum=1;
Char_t dname[4]; sprintf(dname,"FIMG");
TString dir = gSystem->UnixPathName(gInterpreter->GetCurrentMacroName());
dir.ReplaceAll("mean_Gamma_Flash.C","");
dir.ReplaceAll("/./","/");
ifstream in;
Int_t channels = 10000;
Int_t numFiles = 1;

TH1F *hindiv = new TH1F("hindiv","Individual",channels,1,(channels+1));
cout << "Opening root file" << endl;
TFile *MyFile = new TFile("FIMG5.root","READ");
if ( MyFile->IsOpen() ) printf("File opened successfully\n");
std::vector<TH1F*> hists;
for (Int_t evntNum=1; evntNum<=10; evntNum++){
MyFile->GetObject(TString::Format("FIMG5_EV_%d;1",evntNum), hindiv);
hists.push_back(hindiv); //Push a new histogram into the vector
cout << TString::Format("FIMG5_EV_%d;1",evntNum) << endl;
numFiles++;
}
cout << "Number of files processed : " << numFiles-1 << endl;
TH2F *hAvg = Average(hists); //Compute the avg.
hAvg->Draw();
TProfile *avg_pfx = hAvg->ProfileX("avg_pfx");
avg_pfx->Draw();

}

TH2F Average(std::vector<TH1F> hists) {
TH2F *hAvg = new TH2F(“hAvg”,“Avg”,100,0,100,100,0,100);
for (size_t i=0;i < hists.size(); ++i) {
for (int bin = 0; bin < hists[i]->GetNbinsX(); ++bin) {
int resultBin = hAvg->FindBin(hists[i]->GetBinLowEdge(bin), hists[i]->GetBinContent(bin));
int binContent = hAvg->GetBinContent(resultBin) + 1;
hAvg->SetBinContent(resultBin,binContent);
}
}
return hAvg;
}
[/code]

What I would expect is that the average histogram will have a similar shape as the individual ones, however this is not the case. Am I doing something wrong or is it something that I don’t really understand?

Without the input file it is hard to say. Perhaps it is an issue with the arbitrary dimensions of the TH2F that I proposed.

Do your input histograms typically take on X and Y values between 0 and 100?

The histograms from the root file contain values from 0-(~) 3000 in the X-axis and from 0-255 in the Y axis.
The input file can be found here

dl.dropboxusercontent.com/u/19828093/FIMG5.root

Note that I modified the 2D histogram’s dimensions and I still don’t get what I expect.

It looks like your horizontal axis values change for each histogram as well as the number of bins. You will need to change the averaging algorithm to ignore the values and just average each bin or you need to shift each histogram such they have the same x-axis values.

I chose the former and changed the FindBin line and produced this output:


The code I used to produce this

#include <vector>
#include "TFile.h"
#include "TH1F.h"
#include "TH2F.h"
#include "TProfile.h"

TH2F *Average(std::vector<TH1F*> hists) {
   TH2F *hAvg = new TH2F("hAvg","Avg",3000,0,3000,300,0,300);
   for (size_t i=0;i < hists.size(); ++i) {
      for (int bin = 0; bin < hists[i]->GetNbinsX(); ++bin) {
         int resultBin = hAvg->FindBin(bin, hists[i]->GetBinContent(bin));
         int binContent = hAvg->GetBinContent(resultBin) + 1;
         hAvg->SetBinContent(resultBin,binContent);
      }
   }
   return hAvg;
}

TH2F* scan() {
   gStyle->SetPalette(52); //Use Grayscale palette

   //Load File.
   TFile *f = new TFile("FIMG5.root");

   //Loop over each histogram and store pointer in vector.
   std::vector<TH1F*> hists;
   for (int i=0;i<305;i++) {
      TH1F *hIndv = (TH1F*)f->Get(Form("FIMG5_EV_%d",i));
      if (hIndv)
         hists.push_back(hIndv);
   }

   //Compute average
   TH2F *hAvg = Average(hists);
   TProfile *hProf = hAvg->ProfileX("hProf");

   //Turn off stats
   hAvg->SetStats(kFALSE);

   //Draw the 2D hist with the average prof on top.
   hProf->SetLineColor(kRed);
   hProf->SetMarkerColor(kRed);
   hAvg->Draw("COLZ");
   hProf->Draw("SAME");

   //Use a log scale in Z
   gPad->SetLogz();
   gPad->Update();

   return hAvg;
}

Clearly there is something that I don’t get.
In the following canvas, I have plotted all the histograms from the root file.

They are not shifted, indeed however you can clearly see that some pulses go beyong 0, to negatives values.
Those pulses should be also reproduced by your code, but they are not.
I cannot find why, so I am wondering if there is something that I don’t really get.

I think I located the error.
I just replaced FIMG_EV_%d with FIMG_EV_%d;1 and I get the following output

which is closer to what I expected apart from the saturated values in the right bins which don’t exist in the gray histogram

Ahh, yes you have multiple histograms stored with the same name. The number after the semicolon is the cycle number. The TFile::Get method grabs the key corresponding to the last cycle unless specified as you have done.

Good to know about “cycles”!!!

There are some other issues though.
I changed the number of bins( to 8000 from 3000) in order to see if I am narrowing the range, suspected that this may be the reason that I am missing the saturated pulses that occur in times greater than 14us in the blue histograms.
The thing is that not only they are still missing, but there must be an error computing the average at around bin number 6000.

One more question, is how do I get to export the the red average in an ascii file converting x-values from arbitary bins to time, as it was saved in the first histograms.

First, I do not believe those were saturations. Each histogram had a varying number of bins, once you reached this number the histograms you were plotting returned to zero, thus the vertical lines on the right side of the plot. These do not appear in the average plot as the function loops until the last bin is reached and does not add additional points at zero.

Secondly, what do you mean by an error calculating the average? The number of pulses with that length decreases and so the uncertainty in the average increases this is the reason for the thick red line around bin 5800.

Finally, a TProfile is derived from a TH1 so the usual methods for getting bin content and bin error are available as well as the print method. Unfortunately the time information is lost upon converting to bin number. If the axis was arbitrary as you stated then this shouldn’t be important. If you want to keep the time information you will still need some way of adjusting the time axis such that each histogram shares a common range.

I never thought that those vertical lines could be the starting point of the zeros!
Thank you very much for the enlightment!
It is very important to me!

Secondly, like you stated I have to find another way to get the average in order to keep the time information.
The thing is that I find the 2D histogram idea brilliant so I will try to stick with it!

Thirdly I would like to ask, if there is a way to get the average by ignoring values that are bellow a certain threshold in the z axis. What I mean is that I would like my average to be as close to the white part of the 2D histogram, so if I ignore the lower z-values when computing the average, the TProfile will be closer to the white part.

Sounds like you are looking for the most frequent value and not the average.

Something like the following:
EDIT: I have corrected the following code.

TH1F* GetFrequent(TH2F* hist2D) {
   TH1F* hFreq = new TH1F("hFreq","Freq",hist2D->GetNbinsX(),hist2D->GetXaxis()->GetBinLowEdge(1),hist2D->GetXaxis()->GetBinLowEdge(hist2D->GetNbinsX()) + hist2D->GetXaxis()->GetBinWidth(hist2D->GetNbinsX()));

   for (int xBin = 1; xBin <= hist2D->GetNbinsX(); ++xBin) {
      double maxYBin = 0;
      for (int yBin = 1; yBin <= hist2D->GetNbinsY(); ++yBin) {
         if (hist2D->GetBinContent(xBin,yBin) > hist2D->GetBinContent(xBin,maxYBin)) maxYBin = yBin;
      }
      hFreq->SetBinContent(xBin,maxYBin);
   }
   return hFreq;
}

You are implying to add this method in the code and just call it, right?

Yes

I used your method as the following code shows

//Compute average TH2F *hAvg = Average(hists); TProfile *hProf = hAvg->ProfileX("hProf");//Average histogram TH1F *hFreq = GetFrequent(hAvg);

However I get an error just by calling the method

root [0] .x mean_Gamma_Flash.C Error in <TH2F::GetBinLowEdge>: Invalid method for a 2-d histogram - return a NaN Error in <TH2F::GetBinLowEdge>: Invalid method for a 2-d histogram - return a NaN Error in <TH2F::GetBinWidth>: Invalid method for a 2-d histogram - return a NaN libGL error: failed to load driver: swrast Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1 *** TRemiHistExport: Histogram hProf written to mean_gamma.txt (class TH2F*)0x2be0d10

It was a suggested routine, I had not debugged it. I have corrected the previous post with code that works as expected.

Thank you very much for your time and effort!!!
Indeed now it’s working, but again there are some issues. Check for instance the following image

I was trying to find where the bug is, but then I realised that there is no bug in the code.
The problem is that the “Most Frequent Value” is not the right method for the first part of the pulses because there are many saturated pulses. That is why the blue line(most frequent value) has many deeps.

So I was wondering what is the best way to deal with it…
Is there a way to use the one method in the first part of the histogram and the other in the remaining part?