Compare histograms with different binnings and ranges

I am trying to compare (in the sense of seeing by eye their differences) 2 histograms that have a different binning and different ranges.

My idea was to take the one with the larger range (let’s call it big), rebin it to have the same binning as the one with the smaller range (let’s call it small), then clone the small one and using SetBinContent, copy the values of the rebined to the cloned!

My code to achieve that is the following

#include <TSystem.h>
#include <TH1.h>
#include <TFile.h>
#include "Riostream.h"

#include <fstream>
#include <iostream>
#include <iomanip>// std::setprecision
#include <stdio.h>

using namespace std;
using std::cout;
using std::endl;

TH1F* compress(TH1F* h_big, TH1F* h_small){

	unsigned int nbins_big   = h_big->GetNbinsX();
	unsigned int nbins_small = h_small->GetNbinsX();
	
	TH1F *h_compressed = (TH1F*) h_small->Clone();
	h_compressed->Reset();
	
	if ( nbins_big < nbins_small ){cout << "First histogram should be the bigger. Returning empty histogram." << endl; return (h_compressed);}
	else if ( nbins_big == nbins_small ){cout << "Histograms have already the same binning.Returning the same input histogram." << endl; return (h_big);}
	else{
		if ( nbins_big % nbins_small != 0 ){cout << "Rebin cannot be performed. The two binnings are not dividable. Returning empty histogram." << endl; return (h_compressed);}
		else{
			unsigned int rebin_factor = nbins_big/nbins_small;
			h_big->Rebin(rebin_factor);
			for ( unsigned int i=0; i<nbins_small; ++i ){
				float bin_content = (float) h_big->GetBinContent(i+1);
				h_compressed->SetBinContent(i+1, bin_content);
			}
			h_compressed->Scale(1./rebin_factor);
			return (h_compressed);
		}
	}
}

void caen(char * file_in_1, char * h_in_1, char * file_in_2, char * h_in_2){
	TString file1(file_in_1);
	TString h1(h_in_1);
	TString file2(file_in_2);
	TString h2(h_in_2);
	cout << "Looking for histogram " << h1 << " in file " << file1 << endl;
	cout << "Looking for histogram " << h2 << " in file " << file2 << endl;
	
// (1) Open the root file and get the histogram	
	TFile *fin1 = new TFile(file1,"READ");//Open the root file
	if ( fin1->IsOpen() ) cout << "File " << fin1->GetName() << " opened successfully\n" << endl;//Check if file is opened
	TH1F *h_Big = (TH1F*)fin1->Get(h1);

// (2) Read the ascii file, store its values in a vector and make a histogram with the values
	std::ifstream fin2(file2);
	std::vector<float> floats;
	float value;
	//Check if the float was successfully read from the file
	cout << "Filling vector..." << endl;
	while(fin2 >> value){
		// store it in the vector.
		floats.push_back(value);
	}
	TH1F* h_Small = new TH1F("h_Small","Input Amplitude Distribution; Amplitude (ADC units); Counts",floats.size(),0,floats.size());
	cout << "Filling histogram..." << endl;
	for (unsigned int i = 0; i < floats.size(); ++i){
		h_Small->SetBinContent(i+1, floats[i]);
	}
	
	TH1F *h_Compressed = compress(h_Big, h_Small);
	
	//h_Small->Draw("histo");
	h_Big->Draw("histosame");
	//h_Compressed->Draw("histosame");
	
}

To run it on a root session

.L compress.C++
caen("run106386_out.root", "hamp1", "amp_Pu240_5.csv", "h2")

The problem is that when I run it, it seems like the compressed histogram (red) is shifted compared to the original one (blue) as seen in this plot

Then while I was trying to debug and see what happened I noticed this strange behavior. If I try to plot only the h_Big, while using the compress() function, thus using the code as was pasted above, I get the following output

So in principle, I would expect to have the original histogram printed. Instead it seems that it’s rebined! Indeed if I comment out the line

TH1F *h_Compressed = compress(h_Big, h_Small);

Then I get this

which is the histogram a unrebined, as it supposed to be!

So it seems that when I call the function, the input histogram is rebined!

How can that be possible?

Thank you very much in advance!

Here is a link with the files that might be needed.

https://cernbox.cern.ch/index.php/s/4djYkzm3J5RYAJt
https://cernbox.cern.ch/index.php/s/59zHg4VqfHwadWs

Dear atha.stam,

Not sure to understand: calling compress you pass a pointer to your big histogram, and inside you execute Rebin, so the behaviour seems consistent to me.

For the shift, please check the relative normalization of the horizontal scales. The big histogram starts at about 1/8 of the total scale, and looks consistent in all plots.

G Ganis

It also, appears that you assume that both histograms have their first bin edge in the same place. Which may or may not be true.

Dear ganis,

Thank you very much for your reply!
You are right about the pointers!
I am creating a clone of the h_big so everything is fine!

For the shift…
That is quite puzzling actually…
In principle it shouldn’t be shifted…
But…it is…
So ho deal with that in the sense to make them “start” from the same bin?

You are right that I assume that, but it is a right thing to do for my problem.
The reason that they are not, made me realise that there is an issue with the data probably!

Anyway, how can I “correct” for this inconsistency?

Why is this not sufficient? (not tested)

TH1F* compress(TH1F* h_big, TH1F* h_small) {
   unsigned int rebin_factor = h_big->GetNbinsX() / h_small->GetNbinsX();
   if (rebin_factor < 1) {
      std::cout << "ERROR: More bins in small than big histogram!\n";
      return nullptr; 
   }
   if (h_big->GetNbinsX() % h_small->GetNbinsX() != 0) {
      std::cout << "WARNING: Rebinning factor has been floored.\n";
   }
   TH1F* h_compressed = (TH1F*) h_big->Clone("h_compressed");
   h_compressed->Rebin(rebin_factor);
   return h_compressed;
}

Try (note that the applied “shift by +10 channels” drops the last 10 values from the “file_in_2” as they go into the overflow bin of the “h_Small” histogram):

#include "TFile.h"
#include "TH1.h"
#include "TTree.h"
#include "TCanvas.h"

void caen(const char *file_in_1 = "run106386_out.root",
          const char *h_in_1 = "hamp1",
          const char *file_in_2 = "amp_Pu240_5.csv") {
  TFile *fin1 = TFile::Open(file_in_1);
  if ((!fin1) || fin1->IsZombie()) { delete fin1; return; }
  TH1F *h_Big; fin1->GetObject(h_in_1, h_Big);
  if (!h_Big) { delete fin1; return; }
  /* a temporary TTree */
  TTree *t = new TTree("t", "t"); t->ReadFile(file_in_2, "v/D");
  Int_t n = t->GetEntries();
  if ((n < 1) || (h_Big->GetNbinsX() % n)) { delete t; delete fin1; return; }
  TH1F* h_Small =
    new TH1F("h_Small",
             "Input Amplitude Distribution; Amplitude (ADC units); Counts",
             n, 0, n);
  t->Project("h_Small", "0.5 + Entry$ + 10", "v"); // shift by +10 channels
  delete t; // no longer needed
  TH1F *h_Compressed = new TH1F(*h_Big);
  h_Compressed->SetName("h_Compressed");
  h_Compressed->GetXaxis()->SetLimits(0, n); // make them the same as "h_Small"
  h_Compressed->Scale(1. / (h_Compressed->GetNbinsX() / n)); // first scale ...
  h_Compressed->Rebin(h_Compressed->GetNbinsX() / n); // ... then rebin ...
  h_Compressed->ResetStats(); // ... and finally reset its statistics
  TCanvas *c = new TCanvas("c", "c");
  h_Small->SetLineColor(kBlue);
  h_Small->Draw("histo");
  h_Compressed->SetLineColor(kRed);
  h_Compressed->Draw("histo same");
  c->Modified(); c->Update();
}

Well, instead of the applied “shift by +10 channels”, you could also simply shift “h_Compressed” by -10 channels (no data are “lost” in this case):

t->Project("h_Small", "0.5 + Entry$", "v");
// ...
h_Compressed->GetXaxis()->SetLimits(0 - 10, n - 10); // as "h_Small", shifted by -10

However, I think the real problem is not just a simple “shift” but a change in the “ADC gain” (no data are “lost” in this case):

t->Project("h_Small", "0.5 + Entry$", "v");
// ...
h_Compressed->GetXaxis()->SetLimits(0, n / 1.5); // "ADC gain" correction

Thank you very much for your help!
It indeed seems that it’s a problem with a gain shift!

Note that you do not need to create “h_Compressed” with the binning similar to “h_Small” at all.
You can simply use “h_Big” as it is (i.e. draw it without “scaling” and “rebinning”):

h_Big->GetXaxis()->SetLimits(0, n / 1.5); // "ADC gain" correction
h_Big->ResetStats();

And, of course, you can also do it the other way around … instead of correcting “h_Big”, correct “h_Small” (after it’s been “filled”):

h_Small->GetXaxis()->SetLimits(0, h_Big->GetXaxis()->GetXmax() * 1.5); // "ADC gain" correction
h_Small->ResetStats();

That definitely sounds more straight forward!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.