Efficient way to get/compare distributions of each run?

Dear ROOT experts,

Sorry to trouble you again. This follows up on the discussion in this thread:
http://root.cern.ch/phpBB3//viewtopic.php?f=3&t=10611

I have a TTree (Ntuple) with only about 20 branches but about 10^9 entries. The run number (an integer) is one of these 20 branches, and I want to compare the distributions of several other variables run by run. One option is to project (variable:runNumber) into a 2D histogram. But since I don't want to mix different runs, the number of X bins has to be (max runNumber - min runNumber + 1). As mentioned in that thread, there are only about 100 runs, but the runNumber range is quite large (about 2,000 at the moment), so this is not very efficient. This raises two questions:

  1. Is it possible to remove the empty (X, run) bins afterwards?
  2. Since the statistics of each run are different, I want to normalize the distribution of each run to 1. How can I do that for a 2D histogram?

For 1), a workaround is to first get the list of run numbers and put them in a std::set, as Charles suggested, then add a new branch that stores the position of each entry's runNumber in that set, write everything to a new TTree, and finally do the 2D projection with respect to this new variable. But since this means looping over all the entries, finding the position of each entry's run number in the set, and writing out a new TTree, the time it takes is non-negligible, and the file size also increases. So, is there some kind of map I can use directly with TTree::Draw or TTree::Project?

Many thanks in advance for your suggestions.

Cheers, Jibo

Hi Jibo,

I’m not sure I entirely grasp what you’re trying to do with your second variable, but two things that might help:

  1. If you call tree->Draw("run:other"), you can access the two variables using GetV1() and GetV2().

  2. Depending on what you want to do with the other variable, I can imagine using something like
    std::map< int, TH1F* >, where you'd need to be careful to check whether you have already seen a given run, but then you can store one histogram per run number and do what you want with them. If what you need to do is more complicated, you can write a class that holds everything you need for each run and then use std::map< int, MySpecialClass >.

Good luck,
Charles

Hi, Charles,

Thanks a lot for your reply again.

To put it simply, we want to make 3D plots (X: run number, Y: some variable, Z: entries), or just a 2D histogram or a 2D scatter plot.

The problem is that we don't want to mix different runs, so when TTree::Draw does the loop, the total number of X bins has to be (max run number - min run number + 1), and most of these bins (about 95%) are empty. To see a continuous 3D distribution, we want to keep only the non-empty bins (runs) and remove the empty X bins. I hope it is clearer now what I want to do.

Yes, I will try that. For now I could also do the mapping by hand, i.e., add a new branch that stores the position of each entry's run number in the run number set. Then I can just use
tree->Draw("position:other") to get what we want. Yes, I agree this is ugly and not efficient. :slight_smile:

I thought this was a simple use case that ROOT might already support, in which case using a function provided by ROOT would be more efficient. But it seems not… Maybe I should make a feature request?

Thanks, Jibo

[quote=“hejb”]Yes, I will try to do this. Currently I can also try to do the map things by hand, i.e., add a new branch to store the position of the run number of some entry in the run number set. Then I can just use
tree->Draw (“position:other”), to get what we want. Yes, I agree this is ugly and not efficient. :slight_smile:

I thought that it is a simple use case, and ROOT may have already supported it. Then it would be more efficient to use the function provided by ROOT. But it seems not…Maybe I should make a feature request?[/quote]

I think this is actually quite easy to do with basically what I suggested last time:

  • TTree::Draw("run:position:other")

  • Using GetV1()…GetV3(), fill a std::map< int, TH2F* >, where the int is the run number and the TH2F is the histogram you want to fill with "position:other".

This should actually be quite fast, as the Draw() command does most of the work for you. The only thing you need to take care of is creating the TH2F histogram the first time you fill for a given run. See cppreference.com/wiki/stl/map/start for details on how to work with maps.

Cheers,
Charles