Number of Bins per Subgroup

Dear ROOT experts,

I have a dataset of approximately 300M entries, and I am plotting a quantity (pT).
My goal is to split the data equally into subgroups (as shown in the plot),
where each subgroup contains a specific number of entries (around 50k or 100k).
For each subgroup, I aim to identify the x-points that fall within the specific range and compute their average.
I first applied the CDF to my pT histogram and then attempted to divide the data into subgroups (Y-axis) as outlined in the macro.
However, I am encountering an issue when trying to print the number of bins per subgroup.
Specifically, I don’t understand why there are not many bins in each subgroup.

For example, I expect:
First subgroup: bins from 0 to 700
Second subgroup: bins from 701 to 1400 …and so on.

Could you please provide guidance or suggest a solution to this issue?
Split_Y_axis.cc (2.5 KB)

I could be mistaken, but it seems like you’re attempting to divide your histogram into subgroups along the Y-axis. However, your plot represents a 1D histogram, which, by definition, does not have bins along the Y-axis. The binning for 1D histograms occurs along the X-axis.

Hello,
I'm trying to divide the Y-axis equally, then find the points on the X-axis that form the boundaries of each subgroup, and then assign a pT (x-axis) value to each subgroup by taking the average pT. The goal is to have roughly the same statistics in each subgroup, so I am trying to look at the X-axis bins that belong to each subgroup.

Best,
Mustapha
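
For readers following along, the procedure described above can be sketched in plain C++ without ROOT. This is only an illustration under stated assumptions: the bin contents are passed in as a std::vector, the bins are uniform, and the names Subgroup and splitByCumulative are hypothetical and do not come from the attached macro. Consecutive bins are accumulated until the running count reaches the requested number of entries, and the pT assigned to the subgroup is the content-weighted mean of the bin centres.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one subgroup made of consecutive histogram bins.
struct Subgroup {
    std::size_t firstBin = 0;  // 0-based index of the first bin in the subgroup
    std::size_t lastBin = 0;   // 0-based index of the last bin in the subgroup
    double entries = 0.0;      // summed bin contents
    double meanX = 0.0;        // content-weighted mean of the bin centres
};

// Accumulate consecutive bins until at least entriesPerGroup entries are
// collected, then close the subgroup and start the next one.
// Assumption: uniform bins starting at xMin with width binWidth.
std::vector<Subgroup> splitByCumulative(const std::vector<double>& contents,
                                        double xMin, double binWidth,
                                        double entriesPerGroup) {
    assert(entriesPerGroup > 0.0 && binWidth > 0.0);
    std::vector<Subgroup> groups;
    Subgroup current;
    double weightedSum = 0.0;  // sum of content * bin centre for the open subgroup
    bool open = false;

    for (std::size_t i = 0; i < contents.size(); ++i) {
        if (!open) {
            current = Subgroup{};
            current.firstBin = i;
            weightedSum = 0.0;
            open = true;
        }
        const double centre = xMin + (static_cast<double>(i) + 0.5) * binWidth;
        current.lastBin = i;
        current.entries += contents[i];
        weightedSum += contents[i] * centre;

        if (current.entries >= entriesPerGroup) {
            current.meanX = weightedSum / current.entries;
            groups.push_back(current);
            open = false;
        }
    }
    // Keep whatever is left over as a final, smaller subgroup.
    if (open && current.entries > 0.0) {
        current.meanX = weightedSum / current.entries;
        groups.push_back(current);
    }
    return groups;
}
```

Because a bin cannot be split, each subgroup holds at least entriesPerGroup entries rather than exactly that many. If a single bin already contains more entries than the target, it forms a subgroup on its own.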

Thanks for the additional details.
We cannot run your example because we do not have the file merged_all.root. Moreover, looking at your code, the opening and closing curly brackets seem to be mismatched: the last for loop is outside the function body.

Hello,
Because the file is large, I have put it here (the file name is merged_all.root):
https://cernbox.cern.ch/s/DZB2BRoWZADmV4T

Hi, for some reason the download does not seem to work for me. It is looping forever…

Sorry, can you please check this link:
https://cernbox.cern.ch/s/6InhPvWvHU1RzZY

Thanks

I managed to download your .root file, but I think your macro needs some fixes. It gives me:

Processing Split_Y_axis.cc...
In file included from input_line_8:1:
/Users/couet/Downloads/Split_Y_axis.cc:81:24: error: use of undeclared identifier 'xValuesInYBoundaries'
for (size_t i = 0; i < xValuesInYBoundaries.size(); ++i) {
                       ^
/Users/couet/Downloads/Split_Y_axis.cc:82:50: error: use of undeclared identifier 'boundary_Y_width'
    std::cout << "X values for Y range [" << i * boundary_Y_width << ", " << (i + 1) * boundary_Y_width << "]: ";
                                                 ^
/Users/couet/Downloads/Split_Y_axis.cc:82:88: error: use of undeclared identifier 'boundary_Y_width'
    std::cout << "X values for Y range [" << i * boundary_Y_width << ", " << (i + 1) * boundary_Y_width << "]: ";
                                                                                       ^
/Users/couet/Downloads/Split_Y_axis.cc:83:21: error: use of undeclared identifier 'xValuesInYBoundaries'
    for (double x : xValuesInYBoundaries[i]) {
                    ^

I have fixed those errors. Here is the corrected macro:
Split_Y_axis.cc (2.5 KB)
Thanks

Your macro gives me a blank canvas and this output:

root [0] 
Processing Split_Y_axis.cc...
Cumulative count for bin 1 contributing to Y range [1.5e+07, 1.8e+07]: 1.67772e+07
Cumulative count for bin 2 contributing to Y range [4.8e+07, 5.1e+07]: 4.93653e+07
Cumulative count for bin 3 contributing to Y range [9.6e+07, 9.9e+07]: 9.87305e+07
Cumulative count for bin 4 contributing to Y range [1.62e+08, 1.65e+08]: 1.64873e+08
Cumulative count for bin 5 contributing to Y range [2.46e+08, 2.49e+08]: 2.47793e+08
Cumulative count for bin 6 contributing to Y range [3.45e+08, 3.48e+08]: 3.4749e+08
Cumulative count for bin 7 contributing to Y range [4.62e+08, 4.65e+08]: 4.63964e+08
Cumulative count for bin 8 contributing to Y range [5.97e+08, 6e+08]: 5.97215e+08
Cumulative count for bin 9 contributing to Y range [7.47e+08, 7.5e+08]: 7.47244e+08
Cumulative count for bin 10 contributing to Y range [9.12e+08, 9.15e+08]: 9.14049e+08
X values for Y range [0, 3000000]: X values for Y range [3000000, 6000000]: X values for Y range [6000000, 9000000]: X values for Y range [9000000, 12000000]: X values for Y range [12000000, 15000000]: X values for Y range [15000000, 18000000]: 1.66667 X values for Y range [18000000, 21000000]: X values for Y range [21000000, 24000000]: X values for Y range [24000000, 27000000]: X values for Y range [27000000, 30000000]: X values for Y range [30000000, 33000000]: X values for Y range [33000000, 36000000]: X values for Y range [36000000, 39000000]: X values for Y range [39000000, 42000000]: X values for Y range [42000000, 45000000]: X values for Y range [45000000, 48000000]: X values for Y range [48000000, 51000000]: 5 X values for Y range [51000000, 54000000]: X values for Y range [54000000, 57000000]: X values for Y range [57000000, 60000000]: X values for Y range [60000000, 63000000]: X values for Y range [63000000, 66000000]: X values for Y range [66000000, 69000000]: X values for Y range [69000000, 72000000]: X values for Y range [72000000, 75000000]: X values for Y range [75000000, 78000000]: X values for Y range [78000000, 81000000]: X values for Y range [81000000, 84000000]: X values for Y range [84000000, 87000000]: X values for Y range [87000000, 90000000]: X values for Y range [90000000, 93000000]: X values for Y range [93000000, 96000000]: X values for Y range [96000000, 99000000]: 8.33333 X values for Y range [99000000, 102000000]: X values for Y range [102000000, 105000000]: X values for Y range [105000000, 108000000]: X values for Y range [108000000, 111000000]: X values for Y range [111000000, 114000000]: X values for Y range [114000000, 117000000]: X values for Y range [117000000, 120000000]: X values for Y range [120000000, 123000000]: X values for Y range [123000000, 126000000]: X values for Y range [126000000, 129000000]: X values for Y range [129000000, 132000000]: X values for Y range [132000000, 135000000]: X values for Y range [135000000, 
138000000]: X values for Y range [138000000, 141000000]: X values for Y range [141000000, 144000000]: X values for Y range [144000000, 147000000]: X values for Y range [147000000, 150000000]: X values for Y range [150000000, 153000000]: X values for Y range [153000000, 156000000]: X values for Y range [156000000, 159000000]: X values for Y range [159000000, 162000000]: X values for Y range [162000000, 165000000]: 11.6667 X values for Y range [165000000, 168000000]: X values for Y range [168000000, 171000000]: X values for Y range [171000000, 174000000]: X values for Y range [174000000, 177000000]: X values for Y range [177000000, 180000000]: X values for Y range [180000000, 183000000]: X values for Y range [183000000, 186000000]: X values for Y range [186000000, 189000000]: X values for Y range [189000000, 192000000]: X values for Y range [192000000, 195000000]: X values for Y range [195000000, 198000000]: X values for Y range [198000000, 201000000]: X values for Y range [201000000, 204000000]: X values for Y range [204000000, 207000000]: X values for Y range [207000000, 210000000]: X values for Y range [210000000, 213000000]: X values for Y range [213000000, 216000000]: X values for Y range [216000000, 219000000]: X values for Y range [219000000, 222000000]: X values for Y range [222000000, 225000000]: X values for Y range [225000000, 228000000]: X values for Y range [228000000, 231000000]: X values for Y range [231000000, 234000000]: X values for Y range [234000000, 237000000]: X values for Y range [237000000, 240000000]: X values for Y range [240000000, 243000000]: X values for Y range [243000000, 246000000]: X values for Y range [246000000, 249000000]: 15 X values for Y range [249000000, 252000000]: X values for Y range [252000000, 255000000]: X values for Y range [255000000, 258000000]: X values for Y range [258000000, 261000000]: X values for Y range [261000000, 264000000]: X values for Y range [264000000, 267000000]: X values for Y range [267000000, 
270000000]: X values for Y range [270000000, 273000000]: X values for Y range [273000000, 276000000]: X values for Y range [276000000, 279000000]: X values for Y range [279000000, 282000000]: X values for Y range [282000000, 285000000]: X values for Y range [285000000, 288000000]: X values for Y range [288000000, 291000000]: X values for Y range [291000000, 294000000]: X values for Y range [294000000, 297000000]: X values for Y range [297000000, 300000000]: X values for Y range [300000000, 303000000]: X values for Y range [303000000, 306000000]: X values for Y range [306000000, 309000000]: X values for Y range [309000000, 312000000]: X values for Y range [312000000, 315000000]: X values for Y range [315000000, 318000000]: X values for Y range [318000000, 321000000]: X values for Y range [321000000, 324000000]: X values for Y range [324000000, 327000000]: X values for Y range [327000000, 330000000]: X values for Y range [330000000, 333000000]: X values for Y range [333000000, 336000000]: X values for Y range [336000000, 339000000]: X values for Y range [339000000, 342000000]: X values for Y range [342000000, 345000000]: X values for Y range [345000000, 348000000]: 18.3333 X values for Y range [348000000, 351000000]: X values for Y range [351000000, 354000000]: X values for Y range [354000000, 357000000]: X values for Y range [357000000, 360000000]: X values for Y range [360000000, 363000000]: X values for Y range [363000000, 366000000]: X values for Y range [366000000, 369000000]: X values for Y range [369000000, 372000000]: X values for Y range [372000000, 375000000]: X values for Y range [375000000, 378000000]: X values for Y range [378000000, 381000000]: X values for Y range [381000000, 384000000]: X values for Y range [384000000, 387000000]: X values for Y range [387000000, 390000000]: X values for Y range [390000000, 393000000]: X values for Y range [393000000, 396000000]: X values for Y range [396000000, 399000000]: X values for Y range [399000000, 402000000]: 
X values for Y range [402000000, 405000000]: X values for Y range [405000000, 408000000]: X values for Y range [408000000, 411000000]: X values for Y range [411000000, 414000000]: X values for Y range [414000000, 417000000]: X values for Y range [417000000, 420000000]: X values for Y range [420000000, 423000000]: X values for Y range [423000000, 426000000]: X values for Y range [426000000, 429000000]: X values for Y range [429000000, 432000000]: X values for Y range [432000000, 435000000]: X values for Y range [435000000, 438000000]: X values for Y range [438000000, 441000000]: X values for Y range [441000000, 444000000]: X values for Y range [444000000, 447000000]: X values for Y range [447000000, 450000000]: X values for Y range [450000000, 453000000]: X values for Y range [453000000, 456000000]: X values for Y range [456000000, 459000000]: X values for Y range [459000000, 462000000]: X values for Y range [462000000, 465000000]: 21.6667 X values for Y range [465000000, 468000000]: X values for Y range [468000000, 471000000]: X values for Y range [471000000, 474000000]: X values for Y range [474000000, 477000000]: X values for Y range [477000000, 480000000]: X values for Y range [480000000, 483000000]: X values for Y range [483000000, 486000000]: X values for Y range [486000000, 489000000]: X values for Y range [489000000, 492000000]: X values for Y range [492000000, 495000000]: X values for Y range [495000000, 498000000]: X values for Y range [498000000, 501000000]: X values for Y range [501000000, 504000000]: X values for Y range [504000000, 507000000]: X values for Y range [507000000, 510000000]: X values for Y range [510000000, 513000000]: X values for Y range [513000000, 516000000]: X values for Y range [516000000, 519000000]: X values for Y range [519000000, 522000000]: X values for Y range [522000000, 525000000]: X values for Y range [525000000, 528000000]: X values for Y range [528000000, 531000000]: X values for Y range [531000000, 534000000]: X values 
for Y range [534000000, 537000000]: X values for Y range [537000000, 540000000]: X values for Y range [540000000, 543000000]: X values for Y range [543000000, 546000000]: X values for Y range [546000000, 549000000]: X values for Y range [549000000, 552000000]: X values for Y range [552000000, 555000000]: X values for Y range [555000000, 558000000]: X values for Y range [558000000, 561000000]: X values for Y range [561000000, 564000000]: X values for Y range [564000000, 567000000]: X values for Y range [567000000, 570000000]: X values for Y range [570000000, 573000000]: X values for Y range [573000000, 576000000]: X values for Y range [576000000, 579000000]: X values for Y range [579000000, 582000000]: X values for Y range [582000000, 585000000]: X values for Y range [585000000, 588000000]: X values for Y range [588000000, 591000000]: X values for Y range [591000000, 594000000]: X values for Y range [594000000, 597000000]: X values for Y range [597000000, 600000000]: 25 X values for Y range [600000000, 603000000]: X values for Y range [603000000, 606000000]: X values for Y range [606000000, 609000000]: X values for Y range [609000000, 612000000]: X values for Y range [612000000, 615000000]: X values for Y range [615000000, 618000000]: X values for Y range [618000000, 621000000]: X values for Y range [621000000, 624000000]: X values for Y range [624000000, 627000000]: X values for Y range [627000000, 630000000]: X values for Y range [630000000, 633000000]: X values for Y range [633000000, 636000000]: X values for Y range [636000000, 639000000]: X values for Y range [639000000, 642000000]: X values for Y range [642000000, 645000000]: X values for Y range [645000000, 648000000]: X values for Y range [648000000, 651000000]: X values for Y range [651000000, 654000000]: X values for Y range [654000000, 657000000]: X values for Y range [657000000, 660000000]: X values for Y range [660000000, 663000000]: X values for Y range [663000000, 666000000]: X values for Y range 
[666000000, 669000000]: X values for Y range [669000000, 672000000]: X values for Y range [672000000, 675000000]: X values for Y range [675000000, 678000000]: X values for Y range [678000000, 681000000]: X values for Y range [681000000, 684000000]: X values for Y range [684000000, 687000000]: X values for Y range [687000000, 690000000]: X values for Y range [690000000, 693000000]: X values for Y range [693000000, 696000000]: X values for Y range [696000000, 699000000]: X values for Y range [699000000, 702000000]: X values for Y range [702000000, 705000000]: X values for Y range [705000000, 708000000]: X values for Y range [708000000, 711000000]: X values for Y range [711000000, 714000000]: X values for Y range [714000000, 717000000]: X values for Y range [717000000, 720000000]: X values for Y range [720000000, 723000000]: X values for Y range [723000000, 726000000]: X values for Y range [726000000, 729000000]: X values for Y range [729000000, 732000000]: X values for Y range [732000000, 735000000]: X values for Y range [735000000, 738000000]: X values for Y range [738000000, 741000000]: X values for Y range [741000000, 744000000]: X values for Y range [744000000, 747000000]: X values for Y range [747000000, 750000000]: 28.3333 X values for Y range [750000000, 753000000]: X values for Y range [753000000, 756000000]: X values for Y range [756000000, 759000000]: X values for Y range [759000000, 762000000]: X values for Y range [762000000, 765000000]: X values for Y range [765000000, 768000000]: X values for Y range [768000000, 771000000]: X values for Y range [771000000, 774000000]: X values for Y range [774000000, 777000000]: X values for Y range [777000000, 780000000]: X values for Y range [780000000, 783000000]: X values for Y range [783000000, 786000000]: X values for Y range [786000000, 789000000]: X values for Y range [789000000, 792000000]: X values for Y range [792000000, 795000000]: X values for Y range [795000000, 798000000]: X values for Y range [798000000, 
801000000]: X values for Y range [801000000, 804000000]: X values for Y range [804000000, 807000000]: X values for Y range [807000000, 810000000]: X values for Y range [810000000, 813000000]: X values for Y range [813000000, 816000000]: X values for Y range [816000000, 819000000]: X values for Y range [819000000, 822000000]: X values for Y range [822000000, 825000000]: X values for Y range [825000000, 828000000]: X values for Y range [828000000, 831000000]: X values for Y range [831000000, 834000000]: X values for Y range [834000000, 837000000]: X values for Y range [837000000, 840000000]: X values for Y range [840000000, 843000000]: X values for Y range [843000000, 846000000]: X values for Y range [846000000, 849000000]: X values for Y range [849000000, 852000000]: X values for Y range [852000000, 855000000]: X values for Y range [855000000, 858000000]: X values for Y range [858000000, 861000000]: X values for Y range [861000000, 864000000]: X values for Y range [864000000, 867000000]: X values for Y range [867000000, 870000000]: X values for Y range [870000000, 873000000]: X values for Y range [873000000, 876000000]: X values for Y range [876000000, 879000000]: X values for Y range [879000000, 882000000]: X values for Y range [882000000, 885000000]: X values for Y range [885000000, 888000000]: X values for Y range [888000000, 891000000]: X values for Y range [891000000, 894000000]: X values for Y range [894000000, 897000000]: X values for Y range [897000000, 900000000]: X values for Y range [900000000, 903000000]: X values for Y range [903000000, 906000000]: X values for Y range [906000000, 909000000]: X values for Y range [909000000, 912000000]: X values for Y range [912000000, 915000000]: 31.6667 root [1] 2024-12-02 14:19:22.602 root.exe[6878:365343] +[IMKClient subclass]: chose IMKClient_Modern
2024-12-02 14:19:22.602 root.exe[6878:365343] +[IMKInputSession subclass]: chose IMKInputSession_Modern

Is this what is expected?

Look at the cumulative curve. It’s very steep, going from 0 to the maximum between x=0 and ~250, and actually most of the change seems to be between x~30-60, so you may not really find much detail with your current binning. Supposing your code is otherwise ok (**), your bin width is ~3, and in a y-range of 30, with 7 groups you may find around 3 or 4 x-bins per group, maybe less in some groups. Try more (smaller) bins in x than what you have now, and reduce the range in x to 300 (or wherever you already have 100% of the data).

(**) by the way, if your groups are supposed to be every 30,000,000 (according to the lines you drew in the plot above) you need to correct this line too

const int boundary_Y_width = 3000000; 

(that’s 3 million, not 30)
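
To make the point about binning concrete, here is a small sketch in plain C++ (no ROOT). It counts how many x-bins start inside each cumulative slice for a given slice width. The function name binsPerGroup and the toy bin contents are assumptions made for illustration, not taken from the macro. A bin is assigned to the slice that contains the cumulative count at its lower edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many x-bins begin inside each cumulative ("Y-axis") slice.
// contents:        entries per x-bin, in bin order
// entriesPerGroup: width of one slice on the cumulative axis
std::vector<int> binsPerGroup(const std::vector<double>& contents,
                              double entriesPerGroup) {
    assert(entriesPerGroup > 0.0);
    std::vector<int> counts;
    double cumulativeBefore = 0.0;  // entries in all bins before the current one
    for (double c : contents) {
        const auto group =
            static_cast<std::size_t>(cumulativeBefore / entriesPerGroup);
        if (group >= counts.size()) {
            counts.resize(group + 1, 0);  // slices skipped by a large bin stay at 0
        }
        ++counts[group];
        cumulativeBefore += c;
    }
    return counts;
}
```

With a slice width of 300 entries, four bins of 300 entries each give one bin per slice, while the same 1200 entries spread over twelve bins of 100 give three bins per slice. Two bins holding 900 and 300 entries give the counts 1, 0, 0, 1: a bin that holds several slices' worth of entries leaves the slices in between empty, which looks like the pattern in the output above, where most Y ranges print no x value.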

Thank you for your reply. In fact, when I draw the cumulative histogram, I set the number of bins to 300. Can I set more than that, given that I have 300 million entries? Also, it is not necessary to set const int boundary_Y_width = 3000000; because I only want 100,000 entries per subgroup, so I will change it to 100,000. To have more bins per subgroup, what number of bins do you suggest?

Yeah, basically, I tried to print the cumulative content for each subgroup, but I currently have only one bin per subgroup. The number of bins I set for my histogram is 300, and I can set boundary_Y_width = 100000 to have 100k entries per subgroup. I’m wondering if I could have more than one bin per subgroup by changing the logic of the code.
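
One possible change of logic, sketched here in plain C++ as an assumption rather than as the method used in the macro, is to skip the histogram and work on the unbinned pT values: sort them, cut the sorted list into consecutive chunks of a fixed number of entries, and average each chunk. The names Chunk and equalEntryChunks are hypothetical. Note that sorting 300M doubles needs roughly 2.4 GB of memory, so this may or may not be practical for the full dataset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper type: one chunk of consecutive sorted values.
struct Chunk {
    double xLow;          // smallest value in the chunk
    double xHigh;         // largest value in the chunk
    std::size_t entries;  // number of values in the chunk
    double meanX;         // arithmetic mean of the values in the chunk
};

// Sort the values and cut them into consecutive chunks of entriesPerGroup
// values each; the last chunk may be smaller.
std::vector<Chunk> equalEntryChunks(std::vector<double> values,
                                    std::size_t entriesPerGroup) {
    assert(entriesPerGroup > 0);
    std::sort(values.begin(), values.end());

    std::vector<Chunk> chunks;
    for (std::size_t start = 0; start < values.size(); start += entriesPerGroup) {
        const std::size_t stop = std::min(start + entriesPerGroup, values.size());
        const auto first = values.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = values.begin() + static_cast<std::ptrdiff_t>(stop);
        const double sum = std::accumulate(first, last, 0.0);
        const std::size_t n = stop - start;
        chunks.push_back({values[start], values[stop - 1], n,
                          sum / static_cast<double>(n)});
    }
    return chunks;
}
```

Every chunk except possibly the last then has exactly entriesPerGroup entries regardless of any binning, and xLow and xHigh give the pT boundaries of the subgroup directly.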