ROOT RDF weighted histograms give nonsensical integrals

Dear experts,

I am seeing weird behavior when running my RDF code while creating weighted histograms. In our analysis we create weighted cutflows by booking a histogram after each Filter we apply, which looks a bit like this:

df = df.Define("cutbin", "0.5")
df = df.Filter("x > 10")
cutflowhists = []
cutflowhists.append(df.Histo1D(("weighted_cutflow1", weight_title, 1, 0.0, 1.0), "cutbin", "weight"))
df = df.Filter("y > 10")
cutflowhists.append(df.Histo1D(("weighted_cutflow2", weight_title, 1, 0.0, 1.0), "cutbin", "weight"))
...

Here weight is the final weight to be used: it combines basically all the per-event weights with some metadata about cross-sections, luminosity and others.
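Schematically, the definition of that column looks something like this (a sketch only, continuing the snippet above; evt_weight, xsec, lumi and sum_w are placeholder column names, not the real ones):

# sketch: combine all per-event weights with cross-section/luminosity metadata
df = df.Define("weight", "evt_weight * xsec * lumi / sum_w")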

In our code we run over multiple RDatasetSpecs, each with its own dataframe. Each RDatasetSpec contains multiple (sub)samples: think of a DY process (one RDatasetSpec) containing a 2016 MC sample, a 2017 MC sample, and so on.

We define RDataFrames for all of the dataset specs and run them all together via RunGraphs.
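Roughly like this (a minimal sketch; sample, tree and file names are placeholders, and the real code books many more histograms per graph):

import ROOT

# sketch: one RDatasetSpec per process, each holding several (sub)samples
spec = ROOT.RDF.Experimental.RDatasetSpec()
spec.AddSample(ROOT.RDF.Experimental.RSample("DY_2016", "nominal", ["dy_2016.root"]))
spec.AddSample(ROOT.RDF.Experimental.RSample("DY_2017", "nominal", ["dy_2017.root"]))
df = ROOT.RDataFrame(spec)

# ... Define/Filter/Histo1D chains booked on each dataframe, as above ...

# run all booked graphs together, then read off the weighted cutflow
ROOT.RDF.RunGraphs(cutflowhists)
for h in cutflowhists:
    print(h.GetName(), h.Integral())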

In the end we create the cutflow by looping over all the histograms in the cutflowhists list and taking the Integral() of each histogram, as sketched above. What we noticed is that the Integral is basically nonsense for the first few cuts/hists, every time. I'll paste an example here:

Cut                                                     Input       Pass         Eff      CumEff
__startcut__                                       10794198204690424573386219210145792.0000 10794198204690424573386219210145792.0000    100.000%    100.000%
((bjet1.get_HadronConeExclTruthLabelID()==0 && bjet2.get_HadronConeExclTruthLabelID()==0)) 10794198204690424573386219210145792.0000 1090736866060343443482038239232.0000      0.010%      0.010%
n_analysis_leptons == 0                            1090736866060343443482038239232.0000 1090736866060343443482038239232.0000    100.000%      0.010%
at least 2 central jets                            1090736866060343443482038239232.0000 1090736866060343443482038239232.0000    100.000%      0.010%
at least 1 b-jet                                   1090736866060343443482038239232.0000  5041.5377      0.000%      0.000%
Trigger dependent cuts                              5041.5377  4083.4054     80.995%      0.000%
mmc mass > 60 GeV                                   4083.4054  3972.9277     97.294%      0.000%

As you can see, the first few cuts show absurdly large numbers, and then the values suddenly start to make sense. I have already tried many different things, but I can't fix it. A few things I noticed:

  1. It's rather random, at least to me, whether this happens for a given dataset: some RDatasetSpecs show a completely normal cutflow, while others are weird.
  2. For some of them the cutflow is normal if I run only one RDatasetSpec at a time, while for others the problem persists even when that RDatasetSpec runs on its own.
  3. I have tried running it in ROOT 6.32.08 as well as 6.36.02, but both show the problem.
  4. I tried using a constant weight of 1, and that works normally.
  5. The entries from the histograms via GetEntries() are compatible with the entries from the cut-flow report RDF automatically spits out, or at least the few I looked at (see the sketch after this list).
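For reference, that check looks roughly like this (a sketch, reusing the handles from above; unnamed Filters show up with generic names in the report):

# sketch: compare raw entries per histogram with RDF's cut-flow report
report = df.Report()  # books an RCutFlowReport covering the upstream Filters
ROOT.RDF.RunGraphs(cutflowhists + [report])
for cutinfo, h in zip(report.GetValue(), cutflowhists):
    print(cutinfo.GetName(), cutinfo.GetPass(), h.GetEntries())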

I suspect something weird is going on with the memory backing the weight column, but I have no idea what it is. Any help here is highly appreciated.

Thank you so much for your help.

Cheers, Jordy

root setup info:
I use views via cvmfs: lsetup "views LCG_106b x86_64-el9-gcc11-opt"
or lsetup "views LCG_108 x86_64-el9-gcc13-opt"

Hi @Jordy_Degens,

thanks a lot for reporting this. In order to fully understand your issue I would need a proper reproducer; could we arrange that? We can also communicate in private, on Mattermost for example.

Cheers,

Marta

Just a quick update:

We did some more tests, and we think we have traced this back to an uninitialized member variable in a Tau class we created. There is an efficiency scale factor member variable that enters the definition of the final event weight. This member was not initialized when using the default Tau constructor, which should not get called for most events but may get called in our more inclusive event set.

I had assumed an uninitialized float class member would get a value of 0 at runtime, but that is not guaranteed: its value is indeterminate, which can produce exactly the kind of garbage we see here. We fixed this in our class definitions and, for the moment, the issue goes away. But since the problem also seemed to depend on certain runtime conditions, we're still stress testing to see if this really fixes the issue. Will report back if it reemerges.
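For anyone hitting something similar, the pattern was essentially this (a minimal sketch with illustrative names, declared via PyROOT for convenience; our real class is more involved):

import ROOT

# sketch: a non-static float member without an initializer has an
# indeterminate value after default-initialization ("Tau t;"), so any event
# weight built from it can be arbitrary garbage
ROOT.gInterpreter.Declare("""
struct Tau {
    // float fEffSF;       // bug: indeterminate after `Tau t;`
    float fEffSF = 1.0f;   // fix: in-class member initializer
};
""")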
