Efficient way to produce a lot of Histo1D having an RVec<double> weight column

RENATO_QUAGLIANI · November 4, 2020, 6:57pm

Dear experts,
This is a naive question about performance optimization. I have a vector column holding, say, 100 different weight values, and I want to plot a scalar variable once for each of those weights.

In practice:
Does df.Histo1D(model, scalarVar, vecColumn) produce a vector of TH1D objects, where every histogram has the same entries and the only difference is which element [i] of the vector column is used as the weight?

If this is not expected to work, is there a recommended way to achieve it without chaining 100 Define calls (one per weight_i) and 100 Histo1D calls?

Renato

eguiraud · November 5, 2020, 9:53am

Hi Renato,

You can easily verify that’s not what happens:

~ root -l
root [0] auto h = ROOT::RDataFrame(1).Define("x", "42").Define("v", "ROOT::RVec<float>{1.,2.,3.}").Histo1D("x", "v");
root [1] h
(ROOT::RDF::RResultPtr<TH1D> &) @0x7fe2b570b008
root [2] *h
RDataFrame::Run: event loop was interrupted
Error in <TRint::HandleTermInput()>: std::runtime_error caught: Cannot fill object if the type of the first column is a scalar and the one of the second a container.

Some more discussion on this topic is available in this JIRA ticket.

100 Define + Histo1D should not be too slow, especially if you use lambdas for the Define and specify a template parameter as in Histo1D<float>. If it is too slow, consider filling a single 2D histogram instead, and later slice it. If that’s not possible in your case, but you need better performance, consider coding a custom action that does what you want (tutorial here).
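
To make the "single 2D histogram, then slice" idea concrete, here is a minimal sketch of the bookkeeping in plain C++, deliberately without any ROOT types so that it is self-contained. The names WeightedGrid, Fill and Slice are hypothetical and are not ROOT API. One axis is the scalar variable, the other is the weight index; each event puts the same x into every row i with weight weights[i], and row i is then the 1D histogram for weight i. In RDataFrame, the per-event step would presumably mean passing Histo2D three same-sized container columns (x repeated N times, the indices 0..N-1, and the weights); that is an assumption about how container columns are filled and has not been checked here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Plain C++ stand-in for a 2D histogram (hypothetical, not a ROOT class):
// axis 0 is the scalar variable, axis 1 is the weight index.
struct WeightedGrid {
    std::size_t nBinsX;
    double xMin, xMax;
    std::size_t nWeights;
    std::vector<double> sumw; // row-major: sumw[iWeight * nBinsX + binX]

    WeightedGrid(std::size_t nx, double lo, double hi, std::size_t nw)
        : nBinsX(nx), xMin(lo), xMax(hi), nWeights(nw), sumw(nx * nw, 0.0) {
        assert(nx > 0 && nw > 0 && lo < hi);
    }

    // One event: the same x goes into every row, weighted by weights[i].
    void Fill(double x, const std::vector<double>& weights) {
        if (weights.size() != nWeights)
            throw std::invalid_argument("unexpected number of weights");
        if (x < xMin || x >= xMax)
            return; // underflow/overflow is simply dropped in this sketch
        auto binX = static_cast<std::size_t>((x - xMin) / (xMax - xMin) * nBinsX);
        binX = std::min(binX, nBinsX - 1); // guard against floating-point rounding
        for (std::size_t i = 0; i < nWeights; ++i)
            sumw[i * nBinsX + binX] += weights[i];
    }

    // Slice: the 1D histogram (bin contents only) for weight index i.
    std::vector<double> Slice(std::size_t i) const {
        if (i >= nWeights)
            throw std::out_of_range("weight index out of range");
        const auto first = sumw.begin() + static_cast<std::ptrdiff_t>(i * nBinsX);
        return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(nBinsX));
    }
};
```

With 100 weights this is a single pass over the events and a single booked object, and Slice(i) plays the role of projecting the 2D histogram onto the scalar axis for one weight bin.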

Cheers,
Enrico

RENATO_QUAGLIANI · November 5, 2020, 12:34pm

I read the JIRA ticket.
I think the typical use case in HEP [which is also the one I am looking for] is
Histo1D( ,"scalar","vector_weight"). For example, it is used when you have bootstrapped the simulation and the corrections to the simulated samples and want the “overall” bootstrapped distribution of the scalar given the weights, or when you use histograms to estimate differential efficiencies.

As for the 2D slicing you suggest: say we bootstrapped the correction to the simulation in N ways. I don’t see a nice way to pack the vector column into something that can be broadcast to Histo2D.
The only solution I can think of at the moment is to Take<> the scalar and the RVec<double> weights, and then loop over them again to fill the histograms.

I don’t see how this use case can be passed to a Histo2D without heavily manipulating the scalar or the RVec column…
Am I missing something?

eguiraud · November 14, 2020, 2:16pm

I am missing the context here, I don’t understand what the comment refers to.

Also in your Histo1D( ,"scalar","vector_weight") I suspect you want a vector of histograms as output, not a single histogram filled with each of the elements…?

RENATO_QUAGLIANI · November 14, 2020, 9:21pm

I posted in the wrong thread; I was referring to another one. Apologies. Yes, my use case is that I have an ntuple storing a vector of weights for each entry, and ideally I would like to get a vector of histograms, one per weight, for a given scalar variable.
