Are there plans to allow the filling of a THnSparse
with RDataFrame
, a la Histo1D
?
ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided
Are there plans to allow the filling of a THnSparse
with RDataFrame
, a la Histo1D
?
ROOT Version: Not Provided
Platform: Not Provided
Compiler: Not Provided
Hi, does the Fill action fit the bill? (https://root.cern/doc/master/classROOT_1_1RDF_1_1RInterface.html#a0cac4d08297c23d16de81ff25545440a)
Maybe! It looks like it’s not yet implemented for THnSparse
, or perhaps I just don’t understand how to call it; I get a NotImplementedError
in the following script:
from array import array
import ROOT
df = ROOT.RDataFrame(100).Define('br1', 'rdfentry_').Define('br2', 'pow(rdfentry_, 2)')
nbins = array('i', (10, 10))
xmin = array('d', (0, 0))
xmax = array('d', (100, 10000))
h = ROOT.THnSparseD('h', 'h;br1;br2', 2, nbins, xmin, xmax)
collist = ROOT.std.vector('std::string')()
for c in df.GetColumnNames():
collist.push_back(c)
h2 = df.Fill(h, collist)
returns:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-18-bc42dad28816> in <module>()
----> 1 h2 = df.Fill(h, collist)
NotImplementedError: ROOT::RDF::RResultPtr<THnSparseT<TArrayD> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Fill<THnSparseT<TArrayD> >(THnSparseT<TArrayD>&& model, const vector<string>& bl) =>
could not convert argument 1 (this method can not (yet) be called)
Moreover, it’s not clear from that documentation how I would Fill
with weights.
Hi,
uhm not sure why it doesn’t work, I’ll have to take a look.
It’s a generic C++ template method to which you can pass any object with a Fill
method, which will be called with the values of the specified columns as arguments. As such it is pretty generic and you can also use it to fill histograms with weights if the signature of the Fill
method allows it.
I might be that it doesn’t work with THnSparse
because it doesn’t implement an Add
method, which is probably also required now that I think about it.
In the meanwhile, one way I can think of to get what you want is to use custom actions – see the tutorial here and the docs here, although to make this work in python you would need to define a C++ function that adds the custom action to a dataframe and then call it from python, which is a bit tedious.
Cheers,
Enrico
Could it be a problem that THnSparse
requires an array as the argument?
Indeed, I cannot get this to work in pyROOT, even with the tutorial as-written. Details below, but in summary: I declare the class as in the tutorial but cannot find a way to call the constructor in pyROOT in a way the class recognizes–it doesn’t seem to like any of the list-building classes I’ve tried.
Declare the class (copy/paste from the tutorial):
ROOT.gROOT.ProcessLine('''
template <typename T, unsigned int NDIM>
class THnHelper : public ROOT::Detail::RDF::RActionImpl<THnHelper<T, NDIM>> {
public:
/// This is a handy, expressive shortcut.
using THn_t = THnT<T>;
/// This type is a requirement for every helper.
using Result_t = THn_t;
private:
std::vector<std::shared_ptr<THn_t>> fHistos; // one per data processing slot
public:
/// This constructor takes all the parameters necessary to build the THnTs. In addition, it requires the names of
/// the columns which will be used.
THnHelper(std::string_view name, std::string_view title, std::array<int, NDIM> nbins, std::array<double, NDIM> xmins,
std::array<double, NDIM> xmax)
{
const auto nSlots = ROOT::IsImplicitMTEnabled() ? ROOT::GetImplicitMTPoolSize() : 1;
for (auto i : ROOT::TSeqU(nSlots)) {
fHistos.emplace_back(std::make_shared<THn_t>(std::string(name).c_str(), std::string(title).c_str(),
NDIM, nbins.data(), xmins.data(), xmax.data()));
(void)i;
}
}
THnHelper(THnHelper &&) = default;
THnHelper(const THnHelper &) = delete;
std::shared_ptr<THn_t> GetResultPtr() const { return fHistos[0]; }
void Initialize() {}
void InitTask(TTreeReader *, unsigned int) {}
/// This is a method executed at every entry
template <typename... ColumnTypes>
void Exec(unsigned int slot, ColumnTypes... values)
{
// Since THnT<T>::Fill expects a double*, we build it passing through a std::array.
std::array<double, sizeof...(ColumnTypes)> valuesArr{static_cast<double>(values)...};
fHistos[slot]->Fill(valuesArr.data());
}
/// This method is called at the end of the event loop. It is used to merge all the internal THnTs which
/// were used in each of the data processing slots.
void Finalize()
{
auto &res = fHistos[0];
for (auto slot : ROOT::TSeqU(1, fHistos.size())) {
res->Add(fHistos[slot].get());
}
}
std::string GetActionName(){
return "THnHelper";
}
};
'''
)
I cannot actually construct it, however. I have tried using array.array
, ROOT.std.vector
, and ROOT.std.list
. All give the same result:
nbins = array('i', (10, 10))
xmin = array('d', (0, 0))
xmax = array('d', (5, 5))
h = ROOT.THnHelper(float, 2)('h', 'h', nbins, xmin, xmax)
nbins = ROOT.std.vector('int')()
nbins.push_back(10)
nbins.push_back(10)
xmin = ROOT.std.vector('double')()
xmin.push_back(0)
xmin.push_back(0)
xmax = ROOT.std.vector('double')()
xmax.push_back(5)
xmax.push_back(5)
h = ROOT.THnHelper(float, 2)('h', 'h', nbins, xmin, xmax)
nbins = ROOT.std.list('int')()
nbins.push_back(10)
nbins.push_back(10)
xmin = ROOT.std.list('double')()
xmin.push_back(0)
xmin.push_back(0)
xmax = ROOT.std.list('double')()
xmax.push_back(5)
xmax.push_back(5)
h = ROOT.THnHelper(float, 2)('h', 'h', nbins, xmin, xmax)
all give:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-29c927b1bf48> in <module>()
2 xmin = array('d', (0, 0))
3 xmax = array('d', (5, 5))
----> 4 h = ROOT.THnHelper(float, 2)('h', 'h', nbins, xmin, xmax)
TypeError: none of the 3 overloaded methods succeeded. Full details:
THnHelper<float,2>::THnHelper<float,2>(basic_string_view<char,char_traits<char> > name, basic_string_view<char,char_traits<char> > title, array<int,2> nbins, array<double,2> xmins, array<double,2> xmax) =>
could not convert argument 3
THnHelper<float,2>::THnHelper<float,2>(THnHelper<float,2>&&) =>
takes at most 1 arguments (5 given)
THnHelper<float,2>::THnHelper<float,2>(const THnHelper<float,2>&) =>
takes at most 1 arguments (5 given)
Hi,
that’s interesting! Support for template methods is indeed suboptimal in current PyROOT. Things should get better when experimental PyROOT becomes the default, but still these RDF methods might require pythonizations to work properly. So I’m not sure there is a proper solution at the moment.
My actual suggestion was to do all of the RDF stuff on the C++ side, and only get the result in PyROOT:
# PyROOT
df = ROOT.RDataFrame(...)
hnsparse = ROOT.CppFunctionThatCallsFill(df)
where CppFunctionThatCallsFill
is something like:
// C++
ROOT::RDF::RResultPtr<THnSparse> CppFunctionThatCallsFill(ROOT::RDF::RNode df)
{
THnSparse ...;
return df.Book(...);
}
But of course this is…awkward.
Okay, I’ve found a solution. Thank you for your advice and for pointing me to the tutorial. My code follows below for anyone interested down the line.
def declare():
'declares class SparseHelper to enable RDataFrame to fill a THnSparse'
'it is based on $ROOTSYS/tutorials/dataframe/df018_customActions.C'
import ROOT
ROOT.gROOT.ProcessLine(r'''
// This is a custom action which respects a well defined interface. It supports parallelism,
// in the sense that it behaves correctly if implicit multi threading is enabled.
// We template it on:
// - The type of the internal THnSparseT(s)
// - The dimension of the internal THnSparseT(s)
// Note the plural: in presence of a MT execution, internally more than a single THnSparseT is created.
// Note also that when booking, the last column will be understood as the weight
template <typename T, unsigned int NDIM>
class SparseHelper : public ROOT::Detail::RDF::RActionImpl<SparseHelper<T, NDIM>> {
public:
/// This is a handy, expressive shortcut.
using THn_t = THnSparseT<T>;
/// This type is a requirement for every helper.
using Result_t = THn_t;
private:
std::vector<std::shared_ptr<THn_t>> fHistos; // one per data processing slot
public:
/// This constructor takes all the parameters necessary to build the THnSparseTs. In addition, it requires the names of
/// the columns which will be used.
SparseHelper(std::string_view name, std::string_view title, std::array<int, NDIM> nbins, std::array<double, NDIM> xmins,
std::array<double, NDIM> xmax)
{
const auto nSlots = ROOT::IsImplicitMTEnabled() ? ROOT::GetImplicitMTPoolSize() : 1;
for (auto i : ROOT::TSeqU(nSlots)) {
fHistos.emplace_back(std::make_shared<THn_t>(std::string(name).c_str(), std::string(title).c_str(),
NDIM, nbins.data(), xmins.data(), xmax.data()));
(void)i;
}
}
SparseHelper(SparseHelper &&) = default;
SparseHelper(const SparseHelper &) = delete;
std::shared_ptr<THn_t> GetResultPtr() const { return fHistos[0]; }
void Initialize() {}
void InitTask(TTreeReader *, unsigned int) {}
/// This is a method executed at every entry
template <typename... ColumnTypes>
void Exec(unsigned int slot, ColumnTypes... values)
{
// Since THnSparseT<T>::Fill expects a double*, we build it passing through a std::array.
std::array<double, sizeof...(ColumnTypes)> _valuesArr{static_cast<double>(values)...};
// We understand the last item in the list as a weight
std::array<double, sizeof(_valuesArr) - 1> valuesArr;
for(int i=0; i<sizeof(_valuesArr) - 1; i++){
valuesArr[i] = _valuesArr[i];
}
auto w = _valuesArr.back();
fHistos[slot]->Fill(valuesArr.data(), w);
}
/// This method is called at the end of the event loop. It is used to merge all the internal THnSparseTs which
/// were used in each of the data processing slots.
void Finalize()
{
auto &res = fHistos[0];
for (auto slot : ROOT::TSeqU(1, fHistos.size())) {
res->Add(fHistos[slot].get());
}
}
std::string GetActionName(){
return "SparseHelper";
}
};
''')
def fill(df, hnm, htit, brlist, weightbr):
'create THnSparse(hnm, htit, ...) wrapped in RResultPtr'
import ROOT
br_and_weight_list = brlist + [weightbr]
ROOT.gROOT.ProcessLine('''
ROOT::RDF::RResultPtr<THnSparseT<TArrayD> > CppFunctionThatCallsFillFor_{hnm}(ROOT::RDF::{dfclassnm} df)
{{
// Our Helper type: templated on the internal THnSparseT type, the size, the types of the columns
// we'll use to fill.
using Helper_t = SparseHelper<TArrayD, {ndims}>;
Helper_t helper{{"{hnm}",
"{htit}",
{{{nbins}}},
{{{xmins}}},
{{{xmaxs}}}}};
// We book the action: it will be treated during the event loop.
auto myTHnSparseT = df.Book<{dattyps}>(std::move(helper), {{{br_and_weight_nms}}});
return myTHnSparseT;
}}
'''.format(
ndims=len(brlist),
dfclassnm=df.__class__.__name__,
hnm=hnm,
htit=htit,
nbins=','.join(str(x.nBins) for x in brlist),
xmins=','.join(str(x.loBin) for x in brlist),
xmaxs=','.join(str(x.hiBin) for x in brlist),
dattyps=','.join(x.datatype for x in br_and_weight_list),
br_and_weight_nms=','.join('"{}"'.format(x.branch) for x in br_and_weight_list))
)
return getattr(ROOT, 'CppFunctionThatCallsFillFor_' + hnm)(df)
def test():
import ROOT
ROOT.gROOT.SetBatch(True)
ROOT.ROOT.EnableImplicitMT()
declare()
df = ROOT.RDataFrame(10000).Define('x0', 'gRandom->Uniform(-5, 5)').Define('x1', '(float)gRandom->Uniform(0, 10)').Define('x2', '(int)gRandom->Uniform(-100, 100)').Define('weight', '(double) 1.')
class mybranch():
def __init__(self, branch, nBins=0, loBin=0, hiBin=0, datatype=None):
self.branch = branch
self.nBins = nBins
self.loBin = loBin
self.hiBin = hiBin
self.datatype = datatype
brlist = [
mybranch('x0', 10, -5, 5, datatype='double'),
mybranch('x1', 10, 0, 10, datatype='float'),
mybranch('x2', 10, -100, 100, datatype='int'),
]
wbr = mybranch('weight', datatype='double')
myhist1 = fill(df, 'myhist1', 'myhist1', brlist[:1], wbr)
myhist2 = fill(df, 'myhist2', 'myhist2', brlist[:2], wbr)
myhist3 = fill(df, 'myhist3', 'myhist3', brlist[:3], wbr)
c = ROOT.TCanvas()
hs = ROOT.THStack()
hs.Add(myhist1.Projection(0))
hs.Add(myhist2.Projection(1))
hs.Add(myhist3.Projection(2))
hs.Draw('nostack plc pmc')
c.BuildLegend()
c.SaveAs('test.png')
if __name__ == '__main__':
test()
Wow, well done! Hopefully in the future we’ll have a better answer…
I’m afraid I still have a problem. How can I get OnPartialResultSlot
to work? Right now my action doesn’t ever get called, even if I add
/// This is a method that allows updates on the partial result
std::shared_ptr<THn_t> &PartialUpdate(unsigned int slot){
return fHistos[slot];
}
to the SparseHelper
declaration…
Hi,
Can you produce a self-contained reproducer that I can play with? I’m not sure I understand the situation.
Cheers,
Enrico
Sure. See below. It contains the same code above, plus a PartialUpdate
class member, and adds a OnPartialResultSlot
action to be called every event that prints “I HAVE BEEN CALLED” using cout
. When I run this script, I see no such output, even though I should see “I HAVE BEEN CALLED” printed 10,000 times…
def declare():
'declares class SparseHelper to enable RDataFrame to fill a THnSparse'
'it is based on $ROOTSYS/tutorials/dataframe/df018_customActions.C'
import ROOT
ROOT.gROOT.ProcessLine(r'''
// This is a custom action which respects a well defined interface. It supports parallelism,
// in the sense that it behaves correctly if implicit multi threading is enabled.
// We template it on:
// - The type of the internal THnSparseT(s)
// - The dimension of the internal THnSparseT(s)
// Note the plural: in presence of a MT execution, internally more than a single THnSparseT is created.
// Note also that when booking, the last column will be understood as the weight
template <typename T, unsigned int NDIM>
class SparseHelper : public ROOT::Detail::RDF::RActionImpl<SparseHelper<T, NDIM>> {
public:
/// This is a handy, expressive shortcut.
using THn_t = THnSparseT<T>;
/// This type is a requirement for every helper.
using Result_t = THn_t;
private:
std::vector<std::shared_ptr<THn_t>> fHistos; // one per data processing slot
public:
/// This constructor takes all the parameters necessary to build the THnSparseTs. In addition, it requires the names of
/// the columns which will be used.
SparseHelper(std::string_view name, std::string_view title, std::array<int, NDIM> nbins, std::array<double, NDIM> xmins,
std::array<double, NDIM> xmax)
{
const auto nSlots = ROOT::IsImplicitMTEnabled() ? ROOT::GetImplicitMTPoolSize() : 1;
for (auto i : ROOT::TSeqU(nSlots)) {
fHistos.emplace_back(std::make_shared<THn_t>(std::string(name).c_str(), std::string(title).c_str(),
NDIM, nbins.data(), xmins.data(), xmax.data()));
(void)i;
}
}
SparseHelper(SparseHelper &&) = default;
SparseHelper(const SparseHelper &) = delete;
std::shared_ptr<THn_t> GetResultPtr() const { return fHistos[0]; }
void Initialize() {}
void InitTask(TTreeReader *, unsigned int) {}
/// This is a method executed at every entry
template <typename... ColumnTypes>
void Exec(unsigned int slot, ColumnTypes... values)
{
// Since THnSparseT<T>::Fill expects a double*, we build it passing through a std::array.
std::array<double, sizeof...(ColumnTypes)> _valuesArr{static_cast<double>(values)...};
// We understand the last item in the list as a weight
std::array<double, sizeof(_valuesArr) - 1> valuesArr;
for(int i=0; i<sizeof(_valuesArr) - 1; i++){
valuesArr[i] = _valuesArr[i];
}
auto w = _valuesArr.back();
fHistos[slot]->Fill(valuesArr.data(), w);
}
/// This is a method that allows updates on the partial result
std::shared_ptr<THn_t> &PartialUpdate(unsigned int slot){
return fHistos[slot];
}
/// This method is called at the end of the event loop. It is used to merge all the internal THnSparseTs which
/// were used in each of the data processing slots.
void Finalize()
{
auto &res = fHistos[0];
for (auto slot : ROOT::TSeqU(1, fHistos.size())) {
res->Add(fHistos[slot].get());
}
}
std::string GetActionName(){
return "SparseHelper";
}
};
''')
def fill(df, hnm, htit, brlist, weightbr):
'create THnSparse(hnm, htit, ...) wrapped in RResultPtr'
import ROOT
br_and_weight_list = brlist + [weightbr]
ROOT.gROOT.ProcessLine('''
ROOT::RDF::RResultPtr<THnSparseT<TArrayD> > CppFunctionThatCallsFillFor_{hnm}(ROOT::RDF::{dfclassnm} df)
{{
// Our Helper type: templated on the internal THnSparseT type, the size, the types of the columns
// we'll use to fill.
using Helper_t = SparseHelper<TArrayD, {ndims}>;
Helper_t helper{{"{hnm}",
"{htit}",
{{{nbins}}},
{{{xmins}}},
{{{xmaxs}}}}};
// We book the action: it will be treated during the event loop.
auto myTHnSparseT = df.Book<{dattyps}>(std::move(helper), {{{br_and_weight_nms}}});
return myTHnSparseT;
}}
'''.format(
ndims=len(brlist),
dfclassnm=df.__class__.__name__,
hnm=hnm,
htit=htit,
nbins=','.join(str(x.nBins) for x in brlist),
xmins=','.join(str(x.loBin) for x in brlist),
xmaxs=','.join(str(x.hiBin) for x in brlist),
dattyps=','.join(x.datatype for x in br_and_weight_list),
br_and_weight_nms=','.join('"{}"'.format(x.branch) for x in br_and_weight_list))
)
return getattr(ROOT, 'CppFunctionThatCallsFillFor_' + hnm)(df)
def test():
import ROOT
ROOT.gROOT.SetBatch(True)
ROOT.ROOT.EnableImplicitMT()
declare()
df = ROOT.RDataFrame(10000).Define('x0', 'gRandom->Uniform(-5, 5)').Define('x1', '(float)gRandom->Uniform(0, 10)').Define('x2', '(int)gRandom->Uniform(-100, 100)').Define('weight', '(double) 1.')
class mybranch():
def __init__(self, branch, nBins=0, loBin=0, hiBin=0, datatype=None):
self.branch = branch
self.nBins = nBins
self.loBin = loBin
self.hiBin = hiBin
self.datatype = datatype
brlist = [
mybranch('x0', 10, -5, 5, datatype='double'),
mybranch('x1', 10, 0, 10, datatype='float'),
mybranch('x2', 10, -100, 100, datatype='int'),
]
wbr = mybranch('weight', datatype='double')
myhist1 = fill(df, 'myhist1', 'myhist1', brlist[:1], wbr)
myhist2 = fill(df, 'myhist2', 'myhist2', brlist[:2], wbr)
myhist3 = fill(df, 'myhist3', 'myhist3', brlist[:3], wbr)
global tokenh # so ROOT::TPython can get it
tokenh = myhist1
ROOT.gInterpreter.ProcessLine(
'std::mutex bar_mutex_{unm};'
'ROOT::RDF::RResultPtr<THnSparseT<TArrayD> > * cpph_{unm} = (ROOT::RDF::RResultPtr<THnSparseT<TArrayD> > * )TPython::Eval("tokenh");'
'cpph_{unm}->OnPartialResultSlot({freq}, [&bar_mutex_{unm}](unsigned int, THnSparseT<TArrayD> &){{'
'std::lock_guard<std::mutex> l(bar_mutex_{unm});'
'cout<<"I HAVE BEEN CALLED"<<endl;'
'}});'
''.format(unm=tokenh.GetName(), freq=1) # ensure unique definitions
)
myhist1.Sumw2()
c = ROOT.TCanvas()
hs = ROOT.THStack()
hs.Add(myhist1.Projection(0))
hs.Add(myhist2.Projection(1))
hs.Add(myhist3.Projection(2))
hs.Draw('nostack plc pmc')
c.BuildLegend()
c.SaveAs('test.png')
if __name__ == '__main__':
test()
Ah, this one’s tricky: .format(unm=tokenh.GetName(), freq=1)
triggers the event loop before the string is interpreted by ProcessLine
changing tokenh.GetName()
to e.g. "foo"
gets you the printouts.
Very tricky. Maybe we need an opt-in non-lazy mode…?
Thank you much! I did not realize GetName
triggered the event loop. Thanks again for all your help on this.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.