RResultPtr::OnPartialResult function in pyROOT

mwilkins · June 3, 2019, 5:48pm

How can I declare a function in pyROOT in such a way that RResultPtr::OnPartialResult accepts it as the callback argument?

Details:

I would like to update a python progressbar while RDataFrame loops events. Something like:

import ROOT
import progressbar
import threading

f = ROOT.TFile('...')
t = f.Get('...')
df = ROOT.RDataFrame(t)
h = df.Histo1D(('name', 'title', 100, 0, 100), 'histogram')

fbarlock = threading.Lock()
iprog = 0
nEntries = t.GetEntries()
freq = nEntries / 100 + 1
fbar = progressbar.ProgressBar(maxval=nEntries, widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()])

def update_fbar(threadnum, thing):
    global iprog  # iprog is rebound below, so it must be declared global
    with fbarlock:  # released automatically, even if update() raises
        iprog += freq
        fbar.update(iprog)

print 'looping over {n} entries from {t}...'.format(n=nEntries, t=t.GetName())

h.OnPartialResultSlot(freq, update_fbar)

can = ROOT.TCanvas()
h.Draw()
fbar.finish()

but when I do I get:

TypeError: ROOT::RDF::RResultPtr<TH1D>& ROOT::RDF::RResultPtr<TH1D>::OnPartialResultSlot(ULong64_t everyNEvents, function<void(unsigned int,TH1D&)> callback) =>
    could not convert argument 2

ROOT Version: 6.19/01
Platform: macOS
Compiler: Not Provided

eguiraud · June 3, 2019, 6:11pm

Hi,
there’s currently no way to do that from PyROOT: the only OnPartialResult implementation accepts a C++ callable and there are no pythonizations that convert the python callable into a C++ callable.

You can of course make it work by embedding C++ code in your python code and jit it with ROOT.gInterpreter.ProcessLine. I realize this is awkward at best.

@etejedor or @swunsch might be able to comment about better ways to solve this problem that might be coming in the future.

Cheers,
Enrico

mwilkins · June 3, 2019, 6:43pm

Thank you for your reply. The main limitation, in this case, of ROOT.gInterpreter.ProcessLine is that I want my python variables to update during the loop. Is there any way to store variables such that both my python script and gInterpreter can access them? E.g., in the following, I want gInterpreter to recognize the symbol t:

f = ROOT.TFile('...')
t = f.Get('...')
ROOT.gInterpreter.ProcessLine('cout<<t.GetEntries()<<endl;')

eguiraud · June 3, 2019, 8:25pm

Three ways that I know of:

C++ -> Python

ROOT.gInterpreter.ProcessLine('auto f = TFile::Open("..."); TTree *t; f->GetObject("....", t);')
ROOT.t.GetEntries() # works
ROOT.gInterpreter.ProcessLine('cout << t->GetEntries() << endl;') # also works

Python -> C++

>>> x = 42
>>> ROOT.gInterpreter.ProcessLine('cout << int(TPython::Eval("x")) << endl;')
42

(see TPython::Eval).

For specific ROOT objects such as TFiles or histograms, you can retrieve them from gDirectory (or anyway from their TDirectory/TFile):

>>> h = ROOT.TH1D("myh", "myh", 100, 0, 1)
>>> ROOT.gInterpreter.ProcessLine('gDirectory->Get("myh")->GetName()')
(const char *) "myh"

You can also play with addresses and reinterpret_casts (not suggesting you should, but you can):

>>> h = ROOT.TH1D("myh","myh",100,0,1)
>>> ROOT.gInterpreter.Declare("Long64_t AddrAsLong(void *p) { return reinterpret_cast<Long64_t>(p); }")
>>> ROOT.gInterpreter.ProcessLine("cout << reinterpret_cast<TH1D*>({addr})->GetName() << endl;".format(addr=ROOT.AddrAsLong(h)))
"myh"

As a side note, remember that all python code is executed under the GIL and therefore sequentially, even in multi-threaded programs.

And: passing python callables to C++, or C++ callables to C++ from PyROOT is getting easier thanks to @swunsch, so maybe in the future we’ll have a simpler way to do it.
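
On the GIL note above: the GIL serializes bytecode execution, but `total += step` is still a read-modify-write sequence, so a counter updated from several threads wants its own lock. A minimal pure-Python sketch, with no ROOT involved; `ProgressCounter`, `tick` and `on_update` are hypothetical names introduced here, and `on_update` stands in for something like `fbar.update`:

```python
import threading


class ProgressCounter(object):
    """Thread-safe running total (hypothetical helper, not part of ROOT)."""

    def __init__(self, step, on_update):
        self._lock = threading.Lock()
        self._step = step
        self._total = 0
        self._on_update = on_update  # e.g. fbar.update in the thread above

    def tick(self, *_unused):
        # Extra positional arguments (slot number, partial result) are ignored.
        with self._lock:
            self._total += self._step
            self._on_update(self._total)

    @property
    def total(self):
        with self._lock:
            return self._total


seen = []
counter = ProgressCounter(step=10, on_update=seen.append)


def worker():
    for _ in range(100):
        counter.tick()


threads = [threading.Thread(target=worker) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Keeping the lock and the total in one object avoids the `global` declaration and makes it harder to update the total without holding the lock.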

mwilkins · June 3, 2019, 9:14pm

Thank you for your help. I was able to get the result I wanted with:

nEntries = t.GetEntries()
print 'looping over {n} entries from {t}...'.format(n=nEntries, t=t.GetName())
fbar = progressbar.ProgressBar(maxval=nEntries, widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()])

freq = int(nEntries / 100 + 1)
ROOT.gInterpreter.ProcessLine(
    'std::mutex bar_mutex{file_num};'
    'int fbarprogress{file_num} = 0;'
    'ROOT::RDF::RResultPtr<TH1D> * cpph{file_num} = (ROOT::RDF::RResultPtr<TH1D> * )TPython::Eval("h");'
    'cpph{file_num}->OnPartialResultSlot(int(TPython::Eval("freq")), [&bar_mutex{file_num}, &fbarprogress{file_num}](unsigned int, TH1D &){{'
    'std::lock_guard<std::mutex> l(bar_mutex{file_num});'
    'fbarprogress{file_num} += int(TPython::Eval("freq"));'
    'char * fbarupdate_expression = Form("fbar.update(%d)", fbarprogress{file_num});'
    'TPython::Eval(fbarupdate_expression);'
    '}});'
    ''.format(file_num=file_num)  # ensure unique definitions in the case of multiple files
)
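
A note on the doubled braces in the snippet above: `str.format` treats `{name}` as a placeholder, so the literal braces of the C++ lambda body have to be written as `{{` and `}}`. A small sketch that only builds a string and never runs it through the interpreter; `TEMPLATE`, `render_cpp`, `counter` and `cb` are hypothetical names used for illustration:

```python
# C++ source assembled with str.format: {file_num} and {freq} are
# placeholders, while {{ and }} become literal braces in the output.
TEMPLATE = (
    'int counter{file_num} = 0;'
    'auto cb{file_num} = [](unsigned int, TH1D &){{'
    'counter{file_num} += {freq};'
    '}};'
)


def render_cpp(file_num, freq):
    """Return the C++ snippet with the Python values baked in as literals."""
    return TEMPLATE.format(file_num=file_num, freq=freq)


snippet = render_cpp(file_num=2, freq=1001)
```

Printing `snippet` before handing it to `ProcessLine` is a cheap way to check that the braces and suffixes came out as intended.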

eguiraud · June 3, 2019, 9:59pm

Note that depending on your use case, that progress bar might have a visible runtime cost: you are calling python 3 times per entry per thread, and those calls all hit the GIL. Or not: if 99.9% of the time is spent elsewhere, that callback might not impact runtimes at all.

In case you measure, please let us know whether that’s the case – I’m curious.

Cheers,
Enrico

mwilkins · June 4, 2019, 4:46pm

Done. I timed it by comparing the runtime when the callback evaluates the python value with TPython::Eval vs. when the value is formatted into the C++ string beforehand.
Evaluating freq with TPython::Eval took 1.11219000816 s. Formatting it into the string took 0.444277048111 s.

Code:

import time
import progressbar
import ROOT
from myROOTtypes.chain import chain as mychain  # local wrapper class; function should be apparent
import uts  # local module with file lists

ch = mychain('...', lfiles=uts.data_selected[:10])
nEntries = ch.GetEntries()
freq = int(nEntries / 100 + 1)
can1 = ROOT.TCanvas()
df1 = ROOT.ROOT.RDataFrame(ch.chain)
h1 = df1.Histo1D('...')
fbar = progressbar.ProgressBar(maxval=nEntries, widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()])
ROOT.gInterpreter.ProcessLine(
    'std::mutex bar_mutex{file_num};'
    'int fbarprogress{file_num} = 0;'
    'ROOT::RDF::RResultPtr<TH1D> * cpph{file_num} = (ROOT::RDF::RResultPtr<TH1D> * )TPython::Eval("h1");'
    'cpph{file_num}->OnPartialResultSlot(int(TPython::Eval("freq")), [&bar_mutex{file_num}, &fbarprogress{file_num}](unsigned int, TH1D &){{'
    'std::lock_guard<std::mutex> l(bar_mutex{file_num});'
    'fbarprogress{file_num} += int(TPython::Eval("freq"));'
    'char * fbarupdate_expression = Form("fbar.update(%d)", fbarprogress{file_num});'
    'TPython::Eval(fbarupdate_expression);'
    '}});'
    ''.format(file_num=1) # ensure unique definitions in the case of multiple files
)
print 'starting at', time.asctime(time.localtime(time.time()))
fbar.start()
starttime = time.time()
h1.Draw()
endtime = time.time()
fbar.finish()
print 'finished at', time.asctime(time.localtime(time.time()))
print 'duration:', endtime - starttime, 's'

ch = mychain('...', lfiles=uts.data_selected[:10])
nEntries = ch.GetEntries()
freq = int(nEntries / 100 + 1)
can2 = ROOT.TCanvas()
df2 = ROOT.ROOT.RDataFrame(ch.chain)
h2 = df2.Histo1D('...')
fbar = progressbar.ProgressBar(maxval=nEntries, widgets=[progressbar.Bar('=', '[', ']'), ' ', progressbar.Percentage()])
ROOT.gInterpreter.ProcessLine(
    'std::mutex bar_mutex{file_num};'
    'int fbarprogress{file_num} = 0;'
    'ROOT::RDF::RResultPtr<TH1D> * cpph{file_num} = (ROOT::RDF::RResultPtr<TH1D> * )TPython::Eval("h2");'
    'cpph{file_num}->OnPartialResultSlot({freq}, [&bar_mutex{file_num}, &fbarprogress{file_num}](unsigned int, TH1D &){{'
    'std::lock_guard<std::mutex> l(bar_mutex{file_num});'
    'fbarprogress{file_num} += {freq};'
    'char * fbarupdate_expression = Form("fbar.update(%d)", fbarprogress{file_num});'
    'TPython::Eval(fbarupdate_expression);'
    '}});'
    ''.format(file_num=2, freq=freq)  # ensure unique definitions in the case of multiple files
)

print 'starting at', time.asctime(time.localtime(time.time()))
fbar.start()
starttime = time.time()
h2.Draw()
endtime = time.time()
fbar.finish()
print 'finished at', time.asctime(time.localtime(time.time()))
print 'duration:', endtime - starttime, 's'
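
The start/stop bookkeeping in the two runs above could be folded into one helper. A minimal sketch, assuming only the standard library; `timed` is a hypothetical name, and `timeit.default_timer` is used because it picks a suitable wall-clock timer on both Python 2 and 3:

```python
from timeit import default_timer


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = default_timer()
    result = func(*args, **kwargs)
    return result, default_timer() - start


# Stand-in workload; in the script above the call would be h1.Draw.
result, seconds = timed(sum, range(1000))
```

With the names from the script above, `_, duration = timed(h1.Draw)` would measure the same interval as the `starttime`/`endtime` pair.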
