Hi
I want to implement PROOF in Python through PyROOT. Now I have a problem with importing TPySelector:
If you have any idea about this, please let me know.
Thanks
ROOT Version: 6.22
Platform: Not Provided
Compiler: Not Provided
Hello,
Instead of using TPySelector, you can directly inherit from TSelector in Python. For example, this should work (IIRC you’ll need ROOT 6.24):
import ROOT
from ROOT import TDSet, TProof
# Flags checked from the client session after processing
begin = False
terminate = False
class MySelector(ROOT.TSelector):
    def Begin(self, tree):
        global begin
        begin = True
    def Terminate(self):
        global terminate
        terminate = True
dataset = TDSet('TTree', 'events')   # tree name 'events'
dataset.Add('simpleTree.root')
proof = TProof.Open('')              # '' opens a local PROOF-Lite session
mysel = MySelector()
dataset.Process(mysel)
print(begin, terminate)
As a replacement for Proof, we are now working on the distributed version of RDataFrame, a high-level interface to do analysis in ROOT. If you are interested in trying it out we could help you.
Thank you very much for your advice, it works.
I have a little experience working with RDataFrame. I would be happy to know how I can use RDataFrame instead of PROOF. Is there any documentation for this case?
Hello,
Perhaps you can start with this documentation:
https://root.cern/doc/master/classROOT_1_1RDataFrame.html
You will notice that the interface is quite different from Proof, but this interface is what we are trying to promote now from the ROOT team for analysis. The idea is that you write your analysis as a series of transformations on a dataset and some actions to get final results (e.g. a histogram). The same analysis code can then be automatically parallelized on the cores of your machine or also on multiple nodes (this is where it replaces Proof).
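To make the "transformations plus actions" idea concrete without needing ROOT installed, here is a toy, pure-Python sketch. `ToyFrame`, `define`, `filter` and `histo` are hypothetical names that only mimic the roles of RDataFrame's Define, Filter and Histo1D; this is not the ROOT API. What it shows is that the steps are only recorded, and a single event loop runs when a result is requested.

```python
class ToyFrame:
    """Records define/filter steps lazily; an action runs one event loop."""

    def __init__(self, events, steps=()):
        self._events = events       # list of dicts, one per event
        self._steps = list(steps)   # recorded (kind, name, func) steps

    def define(self, name, func):
        # New column computed per event from existing ones (role of Define)
        return ToyFrame(self._events, self._steps + [("define", name, func)])

    def filter(self, predicate):
        # Keep only events for which predicate(row) is true (role of Filter)
        return ToyFrame(self._events, self._steps + [("filter", None, predicate)])

    def _rows(self):
        # The event loop: runs only when an action iterates over it
        for event in self._events:
            row = dict(event)  # copy, so the input events are not modified
            for kind, name, func in self._steps:
                if kind == "define":
                    row[name] = func(row)
                elif not func(row):
                    break  # event rejected by a filter
            else:
                yield row

    def histo(self, column, nbins, lo, hi):
        # Action (role of Histo1D): fixed-width bins over [lo, hi)
        counts = [0] * nbins
        width = (hi - lo) / nbins
        for row in self._rows():
            value = row[column]
            if lo <= value < hi:
                counts[min(int((value - lo) / width), nbins - 1)] += 1
        return counts


events = [{"pt": [10.0, 30.0]}, {"pt": [50.0]}, {"pt": []}]
lead = (ToyFrame(events)
        .filter(lambda r: len(r["pt"]) > 0)
        .define("lead_pt", lambda r: max(r["pt"])))
print(lead.histo("lead_pt", 4, 0.0, 80.0))
```

In real RDataFrame the actions are themselves lazy, several of them can share one event loop, and the loop runs in C++ (optionally multi-threaded or distributed); the toy collapses all of that into one Python loop per action.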
Hello,
thanks a lot.
Hello etejedor,
I wrote some code based on the code you sent, but the SlaveBegin, Process and Init functions don’t work! Do you have any idea about this problem? Actually, only the Begin and Terminate functions were called.
import ROOT
from ROOT import TDSet, TProof
# Flags checked from the client session at the end. Flags set in worker-side
# methods (SlaveBegin, Init, Process) live in the worker processes.
begin = False
terminate = False
slavebegin = False
init = False
process = False
class MySelector(ROOT.TSelector):
    def Begin(self, tree):
        # Runs on the client
        global begin
        begin = True
        self.Info('Begin', '-'*60)
    def Init(self, tree):
        # Runs on the workers, for each new tree
        global init
        init = True
        self.Info('Init', '-'*30)
        self.tree = tree  # a plain TSelector has no fChain member
        self.pt_lep_2b = ROOT.std.vector('float')()
        self.tree.SetBranchAddress("pt_lep_2b", self.pt_lep_2b)
    def SlaveBegin(self, tree):
        # Runs on the workers: create the objects to be filled
        global slavebegin
        slavebegin = True
        self.Info('SlaveBegin', '-'*60)
        self.eventsProcessed = 0
        self.h_pt = ROOT.TH1F("pt_lep_2b", "pt_lep_2b", 100, 0, 200)
        # Objects in the output list are sent back to the client and merged
        self.GetOutputList().Add(self.h_pt)
    def Process(self, entry):
        global process
        process = True
        print("Process")
        self.tree.GetEntry(entry)
        # for position in range(0, self.pt_lep_2b.size()):
        for position in range(0, 2):
            # self.h_pt.Fill(self.pt_lep_2b[position], 1)
            self.h_pt.Fill(10, 1)
        return True
    def SlaveTerminate(self):
        print("Slave Terminate")
    def Terminate(self):
        # Runs on the client: get the merged histogram from the output list
        global terminate
        terminate = True
        self.file = ROOT.TFile("histos.root", "RECREATE")
        h_pt = self.GetOutputList().FindObject("pt_lep_2b")
        if h_pt:
            h_pt.Write()
        self.file.Close()
dataset = TDSet('TTree', 'analysis')
dataset.Add('/home/sima/jupy/hep_ml/tree_data.root')
proof = TProof.Open('')
mysel = MySelector()
dataset.Process(mysel)
print(begin, terminate, slavebegin, init, process)
The output:
True True False False False
Info in <TProofLite::SetQueryRunning>: starting query: 3
Info in <TProofQueryResult::SetRunning>: nwrks: 4
Info in <TSelector::Begin>: ------------------------------------------------------------
Looking up for exact location of files: OK (1 files)
Looking up for exact location of files: OK (1 files)
Info in <TPacketizer::TPacketizer>: Initial number of workers: 4
Validating files: OK (1 files)
[TProof::Progress] Total 129983 events |====================| 100.00 % [14442556.0 evts/s, 20.6 MB/s, time left: 0.0 s]
Query processing time: 0.0 s
Lite-0: all output objects have been merged
Thanks
Sima
Hello,
I am not sure why Proof is not invoking all the methods you define.
But if I do the same in C++, the behaviour is the same:
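#include <iostream>
#include "TDSet.h"
#include "TProof.h"
#include "TSelector.h"
#include "TTree.h"
class MySelector : public TSelector {
public:
   void Begin(TTree *) override { std::cout << "BEGIN" << std::endl; }
   void Init(TTree *) override { std::cout << "INIT" << std::endl; }
   void SlaveBegin(TTree *) override { std::cout << "SLAVEBEGIN" << std::endl; }
   // TSelector::Process takes a Long64_t and returns Bool_t
   Bool_t Process(Long64_t entry) override { std::cout << "PROCESS" << entry << std::endl; return kTRUE; }
   void SlaveTerminate() override { std::cout << "SLAVETERM" << std::endl; }
   void Terminate() override { std::cout << "TERMINATE" << std::endl; }
};
void tselforum() {
   TDSet dataset("TTree", "events");
   dataset.Add("/tmp/simpleTree.root");
   auto proof = TProof::Open("");
   MySelector mysel;
   dataset.Process(&mysel);
}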
Only Begin and Terminate are called, so this means it is not a Python problem.
Perhaps @ganis knows what’s missing?
I reiterate my offer to help you convert this to RDataFrame if that’s an option for you.
Cheers,
Enric
Hello etejedor,
The code I want to execute with PROOF is not an analysis code: it solves some equations for the tree events and also calls several external libraries, so I thought it could not be done with RDataFrame.
About C++ code and PROOF:
I have already written my analysis code in C++, based on the examples, and run it with PROOF. I have never seen the output of “cout” or “printf” from the SlaveBegin, Process and SlaveTerminate functions, but, as your code above shows, it works for the Begin and Terminate functions. I don’t know why. But in C++ I always define objects like TH1F (histograms) in SlaveBegin and add them to the OutputList; if SlaveTerminate works, the TH1F objects must show up in the OutputList in Terminate, or we can write them to a ROOT file. I don’t know what is missing in the Python version.
Thanks for your great helps
Sima
For the output of “cout” and “printf”, try looking in:
find ${HOME}/.proof -name "*.log" -type f -print
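If it is more convenient to do the same search from Python, a rough equivalent of that find command is sketched below. The function name `find_proof_logs` and its optional directory argument are illustrative choices; it assumes the default sandbox location ${HOME}/.proof used above.

```python
from pathlib import Path


def find_proof_logs(proof_dir=None):
    """Return the regular *.log files under the PROOF sandbox, sorted by path.

    Roughly: find ${HOME}/.proof -name "*.log" -type f -print
    """
    base = Path(proof_dir) if proof_dir is not None else Path.home() / ".proof"
    if not base.is_dir():
        return []  # no sandbox yet, nothing to show
    # rglob searches all subdirectories; is_file() skips directories named *.log
    return sorted(p for p in base.rglob("*.log") if p.is_file())


if __name__ == "__main__":
    for log in find_proof_logs():
        print(log)
```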
Hello,
As for the example we are discussing here, the behaviour from Python and C++ is the same (i.e. only Begin and Terminate are called). I can’t find any trace of evidence that says otherwise in $HOME/.proof either, neither for C++ nor for Python. We’ll need @ganis to shed some light on this.
Hi,
This sounds weird. The printouts of SlaveBegin, Process and SlaveTerminate should be in the worker logs, as those methods are executed there. Can you locate the worker logs under $HOME/.proof?
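A toy, single-process Python simulation of where each selector method runs, based on the description above and on the TSelector documentation: Begin and Terminate run in the client session, while SlaveBegin, Init, Process and SlaveTerminate run on each worker, which has its own copy of the selector. `ToySelector` and `run_toy_proof` are hypothetical names; no ROOT or real processes are involved, and summing a counter stands in for PROOF’s merging of output-list objects such as histograms. It also illustrates why flags set inside worker-side methods cannot be seen from the client: in PROOF they are set in the worker processes, just as here they are recorded on the worker copies.

```python
class ToySelector:
    """Mimics the TSelector hooks and records where each setup/teardown hook ran."""

    def __init__(self):
        self.calls = []   # (where, method) pairs
        self.output = {}  # stands in for the selector's output list

    def Begin(self):
        self.calls.append(("client", "Begin"))

    def SlaveBegin(self):
        self.calls.append(("worker", "SlaveBegin"))
        self.output["n_processed"] = 0  # object created on the worker, like a TH1F

    def Init(self, tree):
        self.calls.append(("worker", "Init"))
        self.tree = tree

    def Process(self, entry):
        self.output["n_processed"] += 1  # "fill" once per entry
        return True

    def SlaveTerminate(self):
        self.calls.append(("worker", "SlaveTerminate"))

    def Terminate(self):
        self.calls.append(("client", "Terminate"))


def run_toy_proof(selector_cls, tree, n_workers):
    """Single-process imitation of how PROOF distributes the selector hooks."""
    client = selector_cls()
    client.Begin()
    worker_logs = []
    for w in range(n_workers):
        worker = selector_cls()  # each worker has its own copy of the selector
        worker.SlaveBegin()
        worker.Init(tree)
        # Toy packetizer: worker w gets entries w, w + n_workers, ...
        for entry in range(w, len(tree), n_workers):
            worker.Process(entry)
        worker.SlaveTerminate()
        worker_logs.append(worker.calls)  # what would end up in worker-N.log
        # Merge the worker's output list into the client's one
        for name, value in worker.output.items():
            client.output[name] = client.output.get(name, 0) + value
    client.Terminate()
    return client, worker_logs
```

With `client, logs = run_toy_proof(ToySelector, list(range(10)), 3)`, `client.calls` only contains the Begin and Terminate entries, each element of `logs` holds the SlaveBegin, Init and SlaveTerminate entries, and `client.output` ends up as `{'n_processed': 10}`.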
This said, with respect to doing this in RDataFrame: RDataFrame is much more flexible than PROOF, so I would be surprised if you could not solve your equations in RDF. If you (@Sima_Bashiri) could post an example of what you want to do, I am sure that Enric or the other ROOT developers can help you with the translation.
G Ganis
Hi,
I ran the C++ version of the program (see above) again; it spawns 8 workers (I guess as many as the logical cores on my laptop). This is the output of one of the workers (worker-0.6.log):
11:25:14 9023 Wrk-0.6 | Info in <TProofServLite::Setup>: fWorkDir: /home/etejedor/.proof
11:25:14 9023 Wrk-0.6 | Info in <TProofServLite::SetupCommon>: 0 global package directories registered
11:25:15 9023 Wrk-0.6 | Info in <TProofServLite::HandleProcess>: selector obj for 'TSelector' found
11:25:15 9023 Wrk-0.6 | Info in <TProofServLite::HandleProcess>: calling fPlayer->Process() with selector object: TSelector
11:25:15 9023 Wrk-0.6 | Info in <TProofPlayerSlave::AssertSelector>: Processing via TSelector object
11:25:15 9023 Wrk-0.6 | Info in <TEventIter::TEventIter>: fPackets list 'ProcessedPackets_0.6' created
11:25:15 9023 Wrk-0.6 | Info in <TProofPlayerSlave::Process>: save partial results? 0 per-packet? 0
11:25:15 9023 Wrk-0.6 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 369648 virtual 200144 resident event 0
11:25:15 9023 Wrk-0.6 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 369648 virtual 200144 resident event 0
11:25:15 9023 Wrk-0.6 | Error in <TProofServLite::HandleSocketInput>: retrieving message from input socket
11:25:15 9023 Wrk-0.6 | Info in <TProofServLite::Terminate>: starting session termination operations ...
11:25:15 9023 Wrk-0.6 | Info in <TProofServLite::Terminate>: data directory '/home/etejedor/.proof/data/0.6/0.6-laptop-x1-1639131914-9023' has been removed
Terminate: termination operations ended: quitting!
There’s an error there, but I don’t see the messages I print.
Can it be that the “MySelector” class is not propagated to the slaves?
Your slave says “selector object: TSelector” and I would expect “selector object: MySelector” (or something similar).
Hello,
Here is the intended code, simplified so that it can be converted to RDF more quickly. The “nuSolutions.py” file is imported by the main code. The code is written in Python 2.
import nuSolutions as nu
import ROOT as r
import numpy as np
import math
f = r.TFile.Open("file.root")
t = f.Get("tree")
entries = t.GetEntriesFast()
for jentry in xrange(50):  # only the first 50 entries, for testing
    t.GetEntry(jentry)
    bList = []
    lList = []
    metxList = []
    metyList = []
    # Branches of the current entry
    pt_b = t.pt_b_ee
    eta_b = t.eta_b_ee
    phi_b = t.phi_b_ee
    e_b = t.e_b_ee
    pt_lep = t.pt_lep_ee
    eta_lep = t.eta_lep_ee
    phi_lep = t.phi_lep_ee
    e_lep = t.e_lep_ee
    ch_lep = t.ch_lep_ee
    pt_met = t.pt_met_ee
    phi_met = t.phi_met_ee
    # Leptons as (TLorentzVector, charge) pairs
    for pt, eta, phi, e, charge in map(None, pt_lep, eta_lep, phi_lep, e_lep, ch_lep):
        l = r.TLorentzVector()
        l.SetPtEtaPhiE(pt, eta, phi, e)
        lList.append((l, charge))
    # b-jets
    for pt, eta, phi, e in map(None, pt_b, eta_b, phi_b, e_b):
        b = r.TLorentzVector()
        b.SetPtEtaPhiE(pt, eta, phi, e)
        bList.append(b)
    # MET x and y components
    for pt, phi in map(None, pt_met, phi_met):
        mett = r.TLorentzVector()
        mett.SetPtEtaPhiE(pt, 0, phi, 0)
        metxList.append(mett.Px())
        metyList.append(mett.Py())
    # Solve only for events with at least two b-jets
    if len(bList) >= 2:
        metx = metxList[0]
        mety = metyList[0]
        try:
            solver = nu.doubleNeutrinoSolutions((bList[0], bList[1]), (lList[0][0], lList[1][0]), (metx, mety), (nu.mW)**2, (nu.mT)**2)
            Neu = solver.nunu_s
            print (Neu)
        except:
            continue  # skip events where the solver fails
nuSolutions.py (8.4 KB)
file.root (1.4 MB)
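One detail to watch if this code is moved to Python 3, which current Numba releases require: Python 2’s map(None, a, b, ...) pairs up the inputs and pads the shorter ones with None, which is what itertools.zip_longest does in Python 3 (there, map(None, ...) fails with a TypeError when iterated). A minimal sketch for the lepton loop, with a hypothetical helper `leptons_with_charge` that returns plain (pt, eta, phi, e) tuples instead of TLorentzVector objects so that it runs without ROOT:

```python
from itertools import zip_longest


def leptons_with_charge(pt, eta, phi, e, charge):
    """Python 3 version of: for ... in map(None, pt, eta, phi, e, charge).

    zip_longest pads shorter inputs with None, like Python 2's map(None, ...);
    use plain zip() instead if all branches always have the same length.
    """
    return [((p, h, f, en), q)
            for p, h, f, en, q in zip_longest(pt, eta, phi, e, charge)]
```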
By looking at your code, one fundamental change is the removal of the event loop: you will no longer need to loop over the tree entries and get the branches from the iterated tree. Instead, the event loop will run in RDataFrame code, in C++, while your code only applies per-event operations to the tree dataset.
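The `try: ... except: continue` in the loop above has no direct equivalent once there is no explicit loop to skip. One common pattern, sketched here in plain Python as an assumption rather than as an RDataFrame recipe, is to make the per-event function return an empty result on failure and drop such events with a later filter. `solve_or_empty` and `toy_solver` are hypothetical; with the real code the solver argument would be something like a function returning `nu.doubleNeutrinoSolutions(...).nunu_s`.

```python
import math


def solve_or_empty(solver, *args):
    """Return the solver's solutions as a list, or [] if the solver raises."""
    try:
        return list(solver(*args))
    except Exception:  # narrower than a bare except: lets KeyboardInterrupt through
        return []


def toy_solver(x):
    """Hypothetical stand-in for a solver: fails for negative input."""
    if x < 0:
        raise ValueError("no real solution")
    return [math.sqrt(x), -math.sqrt(x)]


# Per-event results, then a filter that keeps only events with solutions
events = [4.0, -1.0, 9.0]
solutions = [solve_or_empty(toy_solver, x) for x in events]
kept = [s for s in solutions if len(s) > 0]
```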
You can, for example, create new branches for lList, bList, metxList and metyList with Define. The type of those branches can be a vector (RVec). You can later use bList in a Filter to decide whether you do the doubleNeutrinoSolutions computation for a particular event, depending on the length of bList.
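As an illustration of the per-event logic such Define and Filter steps would carry, here is a plain-Python sketch of the MET part of the loop and of the b-jet condition. The names `met_xy` and `has_two_b` are hypothetical; in RDataFrame these would be written as C++ expressions or functions over RVec columns (or compiled with Numba), so this only shows the computation, not the RDataFrame call. It relies on the fact that SetPtEtaPhiE(pt, 0, phi, 0) followed by Px()/Py() gives pt*cos(phi) and pt*sin(phi).

```python
import math


def met_xy(pt_met, phi_met):
    """Per-event MET x/y components from the pt and phi branches.

    Same numbers as filling a TLorentzVector with SetPtEtaPhiE(pt, 0, phi, 0)
    and reading Px()/Py(): px = pt*cos(phi), py = pt*sin(phi).
    """
    metx = [pt * math.cos(phi) for pt, phi in zip(pt_met, phi_met)]
    mety = [pt * math.sin(phi) for pt, phi in zip(pt_met, phi_met)]
    return metx, mety


def has_two_b(pt_b):
    """Filter condition of the original loop: at least two b-jets."""
    return len(pt_b) >= 2


# One toy event with two MET entries (the original code uses only the first)
metx, mety = met_xy([10.0, 2.0], [0.0, math.pi / 2])
```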
Perhaps the trickiest part here is invoking the code that you have in nuSolutions.py, which is in Python. ROOT allows you to compile a given Python function (e.g. your doubleNeutrinoSolutions) with Numba, so that it can be called from RDataFrame, with some limitations; see this example:
https://root.cern.ch/doc/master/pyroot004__NumbaDeclare_8py.html
You would need to try whether that works; otherwise the function would need to be implemented in C++.