Proof in python

Sima_Bashiri · December 2, 2021, 7:49am

Hi
I want to use PROOF from Python through PyROOT. I have a problem importing TPySelector:

If you have any idea about this, please let me know.
Thanks

ROOT Version: 6.22
Platform: Not Provided
Compiler: Not Provided

etejedor · December 2, 2021, 2:06pm

Hello,

Instead of using TPySelector, you can directly inherit from TSelector in Python. For example, this should work (iirc you’ll need ROOT 6.24):

import ROOT
from ROOT import TDSet, TProof

begin = False
terminate = False

class MySelector(ROOT.TSelector):
    def Begin(self, tree):
        global begin
        begin = True

    def Terminate(self):
        global terminate
        terminate = True

dataset = TDSet( 'TTree', 'events' )
dataset.Add('simpleTree.root')

proof = TProof.Open('')

mysel = MySelector()

dataset.Process(mysel)

print(begin, terminate)

As a replacement for Proof, we are now working on the distributed version of RDataFrame, a high-level interface to do analysis in ROOT. If you are interested in trying it out we could help you.

Sima_Bashiri · December 2, 2021, 7:36pm

Thank you very much for your advice, it works.
I have a little experience working with RDataFrame. I would be happy to know how I can use RDataFrame instead of PROOF. Is there any documentation on this?

etejedor · December 3, 2021, 9:05am

Hello,

Perhaps you can start with this documentation:

https://root.cern/doc/master/classROOT_1_1RDataFrame.html

You will notice that the interface is quite different from Proof, but this interface is what we are trying to promote now from the ROOT team for analysis. The idea is that you write your analysis as a series of transformations on a dataset and some actions to get final results (e.g. a histogram). The same analysis code can then be automatically parallelized on the cores of your machine or also on multiple nodes (this is where it replaces Proof).
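
To make the idea concrete, here is a minimal sketch of that pattern in PyROOT. The tree name `events` and the file `simpleTree.root` are borrowed from the earlier example; the scalar branch `x`, the helper `selection` and the derived column `x2` are placeholders made up for illustration, so adapt them to your tree.

```python
def selection(column, threshold):
    """C++ expression string passed to Filter (evaluated once per event)."""
    return f"{column} > {threshold}"


def run_analysis(tree_name="events", file_name="simpleTree.root", column="x"):
    """Book a histogram of column**2 for events passing the selection."""
    import ROOT  # imported lazily; needs a ROOT installation with PyROOT

    ROOT.EnableImplicitMT()  # run the event loop on the local cores
    df = ROOT.RDataFrame(tree_name, file_name)
    hist = (
        df.Filter(selection(column, 0))          # transformation: keep some events
        .Define("x2", f"{column} * {column}")    # transformation: add a column
        .Histo1D("x2")                           # action: booked lazily
    )
    return hist  # result handle; the event loop has not run yet
```

Nothing is read until the result is accessed (for example with `run_analysis().GetValue()`), so several histograms can be booked first and filled in a single pass over the data.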

Sima_Bashiri · December 3, 2021, 3:16pm

Hello,
thanks a lot.

Sima_Bashiri · December 4, 2021, 1:22pm

Hello etejedor,
I wrote some code based on the example you sent, but the SlaveBegin, Process and Init functions are never executed! Do you have any idea what the problem is? Only the Begin and Terminate functions were called.

import ROOT
from ROOT import TDSet, TProof, TChain, TTree

begin = False
terminate = False
slavebegin = False
init = False
process = False
class MySelector(ROOT.TSelector):
    
    def Begin(self, tree):
        global begin
        begin = True
        self.Info('Begin', '-'*60)
        
    def Init(self, tree):
        global init
        init = True
        self.pt_lep_2b = ROOT.std.vector('float')()
        self.pt_lep_2b = 0
        self.Info('Init', '-'*30)
        self.fChain.SetBranchAddress("pt_lep_2b", self.pt_lep_2b)


    def SlaveBegin(self, tree): 
        global slavebegin
        slavebegin = True
        self.Info('SlaveBegin', '-'*60)
        self.eventsProcessed = 0
        self.h_pt = ROOT.TH1F("pt_lep_2b","pt_lep_2b",100,0,200)
        self.outList=self.GetOutputList()
        self.outList.Add(self.h_pt)

    def Process(self, entry):
        global process
        process = True
        print ("Process")
#         self.fChain.GetEntry(entry)
        self.GetTree().GetEntry(entry)
#         for position in range(0, self.pt_lep_2b.size()):
        for position in range(0, 2):
#             self.h_pt.Fill(self.pt_lep_2b[position], 1)
            self.h_pt.Fill(10, 1)

        return True

    def SlaveTerminate(self):
        print ("Slave Terminate")


    def Terminate(self):
        global terminate
        terminate = True
        self.file = ROOT.TFile("histos.root","RECREATE");

#		for var in range(0, len(self.variable)):
#         self.outList.FindObject("pt_lep_2b").Write()
        self.file.Close()

dataset = TDSet( 'TTree', 'analysis' )
dataset.Add('/home/sima/jupy/hep_ml/tree_data.root')

proof = TProof.Open('')
mysel = MySelector()

dataset.Process( mysel)

print(begin, terminate, slavebegin, init, process)

The output:

True True False False False
 
Info in <TProofLite::SetQueryRunning>: starting query: 3
Info in <TProofQueryResult::SetRunning>: nwrks: 4
Info in <TSelector::Begin>: ------------------------------------------------------------
Looking up for exact location of files: OK (1 files)                 
Looking up for exact location of files: OK (1 files)                 
Info in <TPacketizer::TPacketizer>: Initial number of workers: 4
Validating files: OK (1 files)                 
[TProof::Progress] Total 129983 events	|====================| 100.00 % [14442556.0 evts/s, 20.6 MB/s, time left: 0.0 s]
 Query processing time: 0.0 s
Lite-0: all output objects have been merged

Thanks
Sima

etejedor · December 6, 2021, 12:46pm

Hello,

I am not sure why Proof is not invoking all the methods you define.

But if I do the same in C++, the behaviour is the same:

class MySelector : public TSelector {
        public:

    void Begin(TTree *tree) { cout << "BEGIN" << endl; }

    void Init(TTree *tree) { cout << "INIT" << endl; }

    void SlaveBegin(TTree *tree) { cout << "SLAVEBEGIN" << endl; }

    // Match the signature of TSelector::Process so that this overrides it.
    Bool_t Process(Long64_t entry) { cout << "PROCESS" << entry << endl; return kTRUE; }

    void SlaveTerminate() { cout << "SLAVETERM" << endl; }

    void Terminate() { cout << "TERMINATE" << endl; }
};


void tselforum () {
    TDSet dataset("TTree", "events");
    dataset.Add("/tmp/simpleTree.root");

    auto proof = TProof::Open("");
    MySelector mysel;

    dataset.Process(&mysel);
}

Only Begin and Terminate are called. So this means it is not a Python problem.

Perhaps @ganis knows what’s missing?

I repeat my offer to help you convert this to RDataFrame if that’s an option for you.

Cheers,
Enric

Sima_Bashiri · December 7, 2021, 2:42pm

Hello etejedor,

The code I want to run with PROOF is not ordinary analysis code: it solves some equations for each tree event and also calls several external libraries, so I thought it could not be done with RDataFrame.
About the C++ code and PROOF:
I have already written my analysis code in C++, based on the examples, and run it with PROOF. I have never seen “cout” or “printf” output from the SlaveBegin, Process and SlaveTerminate functions, although, as your code above shows, it does appear for Begin and Terminate. I don’t know why. In my C++ code I always define objects such as TH1F histograms in SlaveBegin and add them to the output list; if SlaveTerminate runs, those TH1F objects should be visible in the output list in Terminate, or we can write them to a ROOT file. I don’t know what is missing in the Python version.

Thanks for your great help
Sima

Wile_E_Coyote · December 7, 2021, 4:21pm

To find the output of “cout” and “printf”, try looking in the log files:
find ${HOME}/.proof -name "*.log" -type f -print
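
The same search can be done from Python, for instance in the notebook where the query was run. This is only a sketch: `find_proof_logs` and `tail` are helper names invented here, and the default location `~/.proof` is the one used in the command above.

```python
from pathlib import Path


def find_proof_logs(proof_dir=None):
    """Return all *.log files below proof_dir (default: ~/.proof), sorted."""
    root = Path(proof_dir) if proof_dir is not None else Path.home() / ".proof"
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.log") if p.is_file())


def tail(path, n=20):
    """Return the last n lines of a log file."""
    return Path(path).read_text(errors="replace").splitlines()[-n:]


if __name__ == "__main__":
    for log in find_proof_logs():
        print(f"==> {log}")
        print("\n".join(tail(log)))
```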

etejedor · December 7, 2021, 4:46pm

Hello,

As for the example we are discussing here, the behaviour from Python and C++ is the same (i.e. only Begin and Terminate are called). I can’t find any trace of evidence that says otherwise in $HOME/.proof either, neither for C++ nor for Python. We’ll need @ganis to shed some light on this.

ganis · December 10, 2021, 10:09am

Hi,

This sounds weird. The printouts of SlaveBegin, Process and SlaveTerminate should be in the worker logs, as those methods are executed there. Can you locate the worker logs under $HOME/.proof ?

This said, with respect to whether this can be done with RDataFrame:

RDataFrame is much more flexible than PROOF, so I would be surprised if you could not solve your equations in RDF. If you (@Sima_Bashiri) could post an example of what you want to do, I am sure that Enric or the other ROOT developers can help you with the translation.

G Ganis

etejedor · December 10, 2021, 10:30am

Hi,

I ran the C++ version of the program again (see above); it spawns 8 workers (I guess as many as there are logical cores in my laptop). This is the output of one of the workers (worker-0.6.log):

11:25:14  9023 Wrk-0.6 | Info in <TProofServLite::Setup>: fWorkDir: /home/etejedor/.proof
11:25:14  9023 Wrk-0.6 | Info in <TProofServLite::SetupCommon>:  0 global package directories registered
11:25:15  9023 Wrk-0.6 | Info in <TProofServLite::HandleProcess>: selector obj for 'TSelector' found
11:25:15  9023 Wrk-0.6 | Info in <TProofServLite::HandleProcess>: calling fPlayer->Process() with selector object: TSelector
11:25:15  9023 Wrk-0.6 | Info in <TProofPlayerSlave::AssertSelector>: Processing via TSelector object
11:25:15  9023 Wrk-0.6 | Info in <TEventIter::TEventIter>: fPackets list 'ProcessedPackets_0.6' created
11:25:15  9023 Wrk-0.6 | Info in <TProofPlayerSlave::Process>: save partial results? 0  per-packet? 0
11:25:15  9023 Wrk-0.6 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 369648 virtual 200144 resident event 0
11:25:15  9023 Wrk-0.6 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 369648 virtual 200144 resident event 0
11:25:15  9023 Wrk-0.6 | Error in <TProofServLite::HandleSocketInput>: retrieving message from input socket
11:25:15  9023 Wrk-0.6 | Info in <TProofServLite::Terminate>: starting session termination operations ...
11:25:15  9023 Wrk-0.6 | Info in <TProofServLite::Terminate>: data directory '/home/etejedor/.proof/data/0.6/0.6-laptop-x1-1639131914-9023' has been removed
Terminate: termination operations ended: quitting!

There’s an error there, and I don’t see the messages I print.

Wile_E_Coyote · December 10, 2021, 11:22am

Can it be that the “MySelector” class is not propagated to slaves?

Your slave says “selector object: TSelector” and I would expect “selector object: MySelector” (or something similar).
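
A quick way to check this across all workers is to pull the selector name out of each log. The sketch below assumes the log line has the same form as in the worker-0.6.log excerpt above ("... with selector object: <class>"); `selector_classes` is a made-up helper, not part of ROOT.

```python
import re

# Matches lines such as:
#   ... calling fPlayer->Process() with selector object: TSelector
SELECTOR_LINE = re.compile(r"with selector object:\s*(\S+)")


def selector_classes(log_text):
    """Return the sorted, de-duplicated selector class names found in a log."""
    return sorted(set(SELECTOR_LINE.findall(log_text)))


sample = (
    "11:25:15  9023 Wrk-0.6 | Info in <TProofServLite::HandleProcess>: "
    "calling fPlayer->Process() with selector object: TSelector\n"
)
print(selector_classes(sample))  # prints ['TSelector']
```

If every worker reports the base class rather than the derived one, that would be consistent with the overridden methods never running on the workers.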

Sima_Bashiri · December 10, 2021, 10:49pm

Hello,
I have put the intended code here; it is simplified so that it can be converted to RDF more quickly. The “nuSolutions.py” file is imported in the main code. The code is written in Python 2.

import nuSolutions as nu
import ROOT as r
import numpy as np
import math
f = r.TFile.Open("file.root")
tree = r.gDirectory.Get('')
t = f.Get("tree")
entries = t.GetEntriesFast()
for jentry in xrange(50):
    t.GetEntry(jentry)
    bList = []
    lList = []
    metxList = []
    metyList = []
    pt_b = t.pt_b_ee
    eta_b = t.eta_b_ee
    phi_b = t.phi_b_ee
    e_b = t.e_b_ee
    pt_lep = t.pt_lep_ee
    eta_lep = t.eta_lep_ee
    phi_lep = t.phi_lep_ee
    e_lep = t.e_lep_ee
    ch_lep = t.ch_lep_ee
    pt_met = t.pt_met_ee
    phi_met = t.phi_met_ee

    for pt,eta,phi,e,charge in map(None, pt_lep, eta_lep, phi_lep, e_lep, ch_lep):
        l = r.TLorentzVector()
        l.SetPtEtaPhiE(pt,eta,phi,e)
        ll= l,charge
        lList.append(ll)
    for pt,eta,phi,e in map(None, pt_b, eta_b, phi_b, e_b):
        b = r.TLorentzVector()
        b.SetPtEtaPhiE(pt,eta,phi,e)
        bList.append(b)
    for pt, phi in map(None, pt_met, phi_met):
        mett = r.TLorentzVector()
        mett.SetPtEtaPhiE(pt,0,phi,0)
        px = mett.Px()
        py = mett.Py()
        metxList.append(px)
        metyList.append(py)
    if(len(bList) >= 2):
        metx = metxList[0]
        mety = metyList[0]
        try:
            solver = nu.doubleNeutrinoSolutions((bList[0],bList[1]),(lList[0][0],lList[1][0]),(metx,mety),(nu.mW)**2,(nu.mT)**2)
            Neu = solver.nunu_s
            print (Neu)
        except :
            continue

nuSolutions.py (8.4 KB)
file.root (1.4 MB)

etejedor · December 13, 2021, 4:07pm

By looking at your code, one fundamental change is the removal of the event loop: you will no longer need to loop over the tree entries and get the branches from the iterated tree. Instead, the event loop will run in RDataFrame code, in C++, while your code only applies per-event operations to the tree dataset.

You can, for example, create new branches for lList, bList, metxList and metyList with Define. The type of those branches can be a vector (RVec). You can use later bList in a Filter to decide whether you do the doubleNeutrinoSolutions computation for a particular event depending on the length of bList.
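
As a rough sketch of that structure (untested against your file): the tree and file names come from your script, the `*_ee` branches are assumed to be `std::vector<float>`, and `four_vector_expr`, `build_dataframe` and the column names `b0`, `b1`, `l0`, `l1`, `metx`, `mety` are invented for illustration.

```python
# Per-event C++ expressions, JIT-compiled by RDataFrame.
# Assumption: the *_ee branches are std::vector<float>.
ENOUGH_OBJECTS = (
    "pt_b_ee.size() >= 2 && pt_lep_ee.size() >= 2 && pt_met_ee.size() >= 1"
)
MET_X = "pt_met_ee[0] * std::cos(phi_met_ee[0])"
MET_Y = "pt_met_ee[0] * std::sin(phi_met_ee[0])"


def four_vector_expr(kind, index):
    """C++ snippet building a TLorentzVector from the index-th `kind` object."""
    args = ", ".join(f"{q}_{kind}_ee[{index}]" for q in ("pt", "eta", "phi", "e"))
    return f"TLorentzVector v; v.SetPtEtaPhiE({args}); return v;"


def build_dataframe(tree_name="tree", file_name="file.root"):
    """Return a dataframe node with the inputs of the neutrino solver defined."""
    import ROOT  # imported lazily; needs a ROOT installation with PyROOT

    df = ROOT.RDataFrame(tree_name, file_name)
    return (
        df.Filter(ENOUGH_OBJECTS, "at least 2 b-jets, 2 leptons and MET")
        .Define("b0", four_vector_expr("b", 0))
        .Define("b1", four_vector_expr("b", 1))
        .Define("l0", four_vector_expr("lep", 0))
        .Define("l1", four_vector_expr("lep", 1))
        .Define("metx", MET_X)
        .Define("mety", MET_Y)
    )
```

The Filter takes over the `len(bList) >= 2` check; it also asks for two leptons and a MET entry, because the loop indexes `lList[1]` and `metxList[0]` for every event that reaches the solver.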

Perhaps the trickiest part here is invoking the code that you have in nuSolutions.py, which is in Python. RDataFrame allows you to compile with Numba a given Python function (e.g. your doubleNeutrinoSolutions), see example:

https://root.cern.ch/doc/master/pyroot004__NumbaDeclare_8py.html

with some limitations. You would need to try whether that works fine, otherwise the function would need to be implemented in C++.
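
For reference, the pattern in that tutorial looks roughly like this. `met_x` is a toy function chosen here only to show the mechanics (it is not part of nuSolutions.py), `register_with_root` is an invented wrapper, and the branch names in the final comment are again assumed to hold floats.

```python
import math


def met_x(pt, phi):
    """x component of the missing transverse momentum (plain Python)."""
    return pt * math.cos(phi)


def register_with_root():
    """Compile met_x with Numba and expose it to C++ as Numba::met_x."""
    import ROOT  # needs a ROOT build with PyROOT, plus the numba package

    # Argument types first, then the return type, as in the tutorial.
    ROOT.Numba.Declare(["float", "float"], "float")(met_x)


# After register_with_root(), the function can be used in RDataFrame strings:
#   df.Define("metx", "Numba::met_x(pt_met_ee[0], phi_met_ee[0])")
```

A function with simple numeric arguments like this one is the easy case; whether a function that takes TLorentzVector objects, such as doubleNeutrinoSolutions, can be declared this way is exactly what would need to be tried.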
