PROOF-Lite not working

Dear experts,

I’m trying to use PROOF-Lite with TSelector, but it doesn’t work.
I have the following class derived from TSelector:

class Analysis : public TSelector
{
   public:
      Analysis(/*argument */);
      ~Analysis();

      /// === TSelector methods
      virtual Int_t   Version() const {return 1;}
      virtual void    Begin(TTree *);
      virtual void    SlaveBegin(TTree *tree);
      virtual void    Init(TTree *tree);
      virtual Bool_t  Notify();
      virtual Bool_t  Process(Long64_t entry);
      virtual void    SetOption(const char *option) { fOption = option; }
      virtual void    SetObject(TObject *obj) { fObject = obj; }
      virtual void    SetInputList(TList *input) {fInput = input;}
      virtual TList  *GetOutputList() const { return fOutput; }
      virtual void    SlaveTerminate();
      virtual void    Terminate();

      TChain *GetChain() { return chain; }

   private:
      TChain *chain; // the loaded chain depends on the arguments passed to the constructor

      TH1D *hist;
};

void Analysis::SlaveBegin(TTree *tree)
{
   hist = new TH1D("hist", "hist", 100, 0, 1);
   fOutput->Add(hist);
}

void Analysis::Init(TTree *tree)
{
   chain = dynamic_cast<TChain *>(tree);
}

Bool_t Analysis::Process(Long64_t entry)
{
   chain->GetTree()->GetEntry(entry);
   /* other code: cuts, etc */
   hist->Fill(chain->GetLeaf("var")->GetValue());
   return true;
}

void MonthlyFluxClass::Terminate()
{
   hist = dynamic_cast<TH1D *>(fOutput->FindObject("hist"));

   TFile output("test.root", "RECREATE");
   hist->Write();
   output.Close();
}

If I process the chain without PROOF, everything works (i.e. hist is filled and saved in test.root):

Analysis *analysis = new Analysis(/*arguments*/);
TVirtualTreePlayer *tp = analysis->GetChain()->GetPlayer();
tp->Process((TSelector *)analysis, "", analysis->GetChain()->GetEntries(), 0);

If I use PROOF-Lite, the chain is processed in 1 second, which is impossible (it takes ~20 min with one core, so with 6 cores it should take at least ~3 min), and hist is always NULL in Terminate:

TProofLite *proof = new TProofLite("workers=6");
Analysis *analysis = new Analysis(/*arguments*/);
proof->Process((TSelector *)analysis, analysis->GetChain()->GetEntries());

The PROOF query progress dialog says that it processed the correct number of events in the chain, and the logs also show that the events are processed (there are a lot of lines like “13:48:42 12334 Wrk-0.1 | SvcMsg in TProofPlayerSlave::CheckMemUsage: Memory 104024 virtual 26496 resident event 3975382”).
It looks like, for some reason, when I use PROOF, Process exits immediately and hist is not added to the TSelectorList, but I cannot debug it because of this problem: [Logging output of PROOF-Lite]

Am I using PROOF in the wrong way?

Thanks.

OK, I managed to enable debug output in PROOF:

gProofDebugMask = TProofDebug::kAll;
gProofDebugLevel = 5;

and I changed the way of calling Process

TFileCollection *fc = new TFileCollection("Selected", "");
TObjArray *files = chain->GetListOfFiles();
for (UShort_t ifile = 0; ifile < files->GetEntries(); ++ifile)
{
   fc->Add(files->At(ifile)->GetTitle());
}
proof->Process(fc, (TSelector *)mc_data);

Before, the chain was not passed to PROOF, so of course it ran very fast: it was not looping over the chain at all!
Now the files are validated (Info in TPacketizer::ValidateFiles: sent to worker-0.0 etc…) and the entries are correctly calculated.
I also had to move the creation of the histogram from SlaveBegin to Begin, because SlaveBegin is never called.
Now the histogram is added to the TSelectorList and is successfully retrieved in Terminate, but it always has zero entries.
Looking at the debug output, Notify and Process are never called.
In the output logs I have a lot of lines like “Info in TEventIterTree::GetTrees: the tree cache is in learning phase”.

Any idea what is going wrong?

Hi,

Four things:

  1. You must have a TChain defined somewhere, and you should access it outside the selector; why do you think you need to access it via the selector?

  2. Use TProof::Open to get PROOF; this allows proper handling of objects

  3. PROOF is not a TChain nor a TTree. Process needs to be told which dataset to process. If you start from a TChain, you need to call TChain::SetProof and then TChain::Process (see below).

  4. The PROOF workers are separate processes. The process-by-selector functionality, where you pass the selector object to Process, requires that the workers know about the selector class, so that they can stream it in correctly when Process is issued.

Could you then try the following:

TChain *chain = ...    // This is your chain
TProof *proof = TProof::Open("workers=6");

proof->Load("Analysis.C+"); // Or whatever the implementation file of Analysis is called; full path required; loads also locally

Analysis *analysis = new Analysis(/*arguments*/);

chain->SetProof();
chain->Process((TSelector *)analysis);  // You need to set the entries only if a subset is needed ...

Note that in the implementation you posted the Terminate method is in the wrong scope; but this is probably just a posting mistake.

Unrelated to PROOF:
You are not supposed to use TVirtualTreePlayer directly when running locally. Why do you need to do that?

G Ganis

More output from the debug, maybe it could help.
The files are validated

Validating files: OK (6019 files)                 
Info in <TPacketizer::ValidateFiles>:  70154202 events validated

then there is another loop on the files, the messages are like:

Info in <TPacketizer::TPacketizer>: processing range: first 0, num -1
Info in <TPacketizer::TPacketizer>:  --> 'file:///media/data/AMS/data/MonthlyProtonFluxMC/pr_0.5_10/MC_pr_0.5_10_div_0.root'
Info in <TPacketizer::TPacketizer>:  --> first 0, num 13866 (cur 0)
Info in <TPacketizer::TPacketizer>:  --> adjust start 0 and end -1
Info in <TPacketizer::TPacketizer>:  --> next cur 13866
TDSetElement file="file:///media/data/AMS/data/MonthlyProtonFluxMC/pr_0.5_10/MC_pr_0.5_10_div_0.root" dir="" obj="Selected" first=0 num=13866 msd=""

After this:

Info in <TPacketizer::TPacketizer>: processing 70154202 entries in 6019 files on 1 hosts
Info in <TPacketizer::TPacketizer>: Base Packetsize = 584618
Info in <TPacketizer::TPacketizer>: Return
Info in <TProofPlayerLite::SetupFeedback>: "FeedbackList" found: 2 objects
Info in <TProofPlayerLite::Process>: Calling Broadcast
Info in <TProofPlayerLite::Process>: Synchronous processing: calling Collect
Info in <TProofLite::Collect>: >>>>>> Entering collect responses #9471
Info in <TProofLite::Collect>: #9471: active: 6
Info in <TProofLite::Collect>: Will invoke Select() #9471
Info in <TPacketizer::HandleTimer>: fProgress: 0x3c09cd0, isDone: 0
Info in <TProofPlayerLite::Progress>: 70154202 0 0 25.072001 0.000000 -1.000000 -1.000000
Info in <TProofLite::Progress>: 70154202 0 0 25.072001 0.000000 -1.000000 -1.000000
Info in <TProofLite::HandleInputMessage>: got type 1035 from '0.2'
Info in <TProofLite::HandleInputMessage>: kPROOF_STARTPROCESS: enter
Info in <TProofLite::Collect>: Will invoke Select() #9471
Info in <TProofLite::HandleInputMessage>: got type 1011 from '0.2'
Info in <TProofLite::HandleInputMessage>: 0.2: kPROOF_GETPACKET
Info in <TPacketizer::SetInitTime>: fInitTime set to 25.098000 s
Info in <TPacketizer::GetNextPacket>: worker-0.2 (r2d2)

and then there is again another loop on the files, with messages like:

TPacketizer::NextUnAllocNode()
Collection name='TList', class='TList', size=1
 OBJ: TObject	r2d2	MySlaveCount 0	SlaveCount 0
Info in <TPacketizer::GetNextPacket>: 0.2: file:///media/data/AMS/data/MonthlyProtonFluxMC/pr_0.5_10/MC_pr_0.5_10_div_0.root 0 13866
Info in <TProofPlayerLite::GetNextPacket>: 0.2 (r2d2): 'file:///media/data/AMS/data/MonthlyProtonFluxMC/pr_0.5_10/MC_pr_0.5_10_div_0.root' '' 'Selected' 0 13866
Info in <TProofLite::Collect>: Will invoke Select() #9471
Info in <TProofLite::HandleInputMessage>: got type 1035 from '0.0'
Info in <TProofLite::HandleInputMessage>: kPROOF_STARTPROCESS: enter
Info in <TProofLite::Collect>: Will invoke Select() #9471
Info in <TProofLite::HandleInputMessage>: got type 1011 from '0.0'
Info in <TProofLite::HandleInputMessage>: 0.0: kPROOF_GETPACKET
Info in <TPacketizer::GetNextPacket>: worker-0.0 (r2d2)

After this, all the workers are done and start merging the output list.
If it is needed, I can attach the full debug output.

Hi Ganis,

thanks for the suggestions.

I compile my class with make and then load the shared library in ROOT from rootlogon.C, so I need to pass the TSelector instance to TChain::Process. Looking at the documentation, it seemed that the only way to do that is via TVirtualTreePlayer; in any case, the chain is processed correctly this way.

For the rest, I’ll try to do as you suggested.

Hi,

There is one initial loop over the files for validation, which builds up the internal structures describing the files to be processed, and then there is the packetizing machinery, which just distributes work to the workers based on those internal structures. Your verbose output shows just that.

The validation step can be skipped or optimized using the concept of data set (root.cern.ch/drupal/content/working-data-sets).
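
To make the packetizing step a bit more concrete, here is a toy sketch in plain C++ (no ROOT needed). It assumes that the base packet size is the total number of entries divided by 20 times the number of workers. That rule is only inferred from the numbers in the log posted above (70154202 entries, 6 active workers, "Base Packetsize = 584618"); it is not taken from the TPacketizer documentation. The names Packet, BasePacketSize and SplitFile are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: one unit of work handed to a worker.
struct Packet {
   std::int64_t first; // first entry of the packet within its file
   std::int64_t num;   // number of entries in the packet
};

// Assumed rule (inferred from the log, not from ROOT documentation):
// total entries / (20 * number of workers), but never less than 1.
std::int64_t BasePacketSize(std::int64_t totalEntries, int nWorkers)
{
   assert(nWorkers > 0);
   return std::max<std::int64_t>(1, totalEntries / (20LL * nWorkers));
}

// Cut one file with 'entries' entries into packets of at most 'packetSize'.
std::vector<Packet> SplitFile(std::int64_t entries, std::int64_t packetSize)
{
   assert(packetSize > 0);
   std::vector<Packet> packets;
   for (std::int64_t first = 0; first < entries; first += packetSize) {
      packets.push_back({first, std::min(packetSize, entries - first)});
   }
   return packets;
}
```

With the numbers from the log, a file of 13866 entries is smaller than the base packet size, so it would go out as a single packet, which is consistent with the "0 13866" seen in the GetNextPacket lines.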

Regarding your last post:

TChain::Process(TSelector *, …) exists and does what you need.
The TTreePlayer family is internal and will likely be removed from the public interfaces in future versions.

G Ganis

Ah, thank you, I missed it!

  1. I tried to use proof->Load(), but as I expected it could not compile the TSelector class, because the class depends on other libraries that are already compiled.
    So I tried to load the libraries directly with
proof->Load("path/lib1.so");
proof->Load("otherpath/lib2.so"); // linked with make against lib1
proof->Load("libanalysis.so"); // TSelector class, linked with make against lib1 and lib2

and I had to run root as

LD_LIBRARY_PATH="path:otherpath:$(pwd):$LD_LIBRARY_PATH" root -l

This way, though, PROOF tries to reload all the libraries already loaded when ROOT starts (???)
Then, after

chain->SetProof();

I get these errors

Warning in <TClass::TClass>: no dictionary for class TProofChain is available
Error in <TPluginHandler::SetupCallEnv>: method TProofChain not found in class TProofChain
Warning in <TClass::TClass>: no dictionary for class TChain is available
Error in <TChain::SetProof>: creation of TProofChain failed

I get other “no dictionary for class” errors (TSelector, TTreePlayer, etc) after

chain->Process(analysis);

  2. I tried to use the datasets:
Analysis *analysis = new Analysis(/*arguments*/);

TChain *chain = analysis->GetChain();
TFileCollection *fc = new TFileCollection();
TObjArray *files = chain->GetListOfFiles();
for (UShort_t ifile = 0; ifile < files->GetEntries(); ++ifile)
{
   fc->Add(files->At(ifile)->GetTitle());
}
fc->Update();

TProof::Open("workers=2");
gProof->ShowDataSets();

gProofDebugMask  = TProofDebug::kAll;
gProofDebugLevel = 1000;

gProof->RegisterDataSet("mcrange0", fc, "VO");
gProof->ShowDataSets();

gProof->Process("mcrange0", (TSelector *)analysis);

I added Info calls in Begin, SlaveBegin, SlaveTerminate, Notify, Init, Process and Terminate, but the only output I get is from Begin and Terminate.
Attached you can see the output from the debug, master and worker logs.
worker.0.1.txt (2.8 KB)
worker.0.0.txt (2.95 KB)
master.txt (4.48 KB)
proof-debug.txt (96.8 KB)

Hi,
The PROOF-Lite workers inherit the PATH and LD_LIBRARY_PATH of the ROOT session.
So, what you should do is

$ source /path/to/root/bin/thisroot.sh

which defines ROOTSYS and adds the relevant paths to LD_LIBRARY_PATH and PATH.
And then

$ export LD_LIBRARY_PATH=path:otherpath:$LD_LIBRARY_PATH

After this, if this works

root[] .L libanalysis.so

then this should also work:

root[] proof->Load("libanalysis.so")

otherwise there is something fishy.

TProof::Load by default also loads things locally. If this creates issues with reloading or unloading, you should either skip loading locally before PROOF or call TProof::Load with the second argument kTRUE:

root[] proof->Load("libanalysis.so", kTRUE)

[quote=“ccorti”]Warning in TClass::TClass: no dictionary for class TProofChain is available
Error in TPluginHandler::SetupCallEnv: method TProofChain not found in class TProofChain
Warning in TClass::TClass: no dictionary for class TChain is available
Error in TChain::SetProof: creation of TProofChain failed[/quote]
I have never seen this before. I would say that these messages indicate that you have an environment setting issue or that your ROOT installation is incomplete for some reason …

The dataset registration seems to work.
Could you please just rerun the following:

TProof::Open("workers=2");
gProof->ShowDataSets();
Analysis *analysis = new Analysis(/*arguments*/);
gProof->Process("mcrange0", (TSelector *)analysis);

and post the same master and worker logs?
(no additional verbosity, for the moment).

G Ganis

Master:

14:19:53 14719 proof | SvcMsg in <TProofPlayerLite::NotifyMemory>: Memory 265792 virtual 63428 resident after merging object MissingFiles
14:19:53 14719 proof | SvcMsg in <TProofPlayerLite::NotifyMemory>: Memory 266236 virtual 63768 resident after merging object PROOF_Status
14:19:53 14719 proof | SvcMsg in <TProofPlayerLite::NotifyMemory>: Memory 266236 virtual 63864 resident after merging object PROOF_TOutputListSelectorDataMap_object
14:19:53 14719 proof | SvcMsg in <TProofPlayerLite::NotifyMemory>: Memory 266236 virtual 63884 resident after merging object PROOF_SelectorStatus
14:19:53 14719 proof | SvcMsg in <TProofPlayerLite::NotifyMemory>: Memory 266236 virtual 63888 resident after merging object PROOF_Status
14:19:53 14719 proof | SvcMsg in <TProofPlayerLite::NotifyMemory>: Memory 266236 virtual 63904 resident after merging object PROOF_TOutputListSelectorDataMap_object
14:19:53 14719 proof | SvcMsg in <TProofPlayerLite::NotifyMemory>: Memory 266236 virtual 63904 resident after merging object PROOF_SelectorStatus

Workers:


// --------- Start of element log -----------------

// Ordinal: 0.0 (role: worker)

// Path: /home/genesys87/.proof/Dropbox-phd-uhm-lxplus-code-MonthlyProtonFlux-unfolding/session-r2d2-1427984393-14719/worker-0.0-r2d2-1427984393-14767.log 
// # of retrieved lines: 14 


// ------------------------------------------------

14:19:53 14767 Wrk-0.0 | Info in <TProofServLite::Setup>: fWorkDir: /home/genesys87/.proof
Note: File "iostream" already loaded
14:19:53 14767 Wrk-0.0 | Info in <TProofServLite::HandleProcess>: selector obj for 'TSelector' found
14:19:53 14767 Wrk-0.0 | Info in <TProofServLite::HandleProcess>: calling fPlayer->Process() with selector object: TSelector
14:19:53 14767 Wrk-0.0 | Info in <TProofPlayerSlave::AssertSelector>: Processing via TSelector object
14:19:53 14767 Wrk-0.0 | Info in <TEventIter::TEventIter>: fPackets list 'ProcessedPackets_0.0' created
14:19:53 14767 Wrk-0.0 | Info in <TProofPlayerSlave::Process>: save partial results? 0  per-packet? 0
14:19:53 14767 Wrk-0.0 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 116240 virtual 31324 resident event 0
14:19:53 14767 Wrk-0.0 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
14:19:53 14767 Wrk-0.0 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
14:19:53 14767 Wrk-0.0 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
14:19:53 14767 Wrk-0.0 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
14:19:53 14767 Wrk-0.0 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 148132 virtual 33664 resident event 53360

// --------- End of element log -------------------


Retrieving logs: 1 ok, 0 not ok (100% processed)


// --------- Start of element log -----------------

// Ordinal: 0.1 (role: worker)

// Path: /home/genesys87/.proof/Dropbox-phd-uhm-lxplus-code-MonthlyProtonFlux-unfolding/session-r2d2-1427984393-14719/worker-0.1-r2d2-1427984393-14769.log 
// # of retrieved lines: 13 


// ------------------------------------------------

14:19:53 14769 Wrk-0.1 | Info in <TProofServLite::Setup>: fWorkDir: /home/genesys87/.proof
Note: File "iostream" already loaded
14:19:53 14769 Wrk-0.1 | Info in <TProofServLite::HandleProcess>: selector obj for 'TSelector' found
14:19:53 14769 Wrk-0.1 | Info in <TProofServLite::HandleProcess>: calling fPlayer->Process() with selector object: TSelector
14:19:53 14769 Wrk-0.1 | Info in <TProofPlayerSlave::AssertSelector>: Processing via TSelector object
14:19:53 14769 Wrk-0.1 | Info in <TEventIter::TEventIter>: fPackets list 'ProcessedPackets_0.1' created
14:19:53 14769 Wrk-0.1 | Info in <TProofPlayerSlave::Process>: save partial results? 0  per-packet? 0
14:19:53 14769 Wrk-0.1 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 116236 virtual 31312 resident event 0
14:19:53 14769 Wrk-0.1 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
14:19:53 14769 Wrk-0.1 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
14:19:53 14769 Wrk-0.1 | Info in <TEventIterTree::GetTrees>: the tree cache is in learning phase
14:19:53 14769 Wrk-0.1 | SvcMsg in <TProofPlayerSlave::CheckMemUsage>: Memory 148132 virtual 33656 resident event 30304

// --------- End of element log -------------------

I’ll try to load the libraries with the changes you suggested.
Thanks.

Adding kTRUE to proof->Load() solves the problem of reloading libraries and also the “no dictionary” errors, but the events in the chain are still not processed…

TProof::Open("workers=2");
gProof->Load("path/lib1.so", true);
gProof->Load("otherpath/lib2.so", true);
gProof->Load("libanalysis.so", true);
gProof->ShowDataSets();

Analysis *analysis = new Analysis(/*arguments*/);
TChain *chain = analysis->GetChain();

// don't use proof
chain->Process(analysis); // this works

// use proof and chain
chain->SetProof();
chain->Process(analysis); // analysis->SlaveBegin, analysis->Notify and analysis->Process never called

// use proof and dataset
gProof->Process("mcrange0", analysis); // analysis->SlaveBegin, analysis->Notify and analysis->Process never called

What does this produce on the screen?

root[] gProof->Process("mcrange0", analysis);

G

Question: do you have a ClassDef(Analysis, 2) in your selector class definition?

G

No, I didn’t!

With that, SlaveBegin is finally called… and then it crashes, but at least it’s progress :)

OK, that explains it: it was just running an empty TSelector.
Now you have to check in the logs where the crash is.

G

I see that the selector crashes if I try to do chain->GetCurrentTree()->GetLeaf("var"), while with SetBranchAddress everything works fine.
Another problem is that my Analysis class has an internal state, set in the constructor, which is necessary for the event processing. This state is not shared between the workers, since each of them creates a new instance with the default constructor; I looked in the tutorials and it seems to me that the only way to pass parameters to the workers is via the input list (gProof->AddInput()): is this correct?

Thanks for your help!

Hi,

Good that it works better now.

Regarding the GetLeaf crash: that is not surprising, since there is no concept of chain in the workers.

Regarding the internal state: this is strange. You are sending the object, and it should be streamed in entirely by the worker (the default constructor is required for that, in the same way as when reading an object from a file).
Are you saving the state in data members?

The input list is shared between the client, the master and the workers and, yes, it is the suggested way to send parameters, and the only one if you process-by-file. But if you process-by-object, you should be able to configure your selector locally before sending it off for processing; see above.

G

I solved the problem: I was using ClassDef(Analysis,0), so the streamer was not being created.
Now everything works :D

Thank you very much for your help Ganis!
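
For anyone finding this thread later, the effect of the class version can be illustrated with a toy model in plain C++. This is not ROOT code and does not use ROOT's streamer; ToySelector, Serialize and Deserialize are hypothetical names. It only mimics the mechanism discussed in this thread: the worker builds the object with the default constructor and then fills the data members from what was streamed. With class version 0 nothing is written, so the worker ends up with an empty selector; with a version greater than 0, the state set in the constructor arrives intact.

```cpp
#include <sstream>
#include <string>

// Toy stand-in for a selector with internal state (NOT a ROOT class).
struct ToySelector {
   double cut = 0.0;   // state normally set by the user's constructor
   std::string label;  // more state; kept free of spaces for simple parsing

   ToySelector() = default; // lets the "worker" build an empty object first
   ToySelector(double c, const std::string &l) : cut(c), label(l) {}
};

// Client side: write the data members only if the class version is > 0,
// mimicking a class that was declared without I/O support (version 0).
std::string Serialize(const ToySelector &s, int classVersion)
{
   std::ostringstream out;
   if (classVersion > 0) out << s.cut << ' ' << s.label;
   return out.str();
}

// Worker side: default-construct, then fill from the buffer if it has content.
ToySelector Deserialize(const std::string &buffer)
{
   ToySelector s;
   if (!buffer.empty()) {
      std::istringstream in(buffer);
      in >> s.cut >> s.label;
   }
   return s;
}
```

In this toy model, Deserialize(Serialize(sel, 0)) gives back a default-constructed object, while Deserialize(Serialize(sel, 1)) reproduces the cut and label of sel. The real fix in my case was only the change of the ClassDef version described above.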