Parallel computation with multi cores

boaca926 · April 27, 2016, 9:54pm

Hi

I currently run my analysis over a set of ROOT files sequentially. My laptop has multiple cores, so I would like to distribute the work across them instead of looping over the list of files one by one. I assume there is a way to do this and would appreciate some help.
Attached is a simplified version of my analysis, stored in a folder I call “paralleltry”, which contains the following files:

MyClass.C (file which includes main function to execute)
MyClass.h (the header file)
Analys.C (read input files)
mckslpath (text file which includes the paths to all the ROOT files)
8 ROOT files (data)

This is what I run in the ROOT session:

.L MyClass.C+
.L Analys.C+
Analys("mckslpath","mcksl")
The output of these commands is a ROOT file, mcksl.root, for use in further analysis. The speed of light is printed on screen after all files have been analyzed, which just indicates that the program ran properly.
Could someone help me modify these files? I assume it is mainly Analys.C that needs to change to make the parallel computation work.
Note: I am using ROOT version 5.34/21.
Thanks in advance!
Analys.C (1.97 KB)

boaca926 · April 27, 2016, 9:58pm

Just found out the other files are too large to load…
MyClass.C (576 Bytes)
MyClass.h (22.9 KB)

pcanal · April 28, 2016, 4:19pm

Hi,

You ought to look into ProofLite (root.cern.ch/proof-multicore-de … proof-lite). You will need to generate your code skeleton using MakeSelector instead of MakeClass.

Cheers,
Philippe.

boaca926 · April 29, 2016, 5:08pm

Thanks

I have now created MySelector and modified Analys.C as follows:

[code]#include <TChain.h>
#include <TFile.h>
#include <TTree.h>
#include <TString.h>
#include <TTimeStamp.h>
#include <TEnv.h>
#include <TCanvas.h>
#include <TProofLite.h>

#include <fstream>
#include <iostream>
#include <string>

using namespace std;

#include "MySelector.h"

int Analys(TString list_of_files, TString naming) {

   // char *list_of_files = new char[strlen(argv[2]) + 1];
   // strcpy(list_of_files, argv[2]);

   TChain *tree = new TChain("ETAPPG/h1");
   // read the list of ROOT files from the file list;
   // entries can be commented out with '!'
   string line;
   ifstream filelist(list_of_files);
   // ifstream filelist("filelist.txt");
   if (filelist.is_open()) {
      // getline() fails at end of file, so no separate eof() test is needed
      while (getline(filelist, line)) {
         // skip empty lines and lines commented out with '!'
         if (!line.empty() && line[0] != '!') {
            tree->Add(line.data());
            cout << "Adding file: " << line << " to the chain of files" << endl;
         }
      }
      filelist.close();
   } else {
      cout << "Unable to open filelist" << endl;
      return 0;
   }

   // // MC part; comment this out to run on data
   // // open histogram file here

   TFile *myfile;
   myfile = new TFile(naming + ".root", "RECREATE");

   // an empty master URL starts a local PROOF session
   TProof *proof = TProof::Open("");
   proof->Load("MySelector.C+");
   MySelector *analysis = new MySelector();
   tree->SetProof(proof);
   tree->Process((TSelector *)analysis);

   // save and close the histogram file
   myfile->Write();
   myfile->Close();

   delete analysis;

   return 0;
}[/code]
and I get output that looks like this:

[code]root [3] .L MySelector.C+
Info in <ACLiC>: unmodified script has already been compiled and loaded
*** Interpreter error recovered ***
root [4] .L Analys.C+
Info in <ACLiC>: unmodified script has already been compiled and loaded
root [5] Analys("mckslpath","mcksl");
Adding file: ./data/mcksl30300.root to the chain of files
Adding file: ./data/mcksl30301.root to the chain of files
Adding file: ./data/mcksl30302.root to the chain of files
Adding file: ./data/mcksl30303.root to the chain of files
Adding file: ./data/mcksl30304.root to the chain of files
Adding file: ./data/mcksl30313.root to the chain of files
Adding file: ./data/mcksl30314.root to the chain of files
Adding file: ./data/mcksl30315.root to the chain of files
Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12021 Wrk-0.3 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12021 Wrk-0.3 | Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12019 Wrk-0.2 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12019 Wrk-0.2 | Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12017 Wrk-0.1 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12017 Wrk-0.1 | Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12027 Wrk-0.6 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12027 Wrk-0.6 | Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12023 Wrk-0.4 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12023 Wrk-0.4 | Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12029 Wrk-0.7 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12029 Wrk-0.7 | Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12025 Wrk-0.5 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12025 Wrk-0.5 | Info in <ACLiC>: unmodified script has already been compiled and loaded
19:04:04 12015 Wrk-0.0 | Info in TProofServLite::HandleCache: loading macro MySelector.C+ …
19:04:04 12015 Wrk-0.0 | Info in <ACLiC>: unmodified script has already been compiled and loaded

Info in TProofLite::SetQueryRunning: starting query: 2
Info in TProofQueryResult::SetRunning: nwrks: 8
Looking up for exact location of files: OK (8 files)
Looking up for exact location of files: OK (8 files)
Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 8
Validating files: OK (8 files)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 0.000000
Lite-0: all output objects have been merged[/code]
It seems to have worked. However, the function Main() that I defined in MySelector.C, which is supposed to do the analysis, does not print the speed of light and the total number of entries as I wanted. Why is that, and what should I do instead?
MySelector.C (2.5 KB)

pcanal · April 29, 2016, 6:19pm

In normal operation, you call TChain::Process and pass your selector (or the name of the selector’s source file), and it calls the various functions defined in the TSelector interface at the intended times (see, for example, the documentation in the file you uploaded). So indeed it will not directly call any function you add unless you call it from one of the interface functions (like Process or Terminate).
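
To make the call order concrete, here is a small ROOT-free toy. It is only a sketch: MiniSelector and runQuery are hypothetical names invented for illustration, not ROOT classes, and the real TSelector interface has more hooks than the three shown. The point it illustrates is that the driver only ever calls the interface functions, so a helper such as Main() is never reached unless one of them calls it.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the TSelector interface; this is not ROOT code.
struct MiniSelector {
    std::vector<std::string> calls;  // records which hooks the driver invoked
    long long processed = 0;         // number of entries seen by Process()

    void Begin() { calls.push_back("Begin"); }
    bool Process(long long /*entry*/) { ++processed; return true; }
    void Terminate() { calls.push_back("Terminate"); }

    // User-added helper: the driver below knows nothing about it.
    void Main() { calls.push_back("Main"); }
};

// Hypothetical driver playing the role that TChain::Process has in ROOT:
// it owns the event loop and calls only the interface functions.
inline void runQuery(MiniSelector& sel, long long nEntries) {
    sel.Begin();
    for (long long i = 0; i < nEntries; ++i) sel.Process(i);
    sel.Terminate();
}
```

After runQuery(sel, 5), the recorded calls are Begin and Terminate, Process has run five times, and Main has not run at all.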

Cheers,
Philippe.

ganis · May 2, 2016, 2:57pm

Dear boaca926,

The event loop is handled by PROOF, so you have to move what you do inside the ‘for’ loop in Main into MySelector::Process:

[code]Bool_t MySelector::Process(Long64_t entry)
{
   // The Process() function is called for each entry in the tree (or possibly
   // keyed object in the case of PROOF) to be processed. The entry argument
   // specifies which entry in the currently loaded tree is to be processed.
   // It can be passed to either MySelector::GetEntry() or TBranch::GetEntry()
   // to read either all or the required parts of the data. When processing
   // keyed objects with PROOF, the object is already loaded and is available
   // via the fObject pointer.
   //
   // This function should contain the "body" of the analysis. It can contain
   // simple or elaborate selection criteria, run algorithms on the data
   // of the event and typically fill histograms.
   //
   // The processing can be stopped by calling Abort().
   //
   // Use fStatus to set the return value of TTree::Process().
   //
   // The return value is currently not used.

   nb = fChain->GetEntry(entry);   nbytes += nb;
   // if (Cut(entry) < 0) return;

   totalev++;
   Printf("total number of events = %d \n",totalev);

   return kTRUE;
}[/code]

and, if you want to have something printed in the end, you may move the printout of the ‘speed of light’ to MySelector::Terminate

[code]void MySelector::Terminate()
{
   // The Terminate() function is the last function to be called during
   // a query. It always runs on the client, it can be used to present
   // the results graphically or save the results to file.

   printf("speed of light = %lf\n",speedc);
}[/code]

(assuming that all the variables used, e.g. speedc or totalev, are defined somewhere).
(Of course the above code would be very verbose, since it prints the number of events for every single event, which is probably not what you want.)

The idea is that in Process, which is called by the system, you read the entry and you analyse it, creating the output that you need.
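
One caveat, stated as a general PROOF property rather than something checked against your files: each worker runs its own copy of the selector, so a plain counter such as totalev incremented in Process is a per-worker count. A grand total only exists after the per-worker results have been merged, which in ROOT goes through the selector's output list (fOutput). The ROOT-free toy below sketches that idea; WorkerCounter and countWithWorkers are hypothetical names, and the round-robin split is a simplification, not how PROOF actually distributes packets.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical worker-side state: each worker owns a private counter,
// just as each PROOF worker owns its own selector instance.
struct WorkerCounter {
    long long totalev = 0;
    void Process(long long /*entry*/) { ++totalev; }
};

// Distribute nEntries over nWorkers, count locally on each worker,
// then merge the partial counts on the "client" side.
inline long long countWithWorkers(long long nEntries, std::size_t nWorkers) {
    if (nWorkers == 0) return 0;  // nothing can be processed without workers
    std::vector<WorkerCounter> workers(nWorkers);
    for (long long e = 0; e < nEntries; ++e) {
        // Simplified round-robin assignment of entries to workers.
        workers[static_cast<std::size_t>(e) % nWorkers].Process(e);
    }
    long long merged = 0;
    for (const WorkerCounter& w : workers) merged += w.totalev;  // merge step
    return merged;
}
```

With 8 workers no single counter holds the full number of entries; only the merged sum does, which is the value you would want to print once at the end.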

G Ganis