PROOFLite without TSelector

ccorti · March 30, 2015, 9:22am

Dear experts,

I am trying to understand how to integrate PROOFLite in my analysis. The problem is that I already have a class which does most of the work and internally processes a varying number of TChains, and converting this class to a TSelector is a bit complicated.
The TChain processing is done in a function called FillEvents(); this function is called by another class on different TChains and basically I would like to parallelize the following loop using TProofLite:

for (UShort_t idataset = 0; idataset < ndatasets; ++idataset)
{
   analysis[idataset]->FillEvents();
}

Is there a way?

ganis · March 30, 2015, 9:33am

Hi,

The short answer is ‘no’, because you seem to control the event loop yourself inside FillEvents.

However, you can probably parallelize on files, with little modification. What is ‘analysis’ in your example?

G Ganis

ccorti · March 30, 2015, 9:45am

A class (not derived from any root class) in which many histograms are defined and filled, depending on which TChain is passed in the constructor.
The class is not mine, so I would like to change it as little as possible.
I could quite easily create a macro to do analysis->FillEvents() and run the macro manually in parallel, but this would complicate my workflow and I would prefer to keep everything inside a single program instead of splitting it into several.

ganis · March 30, 2015, 11:12am

Hello,

How is TChain used inside FillEvents?
I mean, are Process or Draw called, or is there a ‘manual’ for loop over entries?

Anyhow:

Without a modification of the code it is impossible to parallelize, even if you go for a threaded solution.
To get a better idea if you can do something you need to provide more information about the code and the workflow; an ad hoc solution requires looking at the code.

G Ganis

ccorti · March 30, 2015, 12:02pm

There is a manual for loop in FillEvents(), but I don’t want to parallelize this function; I would be happy with running FillEvents with each dataset on its own thread.

My program is like a fitting procedure:

1) Run FillEvents on different datasets of MC files; the processing of MC events depends on a few parameters
2) Merge the produced histograms into one single ROOT file
3) Compare these histograms with data
4) Adjust the parameters and repeat from 1) until the convergence conditions are fulfilled

I wrote a class to do all these steps automatically and the bottleneck is of course the processing of the different MC datasets: if I could process all the datasets at the same time (now they are processed sequentially, in the loop I wrote in the first post) this would save a lot of time.
I am not interested for now in parallelizing the processing of a single dataset.
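
For illustration, the "one dataset at a time, but all datasets concurrently" idea can be sketched from inside a single program with child processes instead of threads, so that each task runs in its own memory space. This is only a sketch under assumptions: `RunInChildProcesses` is a hypothetical helper, it needs a POSIX system, each task stands in for a call like analysis[idataset]->FillEvents() that writes its own output file and returns 0 on success, and whether ROOT objects and open files behave well across fork() would have to be checked.

```cpp
#include <cassert>
#include <functional>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run every task in its own child process and
// return how many children exited with status 0.
int RunInChildProcesses(const std::vector<std::function<int()>>& tasks) {
    std::vector<pid_t> children;
    for (const auto& task : tasks) {
        assert(task);              // each task must be callable
        pid_t pid = fork();
        if (pid == 0) {
            // Child: run the task and leave immediately. _exit() skips
            // destructors and stdio flushing, so the task itself must
            // write and close its output before returning.
            _exit(task());
        }
        if (pid > 0) {
            children.push_back(pid);   // parent: remember the child
        }
        // pid < 0: fork failed, so this task is simply not counted
    }

    int succeeded = 0;
    for (pid_t pid : children) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ++succeeded;
        }
    }
    return succeeded;
}
```

The parent only learns the exit status of each child, so the histograms would have to travel through the per-dataset output files, which fits the existing merge step.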

ccorti · March 31, 2015, 10:06am

I guess I’ll have to implement the TSelector in my code.
I have a few doubts though.

If my class derives from TSelector and I do

MyClass *selector = new MyClass(/*arguments*/);  // 'class' is a reserved word in C++
proof->Process(selector);

is the instance I created shared among the different workers, or does each worker initialize its own instance of MyClass?

Each event in my TChain contains detector information that varies each second, and I need to fill some histograms with this information once per second. If I control the event loop I can do

// before the loop
int current_time = 0;
int time;
chain->SetBranchAddress("time", &time);

// inside the event loop
if (time != current_time)
{
   current_time = time;
   // fill some histograms with detector information
}

It looks to me that this approach cannot work in PROOF, because the condition time != current_time could be fulfilled for the same second in different workers. Is this right?
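
If events from the same second can indeed end up on different workers, one possible way around the double counting is to key the per-second information by the time value and remove duplicates when the partial results are combined. The sketch below is plain C++ and not PROOF code. `Event`, `CollectPerSecond` and `MergePerSecond` are hypothetical names, and it assumes the detector information is identical for all events within the same second.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical event: a time stamp in seconds plus one detector quantity.
struct Event {
    int time;
    double detectorValue;
};

// One entry per distinct second.
using PerSecond = std::map<int, double>;

// What a single worker could do: record each second it sees only once.
PerSecond CollectPerSecond(const std::vector<Event>& events) {
    PerSecond seen;
    for (const Event& ev : events) {
        seen.emplace(ev.time, ev.detectorValue);  // keeps the first entry per key
    }
    return seen;
}

// Combine the partial results: a second seen by several workers is kept once.
PerSecond MergePerSecond(const std::vector<PerSecond>& partials) {
    PerSecond merged;
    std::size_t total = 0;
    for (const PerSecond& p : partials) {
        total += p.size();
        merged.insert(p.begin(), p.end());  // existing keys are not overwritten
    }
    assert(merged.size() <= total);  // merging can only remove duplicates
    return merged;
}
```

The once-per-second histograms would then be filled from the merged map, after all workers have finished, rather than inside the event loop.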