Easiest way to go parallel? - TThread & PROOF

nbubis · August 24, 2009, 11:10am

Hi all,

I’m trying to make my analysis code run on a multicore machine (specifically a new 2xquad at my Uni), and I’m trying to understand what the easiest way of doing this is. The time consuming part of my code mostly involves reading large trees (not chained) into histograms from tens of files (one tree per file). I’ve read the tutorials and example code and am still a bit confused.

Is there a big advantage to using PROOF over opening multiple TThreads? Which is easier to adapt existing code to?
Is there any way for functions passed to TThread to return values (aside from using globals)? Is there a way to pass multiple arguments (without using dedicated structs)?
Can I use PROOF for a single TTree? That is, do I gain any speed if each chain only contains one tree?

Any help would be greatly appreciated; I’m a bit of a newbie at parallel programming.
Nati.

ardashev · August 24, 2009, 7:03pm

The easiest way, without learning anything about parallel coding, is to process each tree file in a separate process (no threads) and write the resulting histograms into separate files. Then, at the end, sum these histograms with ROOT’s “hadd” utility. Of course, the histograms have to be produced identically, by instances of the same program.

nbubis · August 24, 2009, 9:05pm

ardashev,

Thanks for the reply. That’s more or less what I’m doing now, but since I have to run this many times over (for different parameter sets), it’s not very efficient.

I’ve tried using TThreads (with the code below) but keep getting seg faults. Any ideas as to why?

[code]
void *process(void *ptr)
{
  …
  // open the file and fetch the tree named "T"
  TFile *f = new TFile(filename);
  TTree *tree = (TTree*)f->Get("T");
  TH1F *hist = new TH1F("hist","blah blah",200,0.0,4.0);
  int N = tree->GetEntries();
  tree->SetBranchAddress("R.gold.p",&p);

  // loop over all entries and fill the histogram
  for (int i=0; i < N; i++) {
    tree->GetEvent(i);
    hist->Fill(p);
  }

  gSystem->Sleep(10);

  return 0;
}
[/code]

ardashev · August 24, 2009, 9:19pm

Well, there are so many ways you can go wrong with threads…

In your code, I assume you fill the same histogram from different threads?

Last time I looked, histograms were not thread-safe. One has to lock and unlock every time one accesses a histogram.
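
To illustrate the locking point, here is a minimal sketch in plain C++ rather than ROOT, so that it compiles on its own. `SimpleHist` and `fillShared` are made-up names standing in for a histogram and the thread launch; with TThread the lock and unlock would presumably be calls such as `TThread::Lock()` and `TThread::UnLock()` around each fill.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a histogram: equal-width bins over [lo, hi).
struct SimpleHist {
    double lo, hi;
    std::vector<long> bins;
    std::mutex guard;  // protects bins against concurrent updates

    SimpleHist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0) {}

    // Lock, update one bin, unlock (the lock_guard unlocks when it goes out of scope).
    void Fill(double x) {
        if (x < lo || x >= hi) return;  // out-of-range values are ignored
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the edge
        std::lock_guard<std::mutex> lock(guard);
        ++bins[i];
    }
};

// Fill one shared histogram from nThreads threads, each looping over the same values.
// Returns the total number of entries that landed in the histogram.
long fillShared(SimpleHist &h, const std::vector<double> &values, int nThreads) {
    std::vector<std::thread> pool;
    for (int t = 0; t < nThreads; ++t)
        pool.emplace_back([&h, &values] { for (double v : values) h.Fill(v); });
    for (auto &th : pool) th.join();
    long total = 0;
    for (long c : h.bins) total += c;
    return total;
}
```

Every fill takes the lock, so the threads serialize on the histogram. Giving each thread its own histogram and adding them up at the end avoids that cost.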

By the way, have you timed the execution of your program with and without filling the histograms?

Most likely you are I/O-bound, and there is no need for you to worry about threads.

ganis · August 24, 2009, 10:08pm

Dear Nati,

The advantage of PROOF is that you do not have to bother about thread-safety and load-balancing of the entry loop.
The TThread technique may be more efficient, if you get it right.

Which is easier to adapt to depends on your starting point.
PROOF is designed to work in the TSelector framework, so you need to adapt your code if it does not yet make use of TSelector.
Depending on your experience with threads, writing a TSelector may in any case be easier than thread programming.

Well, passing a ‘void *’ gives you all the flexibility you need. In particular, you can reserve regions dedicated to each thread.
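
As a rough illustration of the ‘void *’ approach, the sketch below packs several arguments and a result slot into one block per thread. It uses `std::thread` instead of TThread so that it builds without ROOT; `WorkerArgs`, `process` and `runAll` are hypothetical names, and the vector of doubles merely stands in for one file's tree.

```cpp
#include <thread>
#include <vector>

// Hypothetical per-thread block: inputs and outputs travel through one pointer.
struct WorkerArgs {
    const std::vector<double> *values;  // input: stands in for one file's tree
    double cut;                         // input: an analysis parameter
    long nPassed;                       // output: written only by the owning thread
};

// Same shape as a thread function that takes a void* and returns a void*.
void *process(void *ptr) {
    WorkerArgs *args = static_cast<WorkerArgs *>(ptr);
    args->nPassed = 0;
    for (double v : *args->values)
        if (v > args->cut) ++args->nPassed;
    return ptr;  // the result lives in the block, so the return value is optional
}

// One thread per input. Each thread owns its WorkerArgs, so no locking is needed.
std::vector<long> runAll(const std::vector<std::vector<double>> &inputs, double cut) {
    std::vector<WorkerArgs> args;
    for (const auto &in : inputs) args.push_back(WorkerArgs{&in, cut, 0});
    // args is complete before any thread starts, so the addresses stay valid.
    std::vector<std::thread> pool;
    for (auto &a : args) pool.emplace_back(process, static_cast<void *>(&a));
    for (auto &th : pool) th.join();
    std::vector<long> out;
    for (const auto &a : args) out.push_back(a.nPassed);
    return out;
}
```

Because every thread reads and writes only its own block, the results can be collected after the joins without globals.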

PROOF always runs on a single TTree. But I guess you meant a TTree residing on one file only. Yes, just create a TChain with one file and you are done. As for the gain: if you are CPU-bound, you will gain; if you are I/O-bound, it will depend on your I/O hardware.

Looking at your example, it shouldn’t be very difficult to write a TSelector for doing what you need. You should load your TTree once and then obtain the TSelector template for your tree with TTree::MakeSelector; then you just have to fill in the hist definition in SlaveBegin and the filling in Process. Have a look at, for example, tutorials/tree/h1analysis.h and h1analysis.C.
Once you have your TSelector, you can try PROOF straightaway and quickly get an idea of what you can gain from parallelization.
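
To give a feeling for the division of labour, here is a toy model in plain C++. It is not ROOT's TSelector or PROOF API: `MiniSelector` and `runParallel` are invented for illustration, and only the method names SlaveBegin and Process mirror the ones mentioned above. Each worker books its own histogram, processes its share of the entries, and the driver merges the outputs at the end.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical miniature of the selector pattern; not ROOT's TSelector class.
struct MiniSelector {
    std::vector<long> hist;                     // per-worker output object
    const std::vector<double> *data = nullptr;  // stands in for the tree

    // Called once per worker before its entries: book the output here.
    void SlaveBegin(const std::vector<double> &d) {
        data = &d;
        hist.assign(200, 0);  // 200 bins over [0, 4), as in the example above
    }

    // Called once per entry: read the value and fill the histogram.
    void Process(std::size_t entry) {
        double p = (*data)[entry];
        if (p < 0.0 || p >= 4.0) return;
        std::size_t bin = static_cast<std::size_t>(p / 4.0 * 200);
        if (bin >= hist.size()) bin = hist.size() - 1;  // guard against rounding
        ++hist[bin];
    }
};

// Hypothetical driver playing the role of PROOF: split entries, run workers, merge.
std::vector<long> runParallel(const std::vector<double> &data, std::size_t nWorkers) {
    std::vector<MiniSelector> sel(nWorkers);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < nWorkers; ++w)
        pool.emplace_back([&, w] {
            sel[w].SlaveBegin(data);
            // Worker w takes entries w, w + nWorkers, w + 2 * nWorkers, ...
            for (std::size_t i = w; i < data.size(); i += nWorkers) sel[w].Process(i);
        });
    for (auto &t : pool) t.join();
    std::vector<long> merged(200, 0);
    for (const auto &s : sel)
        for (std::size_t b = 0; b < merged.size(); ++b) merged[b] += s.hist[b];
    return merged;
}
```

The user code never touches a lock: each worker fills a private histogram, and the merging is left to the driver.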

G. Ganis