BookMVA in parallel jobs

Dear Rooters,

In order to apply the TMVA classification to a dataset, I book the MVA method once (BookMVA) and then loop over my input files (more than 1000) to evaluate the MVA classifiers for each event, using TMVAClassificationApplication.
To speed this up, I tried launching one job per file, or per 10 files, say (I can run at most 200 jobs in parallel), which means duplicating the weight files (fine) and booking the MVA method independently in each “parallel” job.

However, running 5 or 10 jobs in parallel makes the TMVAClassificationApplication routine get stuck at the booking stage (a single job works as expected).

Do you have any experience with TMVA and job submission?
Is this a fundamental issue with TMVA or a simple I/O issue?

More info:
Running 10 trainings in parallel works properly
ROOT version: 6.05.02
TMVA version 4.2.1

Best regards,
Yoann

Hi,

This depends on how you do the parallelisation. If, for example, you run each job as an independent process, there should be no issue as long as you don’t write everything to the same output file. E.g.

root -l 'RunMyAnalysis.C("inputfile1", "outputfile1")' &
root -l 'RunMyAnalysis.C("inputfile2", "outputfile2")' &
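A slightly fuller sketch of the same idea, in case it helps: one ROOT process per input file, each with its own output file and log, started in batches. The helper names (output_for, launch_all), the OUTDIR variable, and the "_mva" naming are made up for illustration; RunMyAnalysis.C is just the placeholder macro from the two lines above, so adapt it to your own macro and its arguments.

```shell
# Hypothetical helper: map an input file to its own output file, so that
# no two processes ever write to the same path. Note that inputs sharing
# a basename in different directories would still collide.
output_for() {
    printf '%s/%s_mva.root\n' "${OUTDIR:-out}" "$(basename "$1" .root)"
}

# Hypothetical launcher: start one independent ROOT process per input
# file, in batches of at most $1 jobs, waiting for each batch to finish
# before starting the next one.
launch_all() {
    max_jobs=$1
    shift
    running=0
    mkdir -p "${OUTDIR:-out}"
    for input in "$@"; do
        output=$(output_for "$input")
        # -b: batch mode (no graphics), -q: quit when the macro is done.
        # Each job gets its own log next to its output file.
        root -l -b -q "RunMyAnalysis.C(\"$input\", \"$output\")" \
            > "${output%.root}.log" 2>&1 &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait
            running=0
        fi
    done
    wait
}

# Example call (not executed here): at most 10 jobs at a time.
# launch_all 10 /path/to/inputs/*.root
```

Waiting for a whole batch is the simplest throttle, not the most efficient one, since a slow job holds up its batch. The per-job log makes it easier to see which process hangs and at which stage.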

If you try to parallelise within the same process, you can run into problems. Please expand on your particular setup if this does not answer your question. (For example, what exactly do you mean by a job here?)

Cheers,
Kim

Hi Kim,

Thank you for your answer.
Then it must be an I/O issue on my side.

Cheers,
Yoann

Please let us know how you resolve your situation in the end :)

Cheers,
Kim