Cycle-driven processing using a Fortran package

Chinmay · June 3, 2015, 8:36am

Hi,
I am new to PROOF. I am using the extensive air shower simulation package CORSIKA, which is written in Fortran. How can I launch many simulation jobs in parallel on a cluster, using the Fortran executable of CORSIKA from PROOF?

ganis · June 4, 2015, 3:42pm

Hi,

PROOF is designed to work with ROOT TTrees, and it is written in C++.
With some tricks, it can be used as a scheduler of tasks, possibly external ones, and then collect the results of these tasks.
To understand whether this could make sense and how difficult it would be, you should explain a bit more how you usually run your code, in terms of input parameters and output handling.

G. Ganis

Chinmay · December 22, 2016, 11:36am

Hi,
Back to this problem.
I have this Fortran code with an executable named corsika. We provide a text input file to this executable. According to the inputs, it simulates many events of particle air showers and stores them in a binary file. (The number of events to be generated is given as an input parameter, and I do not have event-level control over how this code works.) We generate many input files using a shell script and then run the code on these input files, (pseudo-)in parallel, using another shell script.
To use ROOT/PROOF, I have written code to convert the binary output file of corsika to '.root' format (by storing the event data in a TTree). Beyond this point I know (or at least have an idea) how to use PROOF on my .root files.
My questions are:

1. How can I do the binary-to-.root conversion in parallel?
2. If that is not possible, how do I properly distribute the .root files to the workers for further analysis with PROOF? Also, how can I register these .root files as datasets?

ganis · January 11, 2017, 5:37pm

Dear Chinmay,

Sorry for the late reply, due mostly to the Xmas break.

[quote=“Chinmay”]1. How to do binary to .root conversion parallel ?
2. If it is not possible then how to properly distribute the .root files on workers for further analysis with Proof ? Also How can I register these .root files as Datasets ?[/quote]
I will shortly send you an example for both cases.

G Ganis

Chinmay · January 27, 2017, 5:55am

Hi,

I have now managed to do the first task using the system() call.
What I am doing is generating the input card for corsika in each TSelector::Process(entry) call. I pass this card to corsika using system("./corsika < input_card_generated_for_current_entry"). Corsika provides an option to pipe its output to an executable (which I upload to the cluster in a PAR archive). Using this option, I manage to get the corsika output directly in .root format. Is this scheme okay, or could it be thread-unsafe? My local runs, at least, seem to be working fine.
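The card-per-entry scheme can be sketched roughly as follows. This is only an illustration, not the actual selector code: makeCard and prepareRun are hypothetical helpers, the card contains just a handful of CORSIKA steering keywords (RUNNR, NSHOW, SEED, EXIT) with made-up values, and a real card needs many more settings. Each entry gets its own card file, so concurrent workers never share one.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: build the card text for one Process() entry.
// Keyword values are illustrative; the run number and seeds are derived
// from the entry so that every entry produces a distinct card.
std::string makeCard(long entry, int nShowers) {
    assert(nShowers > 0);
    std::ostringstream card;
    card << "RUNNR   " << entry + 1 << "\n"
         << "NSHOW   " << nShowers << "\n"
         << "SEED    " << 2 * entry + 1 << " 0 0\n"
         << "SEED    " << 2 * entry + 2 << " 0 0\n"
         << "EXIT\n";
    return card.str();
}

// Hypothetical helper: write the card to its own file inside workDir and
// return the shell command that feeds it to the corsika executable.
std::string prepareRun(long entry, int nShowers, const std::string& workDir) {
    const std::string cardPath = workDir + "/card_" + std::to_string(entry) + ".inp";
    std::ofstream out(cardPath);
    out << makeCard(entry, nShowers);
    out.close();
    return "./corsika < " + cardPath;
}
```

Inside Process(entry), the returned string would then be handed to system(), as in the call quoted above.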
Following this scheme, I am able to have .root files generated on the cluster. How do I combine all these files into a single dataset? I was hoping to get a hint from the getCollection.C file in the "Working with datasets" section of the PROOF online documentation, but that link is dead, and I can't find this file in the tutorials either.

ganis · February 2, 2017, 1:29pm

Dear Chinmay,

Sorry for the late reply.
Using system(), or rather gSystem->Exec(), is what I intended to suggest.
PROOF is multi-process, so as long as the processes (workers/slaves) are not writing to the same file, I do not see problems with your approach. Do I understand correctly that you have one output file per event, i.e. per call to TSelector::Process()?
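One way to make sure that no two workers ever write to the same file is to encode both the worker and the entry in the file name. A minimal sketch, assuming the worker ordinal is available as a string (in a selector it could come from gProofServ->GetOrdinal(), which is not called here); outputName and the naming pattern are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical naming scheme: one output file per worker and per Process() entry.
// 'ordinal' stands for the PROOF worker ordinal, e.g. "0.3".
std::string outputName(const std::string& ordinal, long entry) {
    assert(entry >= 0);
    std::string tag = ordinal;
    for (char& c : tag) {
        if (c == '.') c = '_';  // keep the only dot for the file extension
    }
    return "shower_w" + tag + "_e" + std::to_string(entry) + ".root";
}
```

With such names, two workers processing the same entry number, or one worker processing two entries, still end up with distinct files.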

Sorry for the broken link: I will remove it. However, getCollection.C would not have given you much help, I am afraid.
The 'dataset' tutorial in tutorials/proof/runProof.C shows how to automatically generate a dataset out of the files you produce. I have extracted the essential parts into the attached macro (this is the way the tutorials will be provided in the future). The selector used is tutorials/proof/ProofNtuple.h,.C. It shows how to create TProofOutputFile objects that automatically create a dataset. In your case you probably have to create the TProofOutputFile in SlaveTerminate, because the files are created in Process.

That said, if you have a list of the files, you can create a TFileCollection (which is the way ROOT describes datasets) by putting them into a text file, for example mydatasetfiles.txt, one per line:

$ cat mydatasetfiles.txt
# These are the files in my dataset
/path/to/file/one/f1.root
# The file path can be a url
root://some.serv.er//path/to/file/two/f2.root

and do

TFileCollection *fc = new TFileCollection("MyDataset", "", "mydatasetfiles.txt");

Then you can use it in PROOF:

proof->Process(fc, "MySelector.C+", ...)
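If the file names are only known inside a program, the text file can be produced programmatically in the same layout as the listing above ('#' comment lines, one path or URL per line). A minimal sketch using only the standard library; writeFileList and readFileList are hypothetical helpers, and the read-back function is there only to check the result, it is not how ROOT parses the file:

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Write a '#' comment line followed by one path or URL per line.
// Empty entries are skipped. Returns the number of entries written.
std::size_t writeFileList(const std::string& listPath,
                          const std::vector<std::string>& files) {
    std::ofstream out(listPath);
    assert(out.is_open());
    out << "# These are the files in my dataset\n";
    std::size_t n = 0;
    for (const std::string& f : files) {
        if (f.empty()) continue;
        out << f << "\n";
        ++n;
    }
    return n;
}

// Read the list back, ignoring blank lines and '#' comment lines.
std::vector<std::string> readFileList(const std::string& listPath) {
    std::ifstream in(listPath);
    std::vector<std::string> files;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        files.push_back(line);
    }
    return files;
}
```

The resulting file name is what would be passed as the third argument of the TFileCollection constructor shown above.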

G Ganis
prf007_dataset.C (4.89 KB)

Chinmay · February 13, 2017, 5:15pm

Hi,
Thanks for the reply and the example code.
I am still a little confused, though.
I tried the ProofNtuple.C selector for my problem, but I do not understand how (and where) to use TProofOutputFile::OpenFile to create the dataset.
I opened the file in Begin(), and this created one file per worker. This way only the events (and the corresponding file) created in the last Process() call on each worker are saved.
Thus, when I generated 6 files with 10 events each, it eventually saved only 40 events and created a dataset consisting of only 40 events and 4 files. If I should call OpenFile() in Terminate() instead, can you please explain in a little more detail how to do it?