I have an analysis to do with PROOF, and I am trying to understand its basic I/O. From a file “in.root” containing a tree “tree_in”, I want to create a new tree “tree_out” in a new file “out.root”.
I should point out that for the moment I’m using PROOF-Lite, but in the coming weeks I would like to use PROOF on Demand with the IBM LoadLeveler batch scheduler… with lots of nodes and no local hard drives (I mean each node has access to the same shared hard drive). Is the TProofOutputFile object still mandatory, as in root/tutorials/ProofNtuple.C?
So my problem is that I don’t know where to create the new branch.
class MySelector : public TSelector {
public :
   TFile            *fFile;
   TProofOutputFile *fProofFile;
   TTree            *tree_out;
   TTree            *tree_in;
   TBranch          *b_in;
   TBranch          *b_out;
   ...
I was not able to adapt root/tutorials/ProofNtuple.C… I can read my tree_in, but I don’t know where to put the
tree_out = new TTree("mytree","my tree more informations...") ;
tree_out->Branch("mynewobject",&mynewobject,bsize,split) ;
It doesn’t work if I put it in SlaveBegin, the way it’s done with an ntuple in ProofNtuple:
// Now we create the ntuple
fNtp = new TNtuple("ntuple","Demo ntuple","px:py:pz:random:i");
I’ll post a “complete basic non-working example” if I’m not being clear…
For the moment, I can read a tree where each entry is a TClonesArray of TVectorD and write a tree where each entry is a TVectorD… But I have some problems writing a tree where each entry is also a TClonesArray…
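In the meantime, here is a minimal sketch of a SlaveBegin that creates the output tree inside a TProofOutputFile, following the pattern of root/tutorials/ProofNtuple.C. The member names (fProofFile, fFile, tree_out) match the class above; the "M" merge option and the bsize/split values are assumptions to be adapted:

```cpp
// Sketch only, modelled on tutorials/proof/ProofNtuple.C.
// The tree must be created AFTER the worker's output file is opened,
// so that it is attached to that file and merged at the end of the query.
void MySelector::SlaveBegin(TTree * /*tree*/)
{
   // One file per worker; "M" asks PROOF to merge them into out.root
   fProofFile = new TProofOutputFile("out.root", "M");
   fFile = fProofFile->OpenFile("RECREATE");
   if (!fFile || fFile->IsZombie()) {
      Abort("could not open the worker output file", kAbortProcess);
      return;
   }

   // Create the output tree in the directory of the open file
   tree_out = new TTree("mytree", "my tree, more information...");
   tree_out->Branch("mynewobject", &mynewobject, bsize, split);
   tree_out->SetDirectory(fFile);
   tree_out->AutoSave();
}
```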
Indeed with
class MySelector : public TSelector {
public :
   TFile            *fFile;
   TProofOutputFile *fProofFile;
   TTree            *newtree;
   TTree            *tree;
   TBranch          *b_vec;
   TBranch          *b_newvec;
   TClonesArray     *vec;
   TClonesArray     *newvec;
   ...
I can now process a TTree with a TClonesArray and it gives me another TTree with a TClonesArray…
(TFile in.root / TTree tree / TBranch vec => TFile out.root / TTree newtree / TBranch newvec)
I did it with:
void MySelector::SlaveBegin(TTree * /*tree*/) {
   ...
   newtree = new TTree(treeout, "treeout blabla");
   newvec  = new TClonesArray("TVectorD");
   ...
}
Bool_t MySelector::Process(Long64_t entry) {
   if (!newtree) return kTRUE;
   vec = 0;
   // newvec = 0;
   if (tree) {
      Long64_t ent = entry % tree->GetEntries();
      tree->SetBranchAddress(branchin, &vec, &b_vec);
      b_vec->GetEntry(ent);
   } else {
      Abort("no way to get entries in the input tree... Stop processing", kAbortProcess);
      return kTRUE;
   }
   RTensorT *pt = (RTensorT *) vec->At(0);
   new ((*newvec)[0]) RTensorT(*pt);
   newtree->SetBranchAddress(branchout, &newvec, &b_newvec);
   newtree->Fill();
   // delete vec;
   // delete newvec;
   return kTRUE;
}
and the rest is the same as ProofNtuple.C, with a tree replacing the ntuple…
So now my problem is the following: the input tree in fact contains several branches, with different numbers of entries… Is it possible with TProof to merge only some branches, and not the entire tree, without creating new trees containing the problematic branches?
b_newvec->Fill()
instead of
newtree->Fill()
crashed
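For what it’s worth, TBranch::Fill writes only that one branch, so its entry count gets out of step with the rest of the tree, which is a plausible reason for the crash; TTree::Fill fills all branches together. Below is a hedged sketch of a leaner Process, with the address binding hoisted out of the per-entry loop; it assumes b_newvec was created once in SlaveBegin with newtree->Branch(branchout, &newvec), and reuses branchin, vec, tree and RTensorT from the code above:

```cpp
// Sketch only: bind the input branch once per tree, in Init(),
// instead of calling SetBranchAddress on every entry in Process().
void MySelector::Init(TTree *intree)
{
   tree = intree;
   vec  = 0;
   if (tree) tree->SetBranchAddress(branchin, &vec, &b_vec);
}

Bool_t MySelector::Process(Long64_t entry)
{
   if (!newtree || !tree) return kTRUE;

   b_vec->GetEntry(entry);      // read only the branch we need

   newvec->Clear();             // drop the previous entry's objects
   RTensorT *pt = (RTensorT *) vec->At(0);
   new ((*newvec)[0]) RTensorT(*pt);

   newtree->Fill();             // fill ALL output branches consistently
   return kTRUE;
}
```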
And the old problem remains: what’s the best way to use TProof with PROOF on Demand on a big cluster with a single shared hard drive (as opposed to geographically separated clusters, each with its own hard drives)? Do I have to use TProofOutputFile? Do I have to use PAR packages?
Hi,
Sorry for the late reply.
First, your question:
PAR packages and TProofOutputFile have two different purposes, and only the second has to do with outputs, which is your problem.
TProofOutputFile was introduced to handle the case of big outputs, which can otherwise create memory issues.
So the first thing to understand is how big your output trees are going to be. If they can grow (very) large, then you need file support, which can be the common hard drive (distributed file system) that your worker nodes seem to have.
For the problems with the code: you wrote that you would post the full non-working example. Please do, so that I can better understand what you are trying to do and try to propose a solution with your target setup in mind.
My output with TProofOutputFile is working now, so I can keep that solution even if it’s perhaps not optimal for me… A typical output is currently 10 GB, but it could reach 100 GB in the near future… and each node of the cluster has 2 × 8 cores and 32 GB of RAM.
I was asking about PAR packages because I don’t know how to easily provide my compiled shared library to all my worker nodes directly from the common hard drive (I don’t want to rebuild anything, as all my worker nodes have exactly the same architecture as the master),
and it’s too complicated for me to work out in which order I would have to provide all the *.cc and *.h source files (with their interdependencies…).
Sorry, I misunderstood you, I thought you wanted to use the PAR file for the output file.
Yes, to load the required libraries and/or set up include paths you can use the SETUP.C macro of a dedicated PAR file. You can leave the rest of the PAR file empty; only SETUP.C will be executed when calling EnablePackage.
You can even pass an argument to EnablePackage, either a string or a list of objects; it will be passed to the SETUP function.
If you have many things to load, that’s definitely the best way to do it.
The loading order should be the same as in a plain ROOT session.
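To make this concrete, here is a sketch of what the SETUP.C of such an otherwise empty PAR package could look like; the paths and the library name are placeholders for whatever actually lives on the common hard drive:

```cpp
// SETUP.C of a PAR package: run on each worker by EnablePackage("mypar").
// The paths and library name below are examples, not real values.
Int_t SETUP()
{
   // Make the headers visible to the interpreter / ACLiC
   gSystem->AddIncludePath("-I/common/disk/myanalysis/include");

   // Load the pre-built shared library from the shared area
   if (gSystem->Load("/common/disk/myanalysis/lib/libMyAnalysis.so") < 0)
      return -1;  // a negative return value signals failure to PROOF

   return 0;
}
```

If you pass an argument to EnablePackage, define the macro as Int_t SETUP(TList *args) instead and read the argument from the list.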