Cannot write object to file with TProofOutputFile

Dear PROOF experts,

I am using PROOF with ROOT version 5.28.
I am running into a strange problem when trying to save histograms using TProofOutputFile.
I have followed the examples in the tutorials and on this page:
root.cern.ch/drupal/content/hand … root-files

In the SlaveBegin() method I add the following (ProofFile is a TProofOutputFile* and fout a TFile*, both declared as private members of the selector):


    ProofFile = new TProofOutputFile( "/users2/bertolucci/newskim.root", TProofOutputFile::kMerge );
    ProofFile->SetOutputFileName( "/users2/bertolucci/newskim.root" );
    TDirectory *savedir = gDirectory;
    fout = ProofFile->OpenFile("RECREATE");
    savedir->cd();
    h_el_n = new TH1F("h_el_n", "Distrib", 15, 0, 15);
    h_el_n->SetDirectory( fout );

Inside the Process() method I fill the histogram, and in SlaveTerminate() I do:


    TDirectory *savedir = gDirectory;
    savedir->cd();
    h_el_n->Write();
    ProofFile->Print();
    fOutput->Add( ProofFile );
    h_el_n->SetDirectory(0);
    gDirectory = savedir;

In the Terminate() method:


   ProofFile = dynamic_cast<TProofOutputFile*>( fOutput->FindObject( "/users2/bertolucci/newskim.root" ) );
   TString outputFile( ProofFile->GetOutputFileName() );
   fout = TFile::Open( outputFile );
   fout->Write();

Now, reading the worker logs, I find this:

So I suppose that when Write() is called on h_el_n, it cannot find the current directory: a file is written, but it is empty. What am I missing?

The second problem arises in the Terminate() method: I get a crash because I cannot retrieve the TProofOutputFile (FindObject always returns 0, so the program crashes when trying to open the TFile with fout = TFile::Open( outputFile );). What is the problem now? I do not understand…
Thank you very much in advance.

Cheers,
Federico

Hi,

You are writing the object to gDirectory instead of to the output file.
Try

    TDirectory *savedir = gDirectory;
    fout->cd();
    h_el_n->Write();
    ProofFile->Print();
    fOutput->Add( ProofFile );
    h_el_n->SetDirectory(0);
    gDirectory = savedir;
    fout->Close();

as in the mentioned example.
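The key point of the fix is that Write() stores the object in whatever directory is current, so the output file has to be made current first and the previous directory restored afterwards. Below is a minimal, ROOT-free sketch of that save/switch/restore pattern written as an RAII guard. Directory, gCurrentDir, DirectoryGuard and WriteObject are hypothetical stand-ins, not ROOT classes (ROOT itself offers TDirectory::TContext for this purpose; check what your version provides).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a ROOT directory (TDirectory/TFile); only a name is kept.
struct Directory {
    std::string name;
};

// Hypothetical stand-in for the gDirectory global: the "current directory".
Directory* gCurrentDir = nullptr;

// RAII guard: remember the current directory, make `target` current,
// and restore the saved one when the guard goes out of scope.
class DirectoryGuard {
public:
    explicit DirectoryGuard(Directory* target) : saved_(gCurrentDir) {
        // Switching to a null directory is what makes a null fout crash at cd().
        assert(target != nullptr && "target directory is null");
        gCurrentDir = target;
    }
    ~DirectoryGuard() { gCurrentDir = saved_; }
    DirectoryGuard(const DirectoryGuard&) = delete;
    DirectoryGuard& operator=(const DirectoryGuard&) = delete;

private:
    Directory* saved_;
};

// Like h_el_n->Write(): the object lands in whatever directory is current.
std::string WriteObject(const std::string& objName) {
    const std::string where = gCurrentDir ? gCurrentDir->name : "<none>";
    return objName + " -> " + where;
}
```

In the SlaveTerminate() code from the question, savedir->cd() re-selects the directory that was already current, so Write() never targets fout; the guard makes the intended target explicit and restores the old directory automatically.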

Once this is fixed, check the master log for errors related to the output file.
By the way, are you running PROOF-Lite or standard PROOF? Your output file path is a local path; in standard PROOF it will be local to the master machine.

G. Ganis

Hi,
thank you very much.
If I add those lines, all the workers crash.
The workers throw these messages:

I do not understand the TFile error, since the file should be opened with the “RECREATE” flag.
I printed out the value of the TFile pointer fout after

    fout = ProofFile->OpenFile("RECREATE");

in the SlaveBegin() method. It seems that fout is NULL; how is that possible?

Then comes the crash (again from the worker logs):

The line in question is the “fout->cd();” you suggested adding before writing the object.

As for the master log, right after the settings I read:

so I do not understand whether the problem is in the way I use TProofOutputFile or TFile objects…

//-------------------------

I do not understand this point: the path I use is global, isn’t it?

Thank you again,
Federico

Hi,

You have an error message from TFile::Open:

13:45:35 25686 Wrk-0.17 | Error in <TFile::TFile>: file //users2/results/newskim.root already exists

This explains why fout is NULL. You should not get that with RECREATE. But, if I understand correctly, all workers see the same /users2, which means that they all try to RECREATE the same file, and I do not know what happens in that case.

From the TProofOutputFile::Print output it looks like the file name is always the same: there is no information about the worker number, and the file is not in the working directory. This is strange.
Can you post the output of proof->Print("A") (perhaps only the part related to one or two workers, e.g. worker 0.13)?
Also, do you have a “ProofServ.DataDir” setting somewhere, e.g.

    xpd.putrc ProofServ.DataDir /some/dir

in the xproofd config file?

Can you also try removing the absolute path when you create the TProofOutputFile (but keeping when you define the final output file)?

    ProofFile = new TProofOutputFile( "newskim.root", TProofOutputFile::kMerge );
    ProofFile->SetOutputFileName( "/users2/bertolucci/newskim.root" );
    TDirectory *savedir = gDirectory;
    fout = ProofFile->OpenFile("RECREATE");
    savedir->cd();
    h_el_n = new TH1F("h_el_n", "Distrib", 15, 0, 15);
    h_el_n->SetDirectory( fout );

(in this way the temporary worker files are created in the sandbox, unique to each worker).
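Why the absolute path collides while the relative one does not can be illustrated with path joining. The sketch below is only an analogy: it assumes each worker resolves the requested name against its own sandbox directory (the sandbox paths are made up), and uses std::filesystem, whose operator/ keeps the base for a relative name but discards it for an absolute one. It does not reproduce TProofOutputFile's actual internal logic.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical model: a worker resolves the requested output name against
// its own sandbox directory. std::filesystem's operator/ keeps the sandbox
// for a relative name but discards it when the name is already absolute.
std::string WorkerFilePath(const fs::path& sandbox, const std::string& requested) {
    assert(sandbox.is_absolute());
    return (sandbox / requested).string();
}
```

With "newskim.root" every worker gets a distinct file under its own sandbox; with "/users2/bertolucci/newskim.root" every worker ends up with the same path, which matches the “already exists” clash seen in the logs.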

For the second problem,

How do I know?
Terminate is executed on the client: do the client and the master share the same /users2?

G. Ganis

Hi,
here is the output of proof->Print("A"):

I tried changing the path of the output file, and this is the result in the PROOF session:

In the worker logs I see:

and in the master log:

This time the output file is created in the right place, but if I run the same analysis several times, I get different results (for example, different numbers of entries in my histograms).

This I really do not know: I am not the system administrator, and I cannot figure out where the configuration files could be. Any suggestions?
Thank you very much again.

Federico

Hi,

So, I think I made some progress on the problem.

The TProofOutputFile technology was developed for PROOF clusters having local storage served by xrootd. The final merging phase assumed that the produced files could be accessed via an xrootd server running on the worker machines.

This was of course a mistake, and the fix was to make the access method configurable through an environment variable, LOCALDATASERVER, which the administrator can set as a default and the user can change via TProof::AddEnvVar. This variable gives the first part of the URL used to access the files; for files on a distributed file system it should simply be set to “file://”.
This works for setups where all the relevant directories are on the distributed file system. In your case, however, the working directories, where the temporary files are created, are not under /users2. Unfortunately this breaks things, because of a bug that I have just found in TProofOutputFile.
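To see why “file://” is the right value on a shared file system, here is a sketch of how a URL prefix combines with an absolute file path. DataServerUrl is a hypothetical helper, and the assumption that the prefix is simply prepended to the absolute path is illustrative, not taken from the TProofOutputFile source. The host name in the xrootd case is made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: form the URL of a worker's temporary file by putting
// the LOCALDATASERVER prefix in front of the file's absolute path.
std::string DataServerUrl(const std::string& localDataServer, const std::string& absPath) {
    assert(!absPath.empty() && absPath.front() == '/');
    return localDataServer + absPath;
}
```

With “file://” the result is a plain local file URL (file:///users2/...), readable from any node that mounts the same file system; with an xrootd prefix such as root://host/ the result is the usual root://host//path form, which requires an xrootd server on that host.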

I will try to fix the bug asap.

However, I would like you to test a workaround, which may also work in your case, as it did in my setup.

The workaround is the following:

  1. In the selector file, include ‘TProofServ.h’ at the top and change SlaveBegin as follows:
    ProofFile = new TProofOutputFile( "newskim.root", TProofOutputFile::kMerge );
    ProofFile->SetOutputFileName( "/users2/bertolucci/newskim.root" );
    TDirectory *savedir = gDirectory;
    // Workaround for the TProofOutputFile bug
    TString fn("/users2/bertolucci/<ord>/newskim.root");
    TProofServ::ResolveKeywords(fn);
    TString dirord = gSystem->DirName(fn);
    if (gSystem->AccessPathName(dirord)) gSystem->mkdir(dirord, kTRUE);
    fout = TFile::Open(fn, "RECREATE");
    if (fout && fout->IsZombie()) SafeDelete(fout);  // discard a file that failed to open
    if (fout) ProofFile->AdoptFile(fout);
    savedir->cd();
    h_el_n = new TH1F("h_el_n", "Distrib", 15, 0, 15);
    h_el_n->SetDirectory( fout );
  2. Set the environment variable LOCALDATASERVER to ‘file://’ before opening the PROOF session:
    root [] TProof::AddEnvVar("LOCALDATASERVER", "file://");
    root [] TProof::Open(<master>)

or in the relevant macro.

Please try and let me know.

G. Ganis

Hi,

thank you so much.
I tried your workarounds, but there are still some problems.
I cannot repeat the exercise exactly as you suggested, since I have to compile the selector, so I added a single line:

The result is that the file is created, but the session crashes at some random event number with the following strange error:

and it is always at the same code line: the fout->cd(); call in the SlaveBegin() method. When I commented it out, the system crashed as before, but at the fout->Close(); call in SlaveBegin().
If I print fout at these points, I see that it is again NULL, which may be the reason for the crash, but I do not understand why it should be NULL, since the file has been created.

I also tried commenting out these two lines; the result is no crash, but the objects are not written to the file:

and in the worker logs:

as we should expect (if I understand it a bit…).

Thank you again,
Federico

Hi,

The global ‘gSystem’ is unique and already created by the system. Just include ‘TSystem.h’ (I forgot this, sorry).
Please repeat with this change.
I have no idea what could happen when you overwrite gSystem with new TSystem() … I am surprised that there is no global crash …

G

Hi,

thank you, now it works!!!
But now it also creates 0.x directories in the directory where the merged file is created.
Could you please explain what the added lines in SlaveBegin() do?

Thank you again,
Federico

Hi,

Good that it works now.

Yes, but this is needed because you are writing to a shared file system and each worker needs a unique area. You could perhaps hide them behind a ‘.tmp/’ directory.

Those lines are needed to control the path where the files with the output of each worker are created. This is what TProofOutputFile should do but it does not in your case because of the bug.
So, first we create a string with the path. The ‘<ord>’ is a placeholder for the worker ordinal number (0.0, 0.1, …), which is then resolved by the call to TProofServ::ResolveKeywords(fn).
Then we have to make sure that the directory exists, otherwise TFile::Open will fail. Since we know that ‘/users2/bertolucci’ exists, we just need to check whether ‘/users2/bertolucci/<ord>’ (with <ord> resolved) exists and, if not, create it.
Finally we open the file with TFile::Open and tell the TProofOutputFile object to adopt it. In this way we bypass the bug in the TProofOutputFile constructor.
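The three steps can be mirrored in plain C++17 to make the logic concrete. ResolveOrd and OpenWorkerFile are hypothetical helpers: ResolveOrd only substitutes ‘<ord>’ and is not a replacement for TProofServ::ResolveKeywords, std::filesystem::create_directories plays the role of gSystem->mkdir(dirord, kTRUE), and a truncating std::ofstream stands in for TFile::Open(fn, "RECREATE"). The final AdoptFile step has no standard-library equivalent and is left out.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Step 1 stand-in: replace every "<ord>" placeholder with the worker ordinal
// (e.g. "0.3"). Only <ord> is handled here.
std::string ResolveOrd(std::string path, const std::string& ord) {
    assert(!ord.empty());
    const std::string key = "<ord>";
    for (auto pos = path.find(key); pos != std::string::npos;
         pos = path.find(key, pos + ord.size())) {
        path.replace(pos, key.size(), ord);
    }
    return path;
}

// Steps 1-3: resolve the per-worker path, create the parent directory
// recursively if it is missing, then create or truncate the file.
// Returns false if the directory or the file could not be created.
bool OpenWorkerFile(const std::string& pattern, const std::string& ord, std::string& resolved) {
    resolved = ResolveOrd(pattern, ord);
    const fs::path dir = fs::path(resolved).parent_path();
    std::error_code ec;
    if (!fs::exists(dir, ec)) {
        fs::create_directories(dir, ec);
        if (ec) return false;
    }
    std::ofstream out(resolved, std::ios::out | std::ios::trunc);
    return out.is_open();
}
```

Putting ‘.tmp/’ before the placeholder in the pattern, e.g. "/users2/bertolucci/.tmp/<ord>/newskim.root", keeps the per-worker 0.x directories out of sight, as suggested above.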

G. Ganis

Ok,

thank you very much.
Cheers,
federico