OpenMP for TFile

KevinWang · October 25, 2019, 3:29am

ROOT Version: 6.18/00
Platform: CentOS 7
Compiler: GCC

Hello Rooters,

I met some issues that I cannot understand when I used OpenMP in for loop to create some new TFiles.

My script likes these:

#include "stdio.h"
using namespace std;
#include "TFile.h"
#include "TTree.h"

int main()
{
        #pragma omp parallel for
        for(int i=1; i<=10; i++)
	{		
		printf("i=%d\n", i);
		TFile *fAna=new TFile(("./testOP_"+to_string(i)+".root").c_str(), "RECREATE");
		TTree *tAna=new TTree("tAna", "tree for data analysis");
		tAna->Write();
		delete tAna;
		delete fAna;
	}
	return 0;
}

I complie it like this g++ -o TestOpenMP TestOpenMP.cpp -fopenmp root-config --cflags --libs
Is there anyone helping me on these issue?
Thank you!

Kevin

marty1885 · October 25, 2019, 6:17am

If my experiences are correct, the low-level TTree interface relies on global states and thus is now thread safe.

I might be wrong tho.

KevinWang · October 25, 2019, 1:44pm

Hi,

Thank you. I also tried removing the TTree stuffs, but it also gave me error message. Did this means both TFile and TTree are not thread safe? Do you know how to make them safe?

Kevin

marty1885 · October 25, 2019, 3:56pm

Yes, both TTree and TFile are not thread safe. And I suggest not to parallelize a file writing task even if they are thread safe. Writing to files in parallel will cause random writes to disk and lead to bad performance.

eguiraud · October 25, 2019, 4:02pm

Hi,
TFile, TTree, the graphics and a few more things rely on global state, and therefore are not thread-safe by default.

You can call ROOT::EnableThreadSafety to make certain things safe at the cost of taking some locks: https://root.cern.ch/doc/master/namespaceROOT.html#a3332c2f629881ab608768fa6846f440e

Feel free to ask in case you need more information than what’s provided at the link above.
Cheers,
Enrico

KevinWang · October 25, 2019, 6:32pm

Hi eguiraud,

The reason why I tried to use OpenMP for multi-processing my data is that I want to speed up the analysis work since there are many runs of experimental data files to be analyzed. And each experimental data file needs to be converted to a root file. So I think multi-process the for loop will be faster than the traditional processing. Do you have any idea about this?

Thank you!
Kevin

marty1885 · October 26, 2019, 2:37am

You might want to use PROOF. If you could afford doing the analysis in parallel but writing the data in serial. You use the RDataFrame interface, which is thread safe (This should be faster than PROOF if you are not working with >100GB):

ROOT::EnableImplicitMT(); // Enable implicit parallel computing
auto rdf = ROOT::RDataFrame("ntuple", "data/*.root");

// Do your analysis here. For example:
auto h1 = rdf.Histo1D("eV");
auto fAna = make_shared<TFile>("plot1.root", "RECREATE");
auto TTree *tAna= make_shared<TTree>("tAna", "tree for data analysis");
tAna->Fill();
fAna->Write();

// Another analysis
auto h1 = rdf.Histo1D("var2");
etc...

eguiraud · October 28, 2019, 9:06am

Yes, concurrent processing should be faster than single-core processing, even if some locks are taken sometimes. Scaling should be fine as long as you spend most of the computing resources on logic that is trivially parallel.

Of course, you should measure for your usecase.

Cheers,
Enrico

system · November 11, 2019, 9:07am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.