Help with OpenMP and Multidimensional Integration

Pomeron · January 24, 2012, 8:50pm

Dear All,

I need to fill 2Dim array while each cell of the array gets an integral (of some function) as a function of free parameters. Since I deal with a very large arrays (here I simplified my case), I need to use OpenMP parallelism in order to speed my calculations up. Here I use simple #pragma omp parallel for directive.

Without using “#pragma omp parallel for”, the code executes perfectly. But adding the parallel directive, produces race conditions in the output.

I tried to cure it by making “private(i,j,par)”, it did not help. It seems like the problem is in the integrator itself. Each thread has its own copies of i,j and par, so there is no reason for the race condition.

I am totally frustrated with this problem and cannot find the way out.

Please, help!

P.S. I use VS2008 Professional with OpenMP 2.0 and ROOT 5.26 under WIndows 7 OS

[size=85]

testfunc(const double* var, const double* par)
{
// here is some simple function to be integrated over var[0] and var[1] and two free parameters par[0] and par[1]

return …

}

static double tmp[100][100];

int main()
{

double par[2];
double a[]={0,0} // limits of 2D integration
double b[]={1,1};// limits of 2D integration

ROOT::Math::AdaptiveIntegratorMultiDim ig; // creation of the multidim integrator
ROOT::Math::WrappedParamFunction<> f1(&testfunc, 2, 2); // wrapping of the function in order to pass it to the integrator

ig.SetFunction(f1); // setting the function

int i,j;
#pragma omp parallel for private(i,j,par)
for (i=0;i<100;i++)
{
for (j=0;j<100;j++)
{
par[0] = i;
par[1] = j*j;

			f1.SetParameters(par);  // each cell of "tmp" gets integral of "testfunc" with different values of "par"
			tmp[i][j]	=	ig.Integral(a,b); // filling the cell with the value
		}
	}

}
[/size]

moneta · January 25, 2012, 8:35am

Hi,
The problem is that the function object is common between the threads and contains in its state the parameter values (i and j^2). So when you are integrating in one thread (i.e. when calling the Integrate method), you are probably using a function state defined by another thread.
The solution is to define different function objects for each thread

Best Regards
Lorenzo

Pomeron · January 25, 2012, 3:49pm

Thank you Lorenzo,

I did as you said: I defined two (since I have only two threads) distinct function objects and two integrator objects (just to be 100% sure I have no source for race conditions).

I have two nested “for” loops (with lengths Nx and Ny). So, I want that the first thread (rank=0) will calculate a half of Nx (outer) loop and the second thread (rank = 1) will do the rest. By using “if” clause I can do it explicitly. Since each of the threads should perform a half of the total work, I expect it will speed up the calculations in two times (approximately).

Now, the modified code is:

[size=85]

// here I define the first integrator and the first function object
ROOT::Math::AdaptiveIntegratorMultiDim ig1;
ROOT::Math::WrappedParamFunction<> f1(&testfunc, 2, 4);
ig1.SetFunction(f1);

// here I define the second integrator and the second function object
ROOT::Math::AdaptiveIntegratorMultiDim ig2;
ROOT::Math::WrappedParamFunction<> f2(&testfunc, 2, 4);
ig2.SetFunction(f2);

int i,j,tid;
double a[]={0,0};
double b[]={1,1};
double par[4];

#pragma omp parallel private(i,j,tid,par)
{
tid = omp_get_thread_num(); // each thread gets its personal ID

	if (tid==0)  // master thread (ID=0) should perform a half of the work (0<i<Nx/2), (Nx mod 2=0)
	{
	for ( i=0;i<Nx/2;i++)
	{
	       	for ( j=0;j<Ny;j++)
		{
			par[0]=LogGrid(i);
			par[1]=LinGrid(j);
			f1.SetParameters(par);
			rrr[i][j] = ig1.Integral(a,b);
		}
	}
	}

//***************************************************************
if (tid==1) // the second thread
{
for (i=Nx/2;i<Nx;i++)
{
for (j=0;j<Ny;j++)
{
par[0]=LogGrid(i);
par[1]=LinGrid(j);
f2.SetParameters(par);
rrr[i][j] = ig2.Integral(a,b);
}
}
}

}
[/size]

Now, it executes perfectly (without race conditions), BUT, there is NO speed up in performance!

Two threads do this work the same time as one thread does. Probably the time is wasted on communication between the threads or something else.

If you have any idea, please help. I am stuck on this point for a very long time and it is essential for my project.

Thank you in advance,

A.K.

moneta · January 25, 2012, 4:13pm

Hi,

Maybe splitting the threads is expensive. What I would try is to make private the function object and make them in the loop. If the function is expensive to compute, it should not add much overhead.
If still this does not produce speed-up, the problem needs to be understood better and we need probably to modify the Integrator class and make it more thread-friendly.
What is your typical time you need to perform the integral ? Is it very fast ?

Best Regards

Lorenzo

Pomeron · January 25, 2012, 4:46pm

Dear Lorenzo,

I will try to do it.

The single (at fixed values of “i” and “j”) integral evaluation is pretty fast (1e-3 sec), but the main problem is that I have “a lot” of values of “i” and “j”.

I have a 4D matrix, with sizes M[100][100][100][100] ( or M[1e8]) and each cell of the matrix should be filled with the this integral. A singe evaluation (at each cell) is fast (1e-3), but since I need to repeat this evaluation 1e8 times, I get that a single-full-matrix-fill-run will take around 1e5 seconds ~ 27 hours. I need to calculate at least 100 such matrices -> ~3months. (impossible)

There is a straightforward solution to this problem: to parallelize the loop and instead of 1e8 serial run, reduce it to 1e8/(#threads) on a cluster, which significantly simplifies the entire picture.

Thus, perfect parallelism will save me. But I am stuck at this point

Hope your advice will help.

Best regards,

P.

Pomeron · January 25, 2012, 9:43pm

Dear Lorenzo,

It seems like something in my code is not thread-safe. May be you can guide me how to check and to spot the problem with the integrator or function object.

Thanks!

P.

Pomeron · January 25, 2012, 10:41pm

Dear Lorenzo,

This problem was magically solved by defining the parameters of integration as global and using threadprivate directive:

double parGlob[4];
#pragma omp threadprivate(parGlob)

Of course having global parameters is not safe, but at least, now I can utilize parallel computing to speed up my calculations.

Thank you for your suggestions and help,

P.

moneta · January 26, 2012, 1:08pm

Hi,

Which parameter exactly you need to define as global, the xmin/xmax values you pass to Integral() ?
I don’t see this parameter in your snipped code.
What you have posted is an interesting example and I would like to understand this case better,

Thank you

Lorenzo