Best method to generate a large number of plots (in parallel)?

JMolson · March 23, 2016, 11:59am

Hello,

I’m trying to plot many thousands of images from the same large data set. It is an animation of particle tracks, so each frame needs the data used to generate the previous frames. I therefore cannot just load subsets of the main data set in separate processes: disk I/O is slow and RAM is limited. This is a time-consuming process, but I do have lots of CPU cores on a single machine to use.

I’m attaching an example of what I do below. My question really is what is the best way to plot images in parallel (if possible). If I do not have TThread::Lock() calls in, of course the plotting code crashes. Are any of the plotting functions thread safe?

Will any future ROOT releases have thread-safe plotting?

Fedora 23, gcc 5.3.1, root 6.06/02

Build as:
g++ Threads.cpp $(root-config --cflags --glibs) -lpthread -std=c++11 -o Threads
Threads.cpp (2.57 KB)

Thanks for any help!

bellenot · March 23, 2016, 1:22pm

Hi,

There are a few ROOT tutorials showing how to use threads in the $(ROOTSYS)/tutorials/thread/ directory.
Hope this will help.

Cheers, Bertrand.

Danilo · March 23, 2016, 1:41pm

Hi,

thanks for submitting a self-contained reproducer: this helps a lot.
I can only invite you to follow the link which was posted: this is the resource we maintain to illustrate how to express parallelism with ROOT.
Meanwhile I studied your code. I acknowledge some thread-safety issues in ROOT, e.g. in TASImage: we’ll work on those as soon as possible. While I am working on a multiprocess-based example, I attach a new version of your code with finer-grained locking; perhaps this helps you to scale.
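
For readers without the attachment, here is a minimal sketch of the general idea behind finer-grained locking. It is not the attached Threads_fineGrained.cpp and uses no ROOT calls: `RenderFrames` and `gRenderMutex` are hypothetical names, and the assignment under the lock is only a placeholder for the non-thread-safe drawing calls. Each thread prepares its data without a lock and holds the mutex only for the part that is not thread safe.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical mutex guarding the non-thread-safe rendering step.
static std::mutex gRenderMutex;

// Prepare each frame outside the lock, then "render" it under the lock.
// Returns the number of frames rendered.
std::size_t RenderFrames(std::size_t nFrames, std::size_t nThreads)
{
   assert(nThreads > 0);
   std::vector<double> rendered(nFrames, 0.0);
   std::size_t renderedCount = 0;

   // Each worker handles frames first, first + step, first + 2 * step, ...
   auto worker = [&](std::size_t first, std::size_t step) {
      for (std::size_t n = first; n < nFrames; n += step) {
         // Thread-safe part: computation on thread-private data, no lock.
         std::vector<double> points(1000);
         std::iota(points.begin(), points.end(), static_cast<double>(n));
         const double sum = std::accumulate(points.begin(), points.end(), 0.0);

         // Non-thread-safe part: keep the critical section short.
         std::lock_guard<std::mutex> lock(gRenderMutex);
         rendered[n] = sum; // placeholder for the actual drawing calls
         ++renderedCount;
      }
   };

   std::vector<std::thread> threads;
   for (std::size_t t = 0; t < nThreads; ++t)
      threads.emplace_back(worker, t, nThreads);
   for (auto &th : threads)
      th.join();

   return renderedCount;
}
```

How well this scales depends on how much of the per-frame work can stay outside the lock; if nearly all of it is drawing, the threads still run one at a time.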

I had to face a similar problem in the past, i.e. high-throughput generation of images in the context of data quality monitoring. One of the limits I ran into was the capability of the storage device to sustain the simultaneous creation of PNGs. I am not sure this applies to your case as well, but it’s worth mentioning.

Cheers,
Danilo
Threads_fineGrained.cpp (3.05 KB)

Danilo · March 23, 2016, 1:54pm

Hi,

as I said, we’ll look carefully into the thread safety details your example brought up.
Meanwhile, I propose to express parallelism with a multiprocess approach in your case via the TProcPool class.
I attach the full code, runnable as a macro with ROOT 6 (at least 6.06), here*. At the heart of the new implementation, you find the Map functionality of TProcPool.

   TProcPool procPool(nWorkers);
   std::vector<size_t> range(dim1);
   std::iota(range.begin(), range.end(), 0);
   procPool.Map(DoStuff, range);

Cheers,
Danilo

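#include <numeric>
#include <random>
#include <sstream>
#include <vector>

#include "TCanvas.h"
#include "TGraph.h"
#include "TH1F.h"
#include "TPad.h"
#include "TProcPool.h"

const size_t dim1 = 500;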
const size_t dim2 = 1000;

const size_t total = dim1 * dim2;

const size_t nWorkers = 10;

double **x, **y, **z;

int DoStuff(size_t n)
{

   const unsigned int xres = 1024;
   const unsigned int yres = 768;
   const double xMaximum = 1;
   const double yMaximum = 1;

   std::stringstream Pad1Name;
   std::stringstream cvname;
   std::stringstream OutputFileName;
   cvname << "canvas_" << n;
   Pad1Name << "pad1_" << n;
   OutputFileName << n << ".png";

   //With no lock the following will crash
   TCanvas *cv;
   TPad *Pad1;
   TH1F *Frame1;

   cv = new TCanvas(cvname.str().c_str(), "A random title", 0, 0, xres, yres);
   cv->cd();
   Pad1 = new TPad(Pad1Name.str().c_str(), "A random title", 0, 0, 1, 1);
   Pad1->Draw();
   Pad1->cd();
   Frame1 = Pad1->DrawFrame(0, 0, xMaximum, yMaximum);

   Frame1->GetXaxis()->SetLabelOffset(0.035);
   Frame1->GetXaxis()->SetTitleOffset(1.5);
   Frame1->SetXTitle("x");
   Frame1->SetYTitle("y");
   Frame1->Draw("AXIS");
   Pad1->cd();
   TGraph *xy = new TGraph(dim2, x[n], y[n]);
   xy->SetMarkerColor(kRed);
   xy->SetMarkerStyle(2);


   TGraph *xz = new TGraph(dim2, x[n], z[n]);
   xz->SetMarkerColor(kGreen);
   xz->SetMarkerStyle(2);

   xy->Draw("P");
   xz->Draw("P");
   cv->Print(OutputFileName.str().c_str(), "png");
   delete Pad1;
   delete xy;
   delete xz;
   delete cv;

   return 0;

}

int Processes()
{
   //Fill some arrays with random numbers
   std::default_random_engine Generator;
   std::normal_distribution<double> NormalDistribution(0.5, 0.1);

   x = new double*[dim1];
   y = new double*[dim1];
   z = new double*[dim1];

   for (size_t k = 0; k < dim1; k++) {
      x[k] = new double[dim2];
      y[k] = new double[dim2];
      z[k] = new double[dim2];
   }

   for (size_t k = 0; k < dim1; k++) {

      for (size_t l = 0; l < dim2; l++) {
         x[k][l] = NormalDistribution(Generator);
         y[k][l] = NormalDistribution(Generator);
         z[k][l] = NormalDistribution(Generator);
      }
   }

   TProcPool procPool(nWorkers);
   std::vector<size_t> range(dim1);
   std::iota(range.begin(), range.end(), 0);
   procPool.Map(DoStuff, range);


   // The rows were allocated with new[], so they must be released with delete[]
   for (size_t k = 0; k < dim1; k++) {
      delete[] x[k];
      delete[] y[k];
      delete[] z[k];
   }
   delete[] x;
   delete[] y;
   delete[] z;

   return 0;
}

JMolson · March 25, 2016, 4:20pm


[quote=“dpiparo”]Hi,

as I said, we’ll look carefully into the thread safety details your example brought up.
[/quote]

Thank you!

[quote=“dpiparo”]Meanwhile, I propose to express parallelism with a multiprocess approach in your case via the TProcPool class.
I attach the full code, runnable as a macro with ROOT 6 (at least 6.06), here*. At the heart of the new implementation, you find the Map functionality of TProcPool.

   TProcPool procPool(nWorkers);
   std::vector<size_t> range(dim1);
   std::iota(range.begin(), range.end(), 0);
   procPool.Map(DoStuff, range);

Cheers,
Danilo[/quote]

This does indeed work nicely.

I had a look inside TProcPool.h and saw that the Map function calls Fork(), which of course will end up (in my case) eating up lots of RAM and then going into swap. I can of course just reduce the process count, but then I’m under-using CPU resources. Still, some speed-up is always better than none.
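
As an illustration of the fork-based pattern discussed here, the following is a minimal POSIX sketch, not TProcPool's actual implementation. `RunForkedWorkers` is a hypothetical name. The parent fills the data once, and each forked child only reads its own slice. On Linux, fork shares pages copy-on-write, so pages that are only read are not duplicated; how much a real plotting worker ends up writing (and therefore copying) is something to measure rather than assume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork nWorkers children; each child reads a disjoint slice of `data`.
// Returns the number of children that exited with status 0.
std::size_t RunForkedWorkers(const std::vector<double> &data, std::size_t nWorkers)
{
   assert(nWorkers > 0);
   const std::size_t chunk = (data.size() + nWorkers - 1) / nWorkers;
   std::vector<pid_t> children;

   for (std::size_t w = 0; w < nWorkers; ++w) {
      const pid_t pid = fork();
      if (pid < 0)
         break; // fork failed: still wait for the children already started
      if (pid == 0) {
         // Child: read-only access to the parent's data, nothing is written to it.
         const std::size_t begin = std::min(w * chunk, data.size());
         const std::size_t end = std::min(begin + chunk, data.size());
         const double sum = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
         _exit(sum >= 0.0 ? 0 : 1); // leave without running the parent's cleanup
      }
      children.push_back(pid);
   }

   std::size_t succeeded = 0;
   for (pid_t pid : children) {
      int status = 0;
      if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
         ++succeeded;
   }
   return succeeded;
}
```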

[quote=“bellenot”]Hi,

There are a few ROOT tutorials showing how to use threads in the $(ROOTSYS)/tutorials/thread/ directory.
Hope this will help.

Cheers, Bertrand.[/quote]

Ah, there seems to be a multicore folder as well as a thread folder. The TProcPool examples live in the multicore folder, and I was looking in the thread folder.

Thanks for the help.

Danilo · March 25, 2016, 4:55pm

Hi,

glad you have a solution that works for you.
My take is that the overall PSS (proportional set size), roughly “the memory footprint of a multiprocess execution without double counting”, will not be very high even with lots of forks, as the majority of the pages will remain read-only and therefore stay shared between the processes.
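
One way to check this on Linux is to sum the `Pss:` fields that the kernel reports in /proc/<pid>/smaps. The sketch below assumes that file format; `SumPssKb` and `ProcessPssKb` are hypothetical helper names, not ROOT functions. Adding up the result over all worker processes gives a total in which shared pages are not double counted.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Sum all "Pss:" fields (in kB) from a stream in /proc/<pid>/smaps format.
// Related fields such as "Pss_Anon:" are skipped on purpose.
long SumPssKb(std::istream &smaps)
{
   long total = 0;
   std::string line;
   while (std::getline(smaps, line)) {
      if (line.compare(0, 4, "Pss:") != 0)
         continue;
      std::istringstream fields(line.substr(4));
      long kb = 0;
      if (fields >> kb)
         total += kb;
   }
   return total;
}

// PSS of a process in kB, or -1 if its smaps file cannot be opened
// (e.g. on a non-Linux system or without permission).
long ProcessPssKb(const std::string &pid = "self")
{
   std::ifstream smaps("/proc/" + pid + "/smaps");
   if (!smaps)
      return -1;
   return SumPssKb(smaps);
}
```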
Of course, as you correctly point out, the multithreaded case would use less memory “by construction”.

Cheers,
Danilo