Unstable results when enabling MT and filling a THnD histogram using Take

Javier_Galan · July 28, 2023, 10:39am

   ------------------------------------------------------------------
  | Welcome to ROOT 6.26/06                        https://root.cern |
  | (c) 1995-2021, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jul 28 2022, 18:08:51                 |
  | From tags/v6-26-06@v6-26-06                                      |
  | With c++ (Debian 12.2.0-14) 12.2.0                               |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'       |
   ------------------------------------------------------------------

I am experiencing problems when filling a THnD histogram using Take. The results are unstable when I enable MT: each execution leads to a different result.

testMT.C (1.4 KB)

Each time I execute root -b -q testMT.C'(true)' I get a different output:

root -b -q testMT.C'(true)' 

Processing testMT.C(true)...
N-values filled: 10000000
Content 142
Nbins : 685848
(int) 0

root -b -q testMT.C'(true)' 

Processing testMT.C(true)...
N-values filled: 10000000
Content 164
Nbins : 685848
(int) 0

The content of the bin I am evaluating changes from run to run, and it is in fact very different from the value I get when MT is not enabled.

When I execute root -b -q testMT.C'(false)':

root -b -q testMT.C'(false)' 

Processing testMT.C(false)...
N-values filled: 10000000
Content 393
Nbins : 685848
(int) 0

root -b -q testMT.C'(false)' 

Processing testMT.C(false)...
N-values filled: 10000000
Content 393
Nbins : 685848
(int) 0

The result seems to be stable.

I prepared an AnalysisTree with 10M entries, because with 1M entries the result was stable. So it might be connected with the size of the dataframe.

I placed the file at the following location. https://sultan.unizar.es/exchange/AnalysisTree10M.root

Please, we need some insights on this topic!

eguiraud · July 31, 2023, 5:28pm

Hi @Javier_Galan ,

sorry for the high latency, we’ll take a look as soon as possible.

As an aside, why the Take + manual Fill rather than directly filling a histogram during the RDF event loop? With Take + manual Fill you loop over the data twice and load all column values into memory.
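
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes three columns of type double, as in the snippet quoted later in this thread, and it ignores any std::vector overhead. `ColumnBytes` is a hypothetical helper written for this estimate, not part of ROOT.

```cpp
#include <cstddef>

// Bytes needed to hold nColumns columns of nEntries doubles each.
// nCopies accounts for extra copies of every column, for example when the
// result of a Take is copied into a second vector.
constexpr std::size_t ColumnBytes(std::size_t nEntries, std::size_t nColumns, std::size_t nCopies = 1)
{
    return nEntries * nColumns * sizeof(double) * nCopies; // sizeof(double) is 8 on common platforms
}
```

With 8-byte doubles, `ColumnBytes(10000000, 3)` is 240000000 bytes (about 240 MB), and twice that if every column is copied once more.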

Cheers,
Enrico

mczurylo · August 3, 2023, 10:01am

Hi @Javier_Galan,

I’ve been testing your reproducer and I do see the same problem, but for now I don’t have a direct solution for your exact method with Take and manual Fill. Is there a reason why you want to use Take and manual Fill?

However, I have also tested what @eguiraud suggested, and the histogram filling looks stable and consistent. I checked both the bin contents and a plot of the 3D histogram, see attached.

#include <ROOT/RDataFrame.hxx>
#include <TCanvas.h>
#include <TROOT.h>
#include <iostream>

// Fills a TH3D directly during the RDataFrame event loop, without Take or a manual Fill.
Int_t testMT( Bool_t mt )
{
    // Implicit MT must be enabled before the RDataFrame is constructed.
    if( mt ) ROOT::EnableImplicitMT();
    ROOT::RDataFrame df("AnalysisTree", "AnalysisTree10M.root");

    // Booking is lazy: X and Y use 80 bins in [-20, 20], the energy 100 bins in [0, 10].
    auto myhisto = df.Histo3D({"sparse", "sparse", 80, -20, 20, 80, -20, 20, 100, 0, 10}, "final_posX", "final_posY", "final_energy");

    // The first access to the result runs the event loop.
    std::cout << "Bin contents: " << myhisto->GetBinContent(40 , 40, 50) << std::endl;
    auto c_mc = new TCanvas("c_mc", " ", 600, 600);
    myhisto->Draw();
    c_mc->SaveAs("test3dhisto_true1.png");

    return 0;
}

Cheers,
Marta

Javier_Galan · August 3, 2023, 11:00am

Dear @eguiraud, originally my problem was about filling a THnSparse object. There is another post about this here:

So originally I was working with a THnSparse, and while trying to make it work I thought it would be good to try with a THnD. I need to use a THnX so that in the future I can use any number of variables.

EDIT: I see now that perhaps the solution in the previous post is that I need to use THnSparseD and not the abstract class THnSparse!

I imagine the @mczurylo example will also work with THnD without problems, but the point is that I want to be able to fill in a THnSparse to be more memory efficient, and I think this is only possible with a manual fill?

I got other issues with THnSparse, see post:

Thanks for your replies!

mczurylo · August 11, 2023, 2:26pm

Hi @Javier_Galan,

we found the source of the problem.

The lines:

    std::vector<std::vector<double> > data;
    auto parValues1 = df.Take<double>("final_posX");
    data.push_back(*parValues1);
    auto parValues2 = df.Take<double>("final_posY");
    data.push_back(*parValues2);
    auto parValues3 = df.Take<double>("final_energy");
    data.push_back(*parValues3);

should be reordered to:

    auto parValues1 = df.Take<double>("final_posX");
    auto parValues2 = df.Take<double>("final_posY");
    auto parValues3 = df.Take<double>("final_energy");
    
    std::vector<std::vector<double> > data;
    data.push_back(*parValues1);
    data.push_back(*parValues2);
    data.push_back(*parValues3);

This way you run only a single event loop: you first book all three actions, and the event loop starts only when you access the first result. Everything then works in multithreaded mode as well, and you also get better performance. In your original code you triggered three separate multithreaded event loops (book an action, get its values, book the next action, get its values, and so on), and each loop may process the entries in a different order, so the i-th element of one vector no longer belongs to the same event as the i-th element of the others.
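
To illustrate why separately ordered columns break the filling, here is a toy sketch in plain C++ (no ROOT involved). It assumes that a multithreaded loop delivers entries in chunks whose order can change from one loop to the next. `Event`, `MakeEvents`, `TakeColumn` and `CountConsistentRows` are hypothetical names made up for this illustration, not RDataFrame API.

```cpp
#include <cstddef>
#include <vector>

// Toy dataset: event i has x = i and y = 10 * i, so a row (x, y) is
// consistent only if y == 10 * x.
struct Event {
    double x;
    double y;
};

std::vector<Event> MakeEvents(std::size_t nEvents)
{
    std::vector<Event> events;
    events.reserve(nEvents);
    for (std::size_t i = 0; i < nEvents; ++i)
        events.push_back({static_cast<double>(i), 10.0 * static_cast<double>(i)});
    return events;
}

// Stand-in for one event loop that collects a single column. The events are
// split into chunks of chunkSize, and chunkOrder gives the order in which the
// chunks end up in the output vector.
std::vector<double> TakeColumn(const std::vector<Event> &events, double Event::*column,
                               std::size_t chunkSize, const std::vector<std::size_t> &chunkOrder)
{
    std::vector<double> out;
    out.reserve(events.size());
    for (std::size_t chunk : chunkOrder) {
        const std::size_t begin = chunk * chunkSize;
        for (std::size_t i = begin; i < begin + chunkSize && i < events.size(); ++i)
            out.push_back(events[i].*column);
    }
    return out;
}

// Counts the rows in which the i-th x and the i-th y come from the same event.
std::size_t CountConsistentRows(const std::vector<double> &xs, const std::vector<double> &ys)
{
    std::size_t consistent = 0;
    for (std::size_t i = 0; i < xs.size() && i < ys.size(); ++i)
        if (ys[i] == 10.0 * xs[i])
            ++consistent;
    return consistent;
}
```

With 12 events in chunks of 3, taking x in chunk order {0, 1, 2, 3} and y in chunk order {2, 0, 3, 1} leaves no consistent row, while using the same chunk order for both columns keeps all 12 rows consistent. Booking all Take actions before accessing any result corresponds to the second case, because all columns are then collected in the same event loop.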

We will work on making better warnings so that such issues are avoided.

Have a nice end of week,
cheers,
Marta

system · August 25, 2023, 2:27pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.