Problem with RDataFrame multithreading

ajaydeo · January 3, 2024, 1:06pm

ROOT Version: 6.24/02
Platform: Linux/Fedora 34
Compiler: gcc version 11.3.1 20220421

I get the following error when I enable multithreading with ROOT::EnableImplicitMT() in the
attached code (2.4 KB):

RDataFrame::Run: event loop was interrupted
Error in <TRint::HandleTermInput()>: std::runtime_error caught: Cannot fill histogram with values in containers of different sizes.

Without multithreading the code works perfectly and produces output that is exactly the same as with my earlier approach using TChain.

Please help.

Best regards,

Ajay

Danilo · January 4, 2024, 6:48am

Dear Ajay,

Is the code contained in your functions thread safe?
RDataFrame invokes it from several threads at once, and any unsynchronised modification of global variables in that context leads to data races.

Cheers,
Danilo

ajaydeo · January 4, 2024, 6:52am

Dear @Danilo,

Thank you very much for your reply.

Frankly speaking, I’m not sure how to check whether the functions are thread safe.
The code that I have been using is attached in the previous post. Can you please take a look at it?

Regards,

Ajay

Danilo · January 4, 2024, 7:16am

Hi,

As I was hinting, your code is not thread safe. You are manipulating global variables inside the functions that Define invokes. I did not look at the details of your code, but I see two rough options:

You do not use global variables
You use DefineSlot
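
To make the two options a bit more concrete, here is a minimal sketch in plain C++ (it does not need ROOT to compile). It mimics what DefineSlot makes possible: the callable receives a slot index as its first argument and only touches the state that belongs to that slot, so no two threads write to the same memory. The names SlotBuffers, sumOfSquares and processInSlots are hypothetical and are not taken from the attached code. In RDataFrame the slot number would be supplied by DefineSlot as an unsigned int first parameter, not by a hand-written thread loop as below.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// One scratch buffer per processing slot, instead of a single global buffer
// that every thread would read and write concurrently.
using SlotBuffers = std::vector<std::vector<double>>;

// Plays the role of a callable passed to DefineSlot: the first argument is the
// slot index, and only buffers[slot] is modified.
double sumOfSquares(unsigned int slot, const std::vector<double> &values, SlotBuffers &buffers)
{
   auto &scratch = buffers[slot]; // private to this slot
   scratch.clear();
   for (double v : values)
      scratch.push_back(v * v);
   double sum = 0.;
   for (double s : scratch)
      sum += s;
   return sum;
}

// Hypothetical driver standing in for the event loop: each thread is one slot.
std::vector<double> processInSlots(unsigned int nSlots, const std::vector<std::vector<double>> &events)
{
   assert(nSlots > 0);
   SlotBuffers buffers(nSlots);
   std::vector<double> results(events.size());
   std::vector<std::thread> workers;
   for (unsigned int slot = 0; slot < nSlots; ++slot) {
      workers.emplace_back([&, slot] {
         // Slot s handles events s, s + nSlots, s + 2 * nSlots, ...
         for (std::size_t i = slot; i < events.size(); i += nSlots)
            results[i] = sumOfSquares(slot, events[i], buffers);
      });
   }
   for (auto &w : workers)
      w.join();
   return results;
}
```

Calling processInSlots(3, events) returns one value per event regardless of how the events were spread over the threads. The first option (no global variables at all) corresponds to writing sumOfSquares so that scratch is a local variable, in which case the slot argument is not needed.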

Cheers,
D

ajaydeo · January 5, 2024, 1:06pm

Dear @Danilo

Thank you very much for the suggestions.

I think I have solved the problem by avoiding the use of global variables.

However, I still have to check the performance (time-wise) between the conventional and RDataFrame approaches.

Regards,

Ajay

Danilo · January 6, 2024, 6:27pm

Thanks,

Let us know your findings.

Best,
D

ajaydeo · January 11, 2024, 1:53pm

Dear @Danilo,

I am sorting about 10 GB of data stored in a TTree.

When I sort the data using standard TChain it takes 04m:58.15s to sort the required histograms.

On the other hand, with RDataFrame it takes only 01m:57.07s for the sorting.

I have to generate several more histograms by repeating the process.

So I am very happy that I will be able to do it quickly.

Thank you once again for your help.

Regards,

Ajay

Danilo · January 11, 2024, 2:31pm

Thanks for the report: this is the kind of benefit we are after with RDataFrame and implicit parallelisation in ROOT!

ajaydeo · January 11, 2024, 3:19pm

Hi @Danilo,

I have certainly benefited from RDataFrame. However, there is one thing I don’t quite understand: the speed-up from multithreading does not scale one-to-one with the number of cores. I am sorting the data on a machine with 32 cores, so shouldn’t the sorting be 32 times faster than on a single core? In the present case I gain only a factor of about 2.5 in time.

Is this expected? Or am I missing something?

Ajay

Danilo · January 11, 2024, 3:23pm

Hi,

Reaching perfect scaling is not trivial with any tool; this is not specific to RDataFrame.

There are many factors which enter the game, for example:

Total possible throughput from storage on the actual machine (or from network in case of remote reads)
Physical and logical cores, fat and thin cores
Number of files and granularity of files (i.e. number of clusters)
CPU throttling when using 1 core or all cores
Architecture of the CPU, including (shared) L2 and L3 caches
Parallelisable portion of the program, i.e. Amdahl’s law.
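
As a rough illustration of the last point only, Amdahl's law can be written as a two-line function, together with its inverse that estimates the parallelisable fraction from an observed speed-up. This is a sketch under a strong assumption: that the TChain run is a fair single-core baseline for the RDataFrame run, and that none of the other factors in the list matter. The function names are made up for this example.

```cpp
#include <cassert>

// Amdahl's law: speed-up on nCores when a fraction p of the single-core
// runtime parallelises perfectly and the remaining (1 - p) stays serial.
double amdahlSpeedup(double p, unsigned int nCores)
{
   assert(p >= 0. && p <= 1. && nCores > 0);
   return 1. / ((1. - p) + p / nCores);
}

// Inverse relation: the parallel fraction implied by an observed speed-up.
double impliedParallelFraction(double speedup, unsigned int nCores)
{
   assert(speedup >= 1. && nCores > 1);
   return (1. - 1. / speedup) / (1. - 1. / nCores);
}
```

With the timings quoted earlier in this thread (298.15 s with TChain, 117.07 s with RDataFrame on 32 cores), the speed-up is about 2.55 and impliedParallelFraction gives roughly 0.63. Under that assumption the speed-up could never exceed about 1 / (1 - 0.63), i.e. roughly 2.7, however many cores are added. In practice storage throughput and the other items above are just as likely to be the limit.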

Cheers,
Danilo

ajaydeo · January 11, 2024, 3:28pm

Hi @Danilo

Thanks for the quick reply. So, in your experience, the time I gain from the implicit parallelisation in ROOT is satisfactory, right?

Ajay

Danilo · January 11, 2024, 3:29pm

Hi,

It really depends on many factors as we said, but it’s a start, no?

Cheers,
D

ajaydeo · January 11, 2024, 3:35pm

Yes @Danilo, indeed it’s a good start!

I shall also try copying the data to an SSD to check whether I gain any more time.
