Issue while running RDataFrame macro in batch mode


ROOT Version: 6.24/02
Platform: Linux/Fedora 34
Compiler: gcc version 11.3.1 20220421


I have written a macro that uses RDataFrame’s multithreading. I am calling
ROOT::EnableImplicitMT() and ROOT::EnableThreadSafety().

I want to execute the macro by loading the corresponding shared object file (.so) in two ways: (1) at the prompt and (2) in batch mode.

(A) At the ROOT prompt I do:

.L my_macro_cxx.so;
my_macro(1,2);
Elapsed time : 00h:00m:9.10s 
my_macro(1,3);
Elapsed time : 00h:00m:6.94s
my_macro(1,4);
Elapsed time : 00h:00m:7.32s 

This works perfectly. However, I have to wait until one execution is over before issuing the next command. This is undesirable because it requires me to monitor the execution.

To automate the above process (the second method), I have generated an input file (run_macro.txt).
(B) I then execute the following command:

root -l -b -q 'run_macro.txt'

Processing run_macro.txt...

Elapsed time : 00h:00m:8.11s 

Error in <TReentrantRWLock::WriteUnLock>: Write lock already released for 0x2a02698

In this case, I get the error. Why?

[Note added:]
In fact, I observed that I am unable to execute any macro (including non-RDataFrame ones) as described on the first line of page 27 of this document, at the bash prompt with:

root -l -b my_macro_cxx.so

I also checked this on ROOT 6.30/02.

What is happening?

Maybe try this instead (the quotes keep bash from interpreting the parentheses):

root -l -b -q 'my_macro.cxx+(1, 2)' && root -l -b -q 'my_macro.cxx+(1, 3)' && root -l -b -q 'my_macro.cxx+(1, 4)'

or even the same with a bash script and a for loop
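
The for-loop variant could look roughly like the sketch below. The argument values 2, 3 and 4 come from the original post; the ROOT_BIN variable, the run_one helper, and the fallback that only prints the command when ROOT is not on the PATH are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Sketch: one ROOT process per call, stopping at the first failure.
# ROOT_BIN is a hypothetical override; it defaults to the plain `root` executable.
ROOT_BIN="${ROOT_BIN:-root}"

run_one() {
  # Build the macro argument as a single quoted word so that bash
  # does not try to interpret the parentheses.
  local arg="my_macro.cxx+(1, $1)"
  if command -v "$ROOT_BIN" >/dev/null 2>&1; then
    "$ROOT_BIN" -l -b -q "$arg"
  else
    # Dry run: print the command when ROOT is not available.
    printf "%s -l -b -q '%s'\n" "$ROOT_BIN" "$arg"
  fi
}

for second in 2 3 4; do
  run_one "$second" || break
done
```

Each iteration starts a fresh ROOT process, so nothing is shared between the calls, at the cost of paying the start-up time every time.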

Hi @ferhue,

Thanks for your reply!

I have already tried that and I know it works.

However, the problem with this method is a considerably increased execution time.

Elapsed time : 00h:00m:19.18s 

That is about 10 seconds more than with the first method. What I am after is faster execution.

In the original post, I have sorted only 2% of the data; and I have to execute this command around 50 times!

I need to run the shared library from the command line for quicker execution.

Then use:

root -l -b -q 'my_macro.cxx+(1, 2)' 'my_macro.cxx+(1, 3)' 'my_macro.cxx+(1, 4)'

Are you using TThreads?

This error could be because your ROOT version is a bit old. Some bugs have been fixed in the meantime. Please try with 6.30/04.

As I mentioned, I am using RDataFrame, which might be using TThread internally. But I am not sure.

You mean ROOT 6.30/02 is also old?

I have used that version as well. Please see the end of my first post in this thread.

Oh I see. As said, try using what I mentioned in the last message. Or alternatively in the prompt, all in one line:

.L my_macro_cxx.so;
my_macro(1,2); my_macro(1,3); my_macro(1,4);

If you still see the Write lock error, then we might need a full reproducer to find out what’s going on. If there’s a TThread behind it, it is a well-known (long-standing) problem; see “Data races TThread TTimer TApplication”, Issue #8365 in root-project/root on GitHub.

This works, but after the first execution it says:
Info in <ACLiC>: unmodified script has already been compiled and loaded
Can this be avoided?

Maybe try:
root -l -b -q 'my_macro.cxx+(1, 2)' 'my_macro(1, 3)' 'my_macro(1, 4)'
or use
root -l -b -q 'my_macro.cxx+(1, 2)' -e 'my_macro(1, 3); my_macro(1, 4);'
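
For many calls, the -e string can be generated rather than typed by hand. The following is only a sketch of that idea: the build_calls helper, the ROOT_BIN variable, and the dry-run fallback that prints the command when ROOT is not on the PATH are hypothetical names introduced here, not part of ROOT.

```shell
#!/usr/bin/env bash
# Sketch: a single ROOT process that compiles the macro once and then
# calls the already loaded function for the remaining arguments.
ROOT_BIN="${ROOT_BIN:-root}"   # hypothetical override of the executable name

build_calls() {
  # Emit "my_macro(1, N); " for every N passed as an argument.
  local calls="" n
  for n in "$@"; do
    calls="${calls}my_macro(1, ${n}); "
  done
  printf '%s' "$calls"
}

first="my_macro.cxx+(1, 2)"    # compiles via ACLiC and runs the first call
rest="$(build_calls 3 4)"      # remaining calls, passed through -e

if command -v "$ROOT_BIN" >/dev/null 2>&1; then
  "$ROOT_BIN" -l -b -q "$first" -e "$rest"
else
  # Dry run: print the command when ROOT is not available.
  printf "%s -l -b -q '%s' -e '%s'\n" "$ROOT_BIN" "$first" "$rest"
fi
```

Because all calls share one process here, anything the macro leaves open after one call is still open when the next call starts.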


Dear @ferhue,
Thank you very much for your help!
It works perfectly as expected.
Regards,
Ajay

I still have an issue! When I run these commands on a large data set, I get the following errors:

Error in <TString::ReadBuffer>: found case with nwh=255 and nchars=-29116
Error in <TFile::ReadKeys>: reading illegal key, exiting after 0 keys
Info in <TFile::Recover>: tw_e411d344.root, recovered key TH1D:h1 at address 228
Info in <TFile::Recover>: tw_e411d344.root, recovered key TH1D:h2 at address 19720
Info in <TFile::Recover>: tw_e411d344.root, recovered key TH1D:h3 at address 42484
Info in <TFile::Recover>: tw_e411d344.root, recovered key TH1D:h4 at address 59982

I have tried all the methods that you suggested, but no luck in this case!

This means that the TFile you originally generated got corrupted. It might be because your hard drive had some issues, or because the TString you wanted to save was so large that its length overflowed and nchars became negative.
If you share the TFile somewhere, we can further take a look.

I get the error ONLY when I run the code as:
root -l -b -q 'my_macro++(1,2)' -e 'my_macro(1,3)'

However, when I execute the code in the following way, there is no error:
root -l -b -q 'my_macro++(1,2)'; root -l -b -q 'my_macro++(1,3)'

That could mean that you are not ‘closing things’ properly in your macro, so that the second execution runs into problems when reopening the same TFile, or something like that. Another potential issue could be that you are running into thread races. Without a minimal reproducer (script and TFile), it’s hard to give further help.

Ah, how could I be that stupid!
As you guessed correctly, I had forgotten to close the file at the end of the code.

Now, after adding fop->Close(), I don’t get those errors.

Thank you very much for your help! Much appreciated.
