Files not closing for batch jobs due to a large number of events

Dear Experts,

I am submitting batch jobs that generate toy datasets and store them in trees. With a large number of events, the output toy files are not closed properly, although the batch jobs complete without any error message in the log files. With fewer events, there is no problem.

Here’s my working directory:
fit_validation.zip (18.1 KB)
The main script is generation_tree.C, and the script for the batch jobs is submit_generation.sh.

Please have a look.

I guess @jonas can help.

Hi Sanjeeda,

Could you please be a bit more specific and describe exactly what the problem is, with a minimal reproducer that can be copied from this thread and pasted into the ROOT interpreter prompt (or a simple program), so that we can see the issue, if any, on our machines?

Best,
Danilo

Dear @Danilo,
Thank you for your response. Let me try to explain more clearly. I am generating toy samples from my fit model. I want each toy to mimic the data, so I need 1077935 events per toy. I am submitting batch jobs because I want to generate 1000 such toys. For simplicity, you can just run generation_tree.C+(1).

The problem is that the ROOT files that are produced are not closed, and when I try to open one of them I get the message:

Attaching file toy_1.root as _file0…
Warning in TFile::Init: file toy_1.root probably not closed, trying to recover
Warning in TFile::Init: no keys recovered, file has been made a Zombie
(TFile *) nullptr

I think this is happening because of the large number of events generated per toy: if I generate 10000 events, I get proper ROOT files. Is there a way to get valid ROOT files with 1077935 events in each toy?

Please let me know if it is still not clear.

Hi,

Thanks for the clarification. In principle ROOT has no such limitation. I would like to reproduce the issue on my machine: do you have a standalone macro/program to do so?

Cheers,
D

Dear @Danilo,

Yes, you can download the working directory; in that folder there is a macro, generation_tree.C. Just run generation_tree.C+(1). This should produce a ROOT file named toy_1.root.

Hello,

Thanks a lot: indeed it was also in your first post and it was straightforward to run.
Running on a local computer, I do not hit the limitation you described: do you?

The warning message

Warning in TFile::Init: file toy_1.root probably not closed, trying to recover

is typically printed when a truncated file is opened. In those cases a recovery procedure is triggered automatically to read back the available keys; in your case, however, no keys could be recovered:

Warning in TFile::Init: no keys recovered, file has been made a Zombie

Can you check in the logs of the batch system whether something happened to the job that produced toy_1.root, e.g. whether a limit such as RSS (resident memory), duration or VSIZE (virtual memory) was hit and the process was abruptly killed by the batch system?
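One way to check the RSS hypothesis from inside the job itself is to print the resident memory of the process every so often during generation, so the job log shows whether memory grows towards the batch limit. Below is a minimal sketch, assuming a Linux worker node (it reads /proc/self/status); current_rss_kb is a hypothetical helper, not a ROOT function. Within ROOT, gSystem->GetProcInfo() fills a ProcInfo_t whose fMemResident field carries similar information.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the current resident set size (RSS) of this
// process in kB, parsed from the "VmRSS:" line of /proc/self/status.
// Linux only; returns -1 if the file or the line is not available.
long current_rss_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        // rfind(prefix, 0) == 0 checks that the line starts with "VmRSS:"
        if (line.rfind("VmRSS:", 0) == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;  // number is followed by the unit "kB"
            return kb;
        }
    }
    return -1;
}
```

For example, calling it every 100000 events inside the generation loop and printing the value to std::cerr would show in the log whether memory keeps growing until the job disappears.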

I hope the questions above help with the debugging.

Best,
Danilo

Thanks. How much time does it take for you to finish running root -l generation_tree.C+(1)?
Yes, I have checked the log files, and judging from the last messages, I think it stops abruptly at line 858444.
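When a job is killed, any text still sitting in the stdout buffer is lost, and stdout is usually fully buffered when redirected to a batch log file. The last line in the log can therefore under-report how far the job actually got. A minimal sketch of progress logging that flushes explicitly and records elapsed wall-clock time (useful against a duration limit) follows; ProgressLogger is a hypothetical helper, and the event loop it would be called from in generation_tree.C is assumed.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical helper: prints "event i / total, elapsed X s" every `every`
// events. std::flush pushes the line out of the stream buffer immediately,
// so the batch log reflects real progress even if the job is killed later.
class ProgressLogger {
public:
    ProgressLogger(std::int64_t total, std::int64_t every)
        : total_(total), every_(every),
          start_(std::chrono::steady_clock::now()) {}

    // Returns true if a progress line was printed for event index i.
    bool update(std::int64_t i, std::ostream& out = std::cout) {
        if (every_ <= 0 || i % every_ != 0) return false;
        const double elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start_).count();
        out << "event " << i << " / " << total_
            << ", elapsed " << elapsed << " s\n" << std::flush;
        return true;
    }

private:
    std::int64_t total_;
    std::int64_t every_;
    std::chrono::steady_clock::time_point start_;
};
```

Constructing it as ProgressLogger progress(1077935, 100000) and calling progress.update(i) once per generated event would print about eleven lines per toy.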

@Danilo, which version of ROOT are you using? It probably does not work with 6.24 but does work with 6.28.

Hi,

I tested with 6.30.04.

Cheers,
Danilo

Then could it be a version issue? It works with 6.28 and 6.30 but not with 6.24.

Hi,

It could very well be. The last release of the 6.24 branch, 6.24/08, dates from October 2022, and a lot has been improved since then.

Best,
Danilo
