Dear ROOT experts,
To prepare my data for the fit, I want to create a lot of histograms for the systematic variations. Each systematic variation of each process is stored in a separate file, so I need to create one histogram per file for a large number of files. To take advantage of RDataFrame's multithreading capabilities, I've written a program that boils down to this:
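```python
import ROOT

ROOT.EnableImplicitMT(10)

# file_list: list of input file paths, one per process/systematic variation (defined elsewhere)
handle_list = []
file_count = 0
for file in file_list:
    file_count += 1
    print(file_count)
    df = ROOT.RDataFrame('tree', file)
    # Book (but do not yet run) one histogram per input file
    handle_list.append(df.Histo1D(("h", "h", 10, 0, 100), "observable", "weight"))

# Run all booked computation graphs concurrently
ROOT.RDF.RunGraphs(handle_list)

file = ROOT.TFile('output.root', 'recreate')
for hist_handle in handle_list:
    hist = hist_handle.GetValue()
    hist.Write()
file.Close()
```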
This approach fails with the following output:
```
2123
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libHist.so for shared_ptr<TH1D>
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libHist.so for shared_ptr<TH1D>
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libHist.so for shared_ptr<TH1D>
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libROOTDataFrame.so for ROOT::Internal::RDF::ActionTags::Histo1D
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libROOTDataFrame.so for ROOT::Internal::RDF::ActionTags::Histo1D
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libROOTDataFrame.so for ROOT::Internal::RDF::ActionTags::Histo1D
2125
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/ZZ_QCDJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/ZZ_EWKJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/ZZ_QCDJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/ZZ_QCDJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
Traceback (most recent call last):
File "create_fit_histograms.py", line 373, in <module>
File "create_fit_histograms.py", line 345, in main
File "create_fit_histograms.py", line 282, in get_hist_hande_list
cppyy.gbl.std.runtime_error: Template method resolution failed:
ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Filter(experimental::basic_string_view<char,char_traits<char> > expression, experimental::basic_string_view<char,char_traits<char> > name = "") =>
runtime_error: GetBranchNames: error in opening the tree tree_3lCR_PFLOW
ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Filter(experimental::basic_string_view<char,char_traits<char> > expression, experimental::basic_string_view<char,char_traits<char> > name = "") =>
runtime_error: GetBranchNames: error in opening the tree tree_3lCR_PFLOW
ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Filter(experimental::basic_string_view<char,char_traits<char> > expression, experimental::basic_string_view<char,char_traits<char> > name = "") =>
runtime_error: GetBranchNames: error in opening the tree tree_3lCR_PFLOW
```
I wasn't expecting the files to stay open between booking the histograms and running the computation graphs, but fair enough: 2123 simultaneously open files may simply be too many.
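"Too many open files" is the message for hitting the per-process limit on open file descriptors. A minimal sketch for inspecting that limit from Python, and optionally raising the soft limit up to the hard limit, using the standard-library `resource` module. It assumes a Unix system (the WSL paths in the log suggest Linux); the helper names `nofile_limits` and `raise_nofile_soft_limit` are hypothetical. If files really do accumulate, this only postpones the failure rather than explaining it.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_nofile_soft_limit():
    """Try to raise the soft limit to the hard limit; return the resulting (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # the system refused the change; keep the current soft limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print("before:", nofile_limits())
print("after: ", raise_nofile_soft_limit())
```

The limit applies to the whole process, so it would have to be raised before ROOT opens any input files; the equivalent from the shell is `ulimit -n` before starting Python.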
I then modified the code to process the initial file list in chunks:
```python
import ROOT

ROOT.EnableImplicitMT(10)

# chunk_list: file_list split into consecutive sublists (defined elsewhere)
for file_list_chunk in chunk_list:
    handle_list = []
    file_count = 0
    for file in file_list_chunk:
        file_count += 1
        print(file_count)
        df = ROOT.RDataFrame('tree', file)
        handle_list.append(df.Histo1D(("h", "h", 10, 0, 100), "observable", "weight"))
    # Run this chunk's graphs before booking the next chunk
    ROOT.RDF.RunGraphs(handle_list)
    file = ROOT.TFile('output.root', 'recreate')
    for hist_handle in handle_list:
        hist = hist_handle.GetValue()
        hist.Write()
    file.Close()
```
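How `chunk_list` is built is not shown. A minimal sketch of one way to do it, assuming plain consecutive slicing of `file_list`; the helper `make_chunks`, the 2500 hypothetical file names and the chunk size (chosen to match the 1246-file first chunk) are illustrative only.

```python
def make_chunks(items, chunk_size):
    """Split a list into consecutive sublists of at most chunk_size elements, preserving order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Hypothetical input: 2500 file names split into chunks of 1246.
file_list = [f"syst_{i}.root" for i in range(2500)]
chunk_list = make_chunks(file_list, 1246)
print([len(chunk) for chunk in chunk_list])  # [1246, 1246, 8]
```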
It runs fine for the first chunk of 1246 files, but fails on the second chunk at the 508th file:
```
508
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libHist.so for shared_ptr<TH1D>
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libHist.so for shared_ptr<TH1D>
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libHist.so for shared_ptr<TH1D>
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libROOTDataFrame.so for ROOT::Internal::RDF::ActionTags::Histo1D
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libROOTDataFrame.so for ROOT::Internal::RDF::ActionTags::Histo1D
Error in <TInterpreter::TCling::AutoLoad>: failure loading library libROOTDataFrame.so for ROOT::Internal::RDF::ActionTags::Histo1D
512
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/WtJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/SingleTopJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/ttbarJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/ttVJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/WtJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
SysError in <TFile::TFile>: file /mnt/c/Users/Alex/cernbox/IncZZ/convertDatasets/../SlimmedCorrected_Nov2022_nodouble/../SlimmedCorrected_Nov2022_nodouble_syst/JET/WtJET_EffectiveNP_Mixed1__1down.root can not be opened for reading Too many open files
Traceback (most recent call last):
File "create_fit_histograms.py", line 365, in <module>
File "create_fit_histograms.py", line 339, in main
File "create_fit_histograms.py", line 282, in get_hist_hande_list
cppyy.gbl.std.runtime_error: Template method resolution failed:
ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Filter(experimental::basic_string_view<char,char_traits<char> > expression, experimental::basic_string_view<char,char_traits<char> > name = "") =>
runtime_error: GetBranchNames: error in opening the tree tree_PFLOW
ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Filter(experimental::basic_string_view<char,char_traits<char> > expression, experimental::basic_string_view<char,char_traits<char> > name = "") =>
runtime_error: GetBranchNames: error in opening the tree tree_PFLOW
ROOT::RDF::RInterface<ROOT::Detail::RDF::RJittedFilter,void> ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Filter(experimental::basic_string_view<char,char_traits<char> > expression, experimental::basic_string_view<char,char_traits<char> > name = "") =>
runtime_error: GetBranchNames: error in opening the tree tree_PFLOW
```
This is really strange to me:
- Why do the files remain open after the first iteration of the `for` loop has ended?
- Why is the number of files I can open roughly 370 lower than in the first case (1246 + 508 = 1754 files vs. 2123)?

How should I go about writing such code?
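One way to check whether the files from a finished chunk really stay open is to record the number of open descriptors after each chunk. A minimal, ROOT-independent sketch, assuming Linux (it lists `/proc/self/fd`); `run_chunks_with_fd_report` and the two stand-in chunk functions are hypothetical, with OS pipes playing the role of input files. In the real script, `process_chunk` would book the chunk's RDataFrames, call `ROOT.RDF.RunGraphs` and write the histograms.

```python
import os


def open_fd_count():
    """Count this process's open file descriptors (Linux only).

    The count includes the descriptor used while listing /proc/self/fd itself,
    so compare differences between calls rather than absolute values.
    """
    return len(os.listdir("/proc/self/fd"))


def run_chunks_with_fd_report(chunk_list, process_chunk):
    """Call process_chunk(chunk) for every chunk; return the open-fd count after each one."""
    report = []
    for chunk in chunk_list:
        process_chunk(chunk)
        report.append(open_fd_count())
    return report


kept_alive = []


def leaky_chunk(chunk):
    # Stand-in for a chunk whose inputs stay open after it returns
    for _ in chunk:
        kept_alive.extend(os.pipe())


def clean_chunk(chunk):
    # Stand-in for a chunk that closes all of its inputs before returning
    for _ in chunk:
        r, w = os.pipe()
        os.close(r)
        os.close(w)


chunks = [["a", "b", "c"], ["d", "e", "f"]]
print(run_chunks_with_fd_report(chunks, clean_chunk))  # same count after each chunk
print(run_chunks_with_fd_report(chunks, leaky_chunk))  # grows by 6 per chunk (2 fds per pipe)

# Release the descriptors kept by the leaky stand-in
for fd in kept_alive:
    os.close(fd)
kept_alive.clear()
```

If the count in the real script keeps growing from one chunk to the next, the inputs of earlier chunks are still held open somewhere.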
Here is also the full version of the code used to produce the error above: create_fit_histograms_simple.py (12.9 KB). The files themselves total about 100 GB, and I can share them privately if needed.
Best regards,
Aleksandr
ROOT Version: 6.26/04