I have quite a large number of histograms in my C++ code, which uses the ROOT libraries.
The further the analysis progresses, the greater the number of histograms. I run 12 sets of 2000 jobs, one set at a time. Recently I have noticed that, while merging the histos, hadd reports the following warnings:
hadd Source file 1998: data_997.root
hadd Source file 1999: data_998.root
hadd Source file 2000: data_999.root
hadd Target path: t200-32.root:/
hadd Target path: t200-32.root:/demo
hadd Opening the next 923 files
Warning in <TFile::Init>: file data_1914.root probably not closed, trying to recover
Info in <TFile::Recover>: data_1914.root, recovered key TDirectoryFile:demo at address 232
Warning in <TFile::Init>: successfully recovered 1 keys
hadd Target path: t200-32.root:/
hadd Target path: t200-32.root:/demo
hadd Opening the next 154 files
hadd Target path: t200-32.root:/
hadd Target path: t200-32.root:/demo
Moreover, I am observing a loss of statistics.
Searching for information on this warning, I found a ROOT Forum thread reporting a memory-limit issue (max = 2 GB).
Rene Brun said:
“I see from the result of file->Map that you have reached the maximum default Tree size limit of 1.9 GBytes. You should have received an error message or a warning indicating that the system is automatically switching to a new file. Read carefully the documentation of TTree::ChangeFile at root.cern.ch/root/html/TTree.htm … ChangeFile”
It seems that ROOT creates a new output file, which is not properly closed. However, in my case, the previous one gets lost. A solution would be to increase the size limit.
How do I do that in my C++ code?
TTree::LoadBaskets(Long64_t maxmemory = 2000000000)
Read in memory all baskets from all branches up to the limit of maxmemory bytes.
The default for maxmemory is 2000000000 (2 gigabytes).
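Note that LoadBaskets only controls how much of an existing tree is read into memory; the 1.9 GB limit Rene Brun refers to is governed by the static TTree::SetMaxTreeSize. A minimal sketch of raising it before filling, with illustrative names (out.root, demo, x are placeholders, not from the analysis code):

#include "TFile.h"
#include "TTree.h"

void writeBigTree() {
  // Raise the per-file tree size limit from the ~1.9 GB default to 1 TB,
  // so ROOT never triggers the automatic file switch (TTree::ChangeFile).
  TTree::SetMaxTreeSize(1000000000000LL); // 1 TB

  TFile *f = TFile::Open("out.root", "RECREATE");
  f->cd(); // make out.root the current directory so the tree is created inside it
  TTree *t = new TTree("demo", "demo tree");
  double x = 0;
  t->Branch("x", &x);
  for (long i = 0; i < 1000; ++i) { x = i; t->Fill(); }

  // If a file switch did happen, 'f' would be stale; always go through
  // the tree to find the file it currently lives in.
  t->GetCurrentFile()->Write();
  t->GetCurrentFile()->Close();
}

The GetCurrentFile() calls matter because, when the limit is exceeded, ROOT's automatic TTree::ChangeFile switch closes the original file and leaves the old TFile pointer invalid.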
Well, it could be that some “job” that creates individual files died and left a file improperly closed.
But then, adding “mytree->...->Write();” will not help (as the “job” will die anyhow while processing some “peculiar” event).
On the other hand, if it is just a single file out of 2000, then there should be no significant “loss of statistics” (though, yes, one should try to debug why this “job” died for this file and how many events are missing).
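One way to spot the improperly closed files before merging is to open each partial file and check ROOT's recovery flag. A sketch, assuming the data_N.root naming from the log above:

#include "TFile.h"
#include <cstdio>
#include <memory>

// Scan the partial files and report any that were not closed cleanly.
// ROOT sets the kRecovered bit on a TFile whose keys had to be recovered.
void findBadFiles(int nFiles = 2000) {
  for (int i = 0; i < nFiles; ++i) {
    char name[64];
    std::snprintf(name, sizeof(name), "data_%d.root", i);
    std::unique_ptr<TFile> f(TFile::Open(name, "READ"));
    if (!f || f->IsZombie())
      std::printf("unreadable: %s\n", name);
    else if (f->TestBit(TFile::kRecovered))
      std::printf("recovered (not closed cleanly): %s\n", name);
  }
}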
Thanks for the input.
My code generates ROOT files with lots of histograms. I realized that the more histograms I set up in the code, the more output files exhibit the problem above. Although they are always recovered when merging via hadd, I have observed a loss of statistics.
I do not use Write() or Close() in my code; as far as I understand, the framework's TFileService takes care of writing and closing the output file.
Sometimes I get only one problematic file, sometimes dozens.
I had to turn off a bunch of histos in order to minimize this problem.
For instance, in my KshortKshort channel analysis #15 I’ve got 3679 events;
in my analysis #31 I’ve got 3181 events. This is a big difference. That is my concern.
The following method does not work:

#include "TTree.h"

int startup() {
  TTree::SetMaxTreeSize( 1000000000000LL ); // 1 TB
  return 0;
}

namespace { static int i = startup(); }
LD_PRELOAD=startup_C.so hadd …
(startup_C.so is the shared library that ACLiC produces when the snippet above is compiled once as a ROOT macro, e.g. root -b -q startup.C+; preloading it raises the limit inside hadd before any file is opened.)
I have tried:

#include "TTree.h"
…

void
PromptAnalyzer::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup)
{
  …
  TTree::SetMaxTreeSize( 1000000000000LL );
  …
}
No luck. I also tried:
void
PromptAnalyzer::beginJob()
{
  …
  TTree::SetMaxTreeSize( 1000000000000LL );
  …
  histosTH1F["hpt"] = fs->make<TH1F>("hpt", "p_{T}", nbins_pt, 0, 5);
}
No luck. Also:

void
PromptAnalyzer::beginRun(edm::Run const& run, edm::EventSetup const& es)
{
  …
  TTree::SetMaxTreeSize( 1000000000000LL );
  …
}

No luck.
By the way, let me correct the information I provided: the loss of statistics is not about 500 events as I mentioned above; it is much less. I still have to measure it, though.
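A straightforward way to measure the loss is to sum the entries of one reference histogram over all partial files and compare the total with the merged output. A sketch, assuming the histogram sits under demo/hpt as in the snippets above and the file names from the log (adjust as needed):

#include "TFile.h"
#include "TH1.h"
#include <cstdio>
#include <memory>

// Compare the summed entries in the partial files with the merged file.
void countEntries(int nFiles = 2000) {
  double total = 0;
  for (int i = 0; i < nFiles; ++i) {
    char name[64];
    std::snprintf(name, sizeof(name), "data_%d.root", i);
    std::unique_ptr<TFile> f(TFile::Open(name, "READ"));
    if (!f || f->IsZombie()) continue;
    TH1 *h = nullptr;
    f->GetObject("demo/hpt", h); // path: directory "demo", histogram "hpt"
    if (h) total += h->GetEntries();
  }
  std::unique_ptr<TFile> merged(TFile::Open("t200-32.root", "READ"));
  TH1 *hm = nullptr;
  if (merged) merged->GetObject("demo/hpt", hm);
  std::printf("partial sum: %.0f  merged: %.0f  missing: %.0f\n",
              total, hm ? hm->GetEntries() : 0.0,
              hm ? total - hm->GetEntries() : total);
}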
Maybe another idea … check the “open files” limits in:
ulimit -H -a
ulimit -S -a
Try to increase it (in the shell in which you then run “hadd”), e.g.:
ulimit -S -n 4096
You could also try:
hadd -n 1 ...
And you could also try to increase the “stack size” limit (also in the shell in which you run the “jobs” that produce the partial files, as it is possible that some “job” dies because of it), e.g.:
ulimit -S -s 32768
I guess, to “exclude” some possible “known” problems, try first:
ulimit -S -s 32768; hadd -T -n 1 ...
If “hadd” still dies on the same partial input file, try to set “ulimit -S -s 32768” and (in the same shell) run the job that produces it again.
Actually, maybe you could attach this file here for inspection.
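In the meantime, you can inspect a suspect file yourself with the TFile::Map call mentioned in Rene Brun’s quote above; it prints the physical layout of the file and shows where the data ends. A sketch (data_1914.root stands in for whichever file was recovered):

#include "TFile.h"
#include <memory>

// Print the key structure and physical record map of a suspect file.
void inspectFile(const char *name = "data_1914.root") {
  std::unique_ptr<TFile> f(TFile::Open(name, "READ"));
  if (!f || f->IsZombie()) return;
  f->ls();   // list the keys (directories, histograms, trees)
  f->Map();  // dump the record-by-record layout of the file
}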