Pipe: Too many open files

Hi Experts,

I have 2000 ROOT files and want to fit the samples one by one. I loop over all 2000 files, but after 479 of them I get the following error:
libc++abi.dylib: terminating with uncaught exception of type RooFit::BidirMMapPipe_impl::BidirMMapPipeException: pipe: Too many open files
code:

void massfit(){
  cout<<"size for fitting "<<endl;
  cin>>size;
  for(int fc=0;fc<size;fc++){
    TString file=Form("splitdata_bujpsik_v04_2000_part%d.root",fc);
    TFile *f = TFile::Open(file,"read");
    TTree *tree =(TTree*)f->Get("events");
    int nentries_ = tree->GetEntries();
    // ... fitter code ...

    file.Clear();
    f->Delete();
    TotPdf->Delete();
    datach0->Delete(); // dataset
    data1->Delete();   // dataset
  }
}

Suggestions are welcome.

Thank you
Chandi

Hi Chandi,

can you try to close the files with the TFile::Close method at the end of the loop body?

Cheers,
D
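
For readers following along, here is a minimal sketch of why closing at the end of the loop body matters. It uses plain POSIX open/close on /dev/null as a stand-in for TFile::Open and TFile::Close, and it temporarily lowers the descriptor limit so the effect shows up after a few dozen iterations. The function name count_failed_opens and the margin of 32 spare descriptors are made up for illustration and do not come from massfit.cc.

```cpp
#include <algorithm>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>
#include <vector>

// Opens /dev/null `n` times under a temporarily lowered descriptor limit and
// returns how many opens failed (-1 if the setup itself fails).
// With close_each == true, each descriptor is closed at the end of its
// loop iteration; otherwise it is kept open, imitating the leaky loop.
int count_failed_opens(int n, bool close_each) {
    struct rlimit saved{};
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0) return -1;

    // Find the lowest free descriptor number so the lowered limit leaves
    // a little room above whatever the process already has open.
    int probe = open("/dev/null", O_RDONLY);
    if (probe < 0) return -1;
    close(probe);

    struct rlimit lowered = saved;
    lowered.rlim_cur = std::min<rlim_t>(saved.rlim_cur, probe + 32);  // 32 is arbitrary
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) return -1;

    std::vector<int> kept;  // descriptors the leaky variant never closes
    int failures = 0;
    for (int i = 0; i < n; ++i) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) { ++failures; continue; }
        // ... per-file work would go here ...
        if (close_each) close(fd);
        else kept.push_back(fd);
    }

    for (int fd : kept) close(fd);     // tidy up so the demo itself does not leak
    setrlimit(RLIMIT_NOFILE, &saved);  // restore the original soft limit
    return failures;
}
```

Under these assumptions, count_failed_opens(200, true) reports no failures, while count_failed_opens(200, false) starts failing as soon as the lowered limit is reached.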

Hi Expert,

Thank you very much for your kind attention. It works.
But if I run over more than 2000 files, it fails after 2014 files. Is there a workaround for a larger number of files?

Cheers
Chandi

Hi,

this is not expected. What is the error you are getting? What is your code?

Cheers,
D

Hi ,
Sorry for the late reply.
I found a workaround. If I raise the open-file limit to 4096 or higher (ulimit -n 4096), I am able to run over 4000 ROOT files.

The code is attached below.
massfit.cc (7.6 KB)
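
ulimit -n changes the soft limit on open file descriptors for the shell and the processes it starts. As a small sketch (the helper name descriptor_limits is made up here), a program can query the same limit itself with the POSIX getrlimit call, for example to print it next to the number of files it is about to process:

```cpp
#include <sys/resource.h>

// Reads the soft and hard limits on open file descriptors for this process.
// The soft limit is the value that `ulimit -n` reports in the launching shell.
// Returns false if the limits could not be queried.
bool descriptor_limits(rlim_t& soft, rlim_t& hard) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return false;
    soft = lim.rlim_cur;  // can be raised by the process, up to `hard`
    hard = lim.rlim_max;  // ceiling for unprivileged processes
    return true;
}
```

If descriptors are leaked on every iteration, raising the limit only postpones the failure, so this is a workaround rather than a fix.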

The problem seems to be that BidirMMapPipe_impl is leaking pipes. I’ll ask the implementer of that class whether he can have a look.

Axel.

Hi Axel, Chandi,

I probably won’t have time to look into it before next week Tuesday (when I’ll be at CERN for two weeks).

That said, I do have a couple of comments from a quick inspection:

  • Something is leaking file descriptors. That much is clear.

  • I thought I had that debugged and checked when I wrote the class, but I’m willing to look again, because I’m good at making mistakes. :slight_smile:
    A quick inspection of my code did not reveal anything (the close and shutdown are where I remembered I put them), but to be sure, I’ll have to trace it, and check that BidirMMapPipe isn’t leaking file descriptors, which requires time…
    I also remember testing the code with 1024 open pipes when I wrote it (i.e. 2048 file descriptors), and did not run into trouble on the file descriptor front. (I could not go higher because I’d run out of process table entries with so many forked off child processes at the time…)

  • A general remark about resource leaks: Just because BidirMMapPipe throws the exception does not mean that BidirMMapPipe is necessarily the offender that’s leaking the resource in question. It’s similar to a memory leak: When you have a memory leak, the out-of-memory condition does not necessarily hit in the code leaking memory, it can hit any code allocating memory. It literally can hit anywhere, and memory allocations are frequent. For file descriptors, the story is very much the same. Just because BidirMMapPipe is being told by the OS that we’ve run out of file descriptors, doesn’t mean that BidirMMapPipe is leaking them.

  • Have you checked that you make RooFit give up its resources at the end of the loop, and that you close all files you open in that loop? I’ve had a quick look through your massfit.cc file, and I see plenty of pointers, with no clear concept of object ownership, and virtually no explicit or implicit cleanup.

My guess would be that EffFile is not closed properly (opened on line 122 or so)…
(Or it could be that you or RooFit leaks a RooFit object that holds a BidirMMapPipe internally - then we’d be in much the same situation, without BidirMMapPipe being responsible.)
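
The general remark about resource leaks above can be reproduced in a few lines. This is only an illustrative sketch with made-up names, not code from RooFit or massfit.cc: one piece of code leaks descriptors obtained from open, and a later, completely innocent pipe call is the one that gets the error. The limit is lowered to 64 (an arbitrary value) so the demo runs quickly.

```cpp
#include <algorithm>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>
#include <vector>

// Returns the errno reported by pipe() after unrelated code has used up all
// descriptors (0 if pipe() unexpectedly succeeds, -1 if the setup fails).
int pipe_errno_after_unrelated_leak() {
    struct rlimit saved{};
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0) return -1;
    struct rlimit lowered = saved;
    lowered.rlim_cur = std::min<rlim_t>(saved.rlim_cur, 64);  // arbitrary small limit
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) return -1;

    // The offender: opens files and never closes them.
    std::vector<int> leaked;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) break;  // no descriptors left
        leaked.push_back(fd);
    }

    // The victim: a pipe() call that leaks nothing itself.
    int ends[2];
    int result = 0;
    if (pipe(ends) == -1) {
        result = errno;
    } else {
        close(ends[0]);
        close(ends[1]);
    }

    for (int fd : leaked) close(fd);   // tidy up after the demo
    setrlimit(RLIMIT_NOFILE, &saved);  // restore the original soft limit
    return result;
}
```

The value returned here is EMFILE, whose standard message is "Too many open files", the same text that appears in the exception at the top of the thread.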

[Okay, rant of a fellow leak hunter begins - don’t take it personally, it’s gotten a bit longish because I’ve been in similar situations myself, and have a bit of experience on just how frustrating resource leaks can be… Maybe there’s something useful in there for you…]

My first order of business is usually to check my code to see that every new is paired with a delete, and every TFile::Open with a Close.

Also, one needs to worry about methods returning pointers (or taking pointers as arguments), since that can mean obtaining or transferring object ownership, and that means the responsibility to free things when you’re done can move from ROOT to you (or from your code to ROOT if you transfer ownership)…

I know it’s a pain, especially since the documentation is usually silent about the ownership transfer. I usually have to inspect the sources of the method I’m calling to know when I get or transfer object ownership. I’m not just saying that to make the bug report go away… :wink: I’ve spent months getting complex fits to not run out of memory, so I know just how frustrating that experience can be. The trouble is that much of the code in ROOT and RooFit was written before things like shared_ptr or unique_ptr were invented (and RAII still seems to be undervalued and misunderstood by a large part of the HEP community), so resource management is a pain.
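
To make the RAII point concrete, here is a minimal sketch in standard C++. It uses std::FILE with fopen/fclose as a stand-in for a TFile, and the names FileCloser, FilePtr, lowest_free_fd and process_files are invented for this example. The idea is that the handle closes its file when it goes out of scope, so no exit path from the loop body can forget the close. With ROOT, the analogous step would presumably be to hold the result of TFile::Open in a std::unique_ptr<TFile>, since the TFile destructor closes the file.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <memory>
#include <unistd.h>

// Deleter that closes the file; unique_ptr never calls it on a null pointer.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Number of the lowest descriptor that is currently free (-1 on failure).
// It moves upward when descriptors have been leaked.
int lowest_free_fd() {
    int fd = open("/dev/null", O_RDONLY);
    if (fd >= 0) close(fd);
    return fd;
}

// Opens /dev/null `n` times, one handle per iteration, and returns how many
// opens succeeded. Each file is closed when `f` goes out of scope.
int process_files(int n) {
    int processed = 0;
    for (int i = 0; i < n; ++i) {
        FilePtr f{std::fopen("/dev/null", "r")};
        if (!f) continue;  // an early exit from the body cannot leak either
        // ... per-file work would go here ...
        ++processed;
    }  // f is destroyed here, which closes the file
    return processed;
}
```

A quick check is to call lowest_free_fd() before and after process_files(5000): if nothing leaked, both calls return the same number, even though 5000 is above many default descriptor limits.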

[Rant ends - sorry for the noise… :wink: ]

If EffFile above is not the offender, please let me know, and I’ll have a detailed look in BidirMMapPipe next week. But I suspect that your problem stems from the general “leakiness” of the code inside your loop.

Cheers, and let me know how this turns out,

Manuel

Thanks, Manuel.

Using delete f instead of f->Delete() (same for the other objects) might be a good start.

Axel

Hi Chandi,

when you have a moment, please let us know how you’re getting on (or if the problem has gone away), so I know what to work on… :slight_smile:

Cheers,

Manuel

Hi Manuel and Axel,

Thank you very much for your suggestions.

Manuel, you are correct. The EffFile opened on line 122 was causing the issue. Closing EffFile solved the problem.

Cheers
Chandi

Glad we could help, and thanks for the feedback - that’s one thing less on my to do list… :wink:

Have a good Sunday,

Manuel
