Can the garbage collector be that slow?

Hi Wim and the other PyROOT experts,

A few days ago I posted a question on the StackOverflow forum about the speed of the garbage collector. I posted there because I thought there was a bug in my own code, or maybe a Python/C++ problem.
But now I’m wondering whether it could be a PyROOT-related problem.

My post is here:
stackoverflow.com/questions/3916 … -that-slow
There you can also look at the profiler outputs.

Essentially, the garbage collector’s gc.collect() gets almost stuck (or rather, it takes a very long time to complete) when the program calls a function that handles a dict containing tens of thousands of histograms. I put a print just before the return statement, and between that last print and the actual return of the function… half an hour passes!!! :stuck_out_tongue:
And during that half hour the Python script takes 99% of my CPU while the memory consumption slightly increases (checked simply with top).
[aside question: the garbage collector should free memory… so why does the memory consumption increase here?? :stuck_out_tongue: ]
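To give an idea of the pattern, here is a stripped-down sketch (not my real code; the histogram names and counts are just placeholders):

```python
import gc
import time
import ROOT

def make_histos(n):
    # build a dict of many histograms, as my real code does
    histos = {}
    for i in range(n):
        histos["h_%d" % i] = ROOT.TH1F("h_%d" % i, "", 100, 0.0, 1.0)
    return histos

histos = make_histos(10000)  # my real dicts hold ~140k entries each

t0 = time.time()
del histos    # drop the only Python reference
gc.collect()  # the teardown is where the minutes disappear
print("teardown took %.1f s" % (time.time() - t0))
```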

My question is: has anyone else experienced this as well?

Do you have any suggestions on how to fix this behaviour?

Thanks a lot, and have a nice weekend,

Ric.

Moreover, looking at len( gc.get_objects() ) I see that it always increases, even when passing from one function to another, when at least some reference counts should drop and some objects should be deleted.
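For reference, this is the kind of check I do (do_stage stands for whichever of my functions runs next; I only look at the length, without keeping the list around):

```python
import gc

def report(tag):
    # take only the length: keeping the returned list alive would
    # itself hold references to every object in it
    print("%s: %d objects tracked by gc" % (tag, len(gc.get_objects())))

report("before")
do_stage()        # placeholder for one stage of my script
report("after")   # this count keeps growing, stage after stage
```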

A question came to my mind: could this be related to a PyROOT ownership setting?

Thanks again,

Ric.

Ric,

lots of information there, so I’m going to throw a bundle of information back. :slight_smile:

  • The profiler showing gc isn’t necessarily meaningful: the profiler only shows what it profiles.
  • If the objects are histograms, they are shared with the C++ side (no matter what you do), so every delete causes a notification to go that way, to prevent dangling pointers later on.
  • Memory increase … I’m not sure, but gc freeing memory is not that simple. First off, Python uses (rather large) arenas, which can be pinned by a single live object. Furthermore, the OS won’t show freed memory unless there is memory pressure on the system, forcing it to actually take the memory back from the process.
  • If you keep the result of gc.get_objects() alive, the objects in it won’t go away.
  • Whether the C++-side object actually goes away depends on ownership (see the sketch below).
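To illustrate that last point, here is a minimal sketch of the ownership knob, for a histogram created on the Python side (the name is made up):

```python
import ROOT

h = ROOT.TH1F("h_owned", "example", 100, 0.0, 1.0)

# Tell PyROOT that Python owns the object: deleting the proxy then
# also destroys the underlying C++ TH1F. With the flag set to False,
# only the Python proxy goes away and the C++ object is left to ROOT.
ROOT.SetOwnership(h, True)
del h
```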

Do you have a sample script?

Cheers,
Wim

Hi,

I have a PyROOT script that loops over ROOT files and prints information about the various histograms/directories in each file. I find that when I loop over many large files, it takes a long time at the end of the job, so I’m assuming it’s also the garbage collection. Since the files are internal, can I send you the files and the script off-list?
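In outline the script does something like this (the file names here are placeholders; the real files are much larger):

```python
import ROOT

for name in ["fileA.root", "fileB.root"]:
    f = ROOT.TFile.Open(name)
    # walk the keys in the file and print what each one is
    for key in f.GetListOfKeys():
        print("%s  %s" % (key.GetClassName(), key.GetName()))
    f.Close()
```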

Thanks,
Charles

Charles,

yes, that would work (although I’m about to go offline again for a couple of days). I can probably also put together some files to try this out, but the real question is perhaps: how many files/objects on the ROOT side are we talking about here?

Cheers,
Wim

Hi Wim,

thanks a lot for your answer.

Yes, you are right. Anyway, I also checked “by hand”: I simply put a “print” on the last line of my code, just before the final “return”, and between that last print and the end of the job many minutes pass…

Yes, they are histograms and canvases.
But I didn’t really understand what you are saying, sorry. Does it mean that if I use “del” on the Python side, the objects are not deleted? I have the feeling that the C++ objects are not deleted synchronously with the Python references, but perhaps I’m wrong. Maybe the ROOT C++ side has its own way of deleting objects, on a different schedule?
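For the record, this is the kind of check I mean (h_test is just a throwaway name); I would expect the second lookup to fail if the C++ object is destroyed together with the proxy:

```python
import ROOT

h = ROOT.TH1F("h_test", "", 10, 0.0, 1.0)
print(bool(ROOT.gROOT.FindObject("h_test")))  # True: the histogram is known on the C++ side
del h
print(bool(ROOT.gROOT.FindObject("h_test")))  # False only if the C++ TH1F was destroyed too
```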

I see. But anyway, the gc.get_objects() print was only for testing. In production that statement is commented out, and I don’t notice any change in the termination time.

I tried changing the SetOwnership() setting, but I did not notice any difference. Maybe I’m missing something…

It’s difficult to make a test script. My application is a rather large cut-optimization tool, and it’s hard to extract a simple script from it right now.

But since I saw you asked Charles for the order of magnitude, I can tell you that in my case, in one of the functions that takes so long to terminate, I handle 2 Python dictionaries, each containing roughly 140k histograms.

Thanks again for your help, Wim

Ciao,

Ric.

Ric,

a fix to the MemoryRegulator that I just put in trunk should speed up object destruction considerably …

Cheers,
Wim

Many thanks, Wim! :slight_smile:

I’ll test it asap, and I’ll let you know.

Thanks again for your kind help :slight_smile:

Ric.