Thank you for reaching out to the forum! Indeed, I would not expect a del call in Python to have any immediate or predictable effect; that’s just how the CPython garbage collector works. A somewhat more reliable approach would be to confine the variables that allocate a lot of memory to the scope of a free function, taking care that they are never referenced by anything outside that function scope.
For the particular RooFit classes, maybe @jonas will have an idea of how to help more concretely.
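To illustrate why del alone is not a reliable trigger: in CPython, del name only removes one reference. The object is destroyed when its last strong reference disappears, or later by the cycle collector. The sketch below is stdlib-only; BigBuffer is a hypothetical stand-in for a large dataset, and a weak reference is used as a probe to check whether the object is still alive.

```python
import weakref


class BigBuffer:
    """Hypothetical stand-in for an object holding a lot of memory."""

    def __init__(self, size):
        self.data = bytearray(size)


buf = BigBuffer(10_000_000)
alias = buf                 # a second reference, e.g. kept in a list or cache
probe = weakref.ref(buf)    # weak reference: does not keep the object alive

del buf                     # removes the name 'buf' only
print(probe() is not None)  # True: 'alias' still keeps the object alive

del alias                   # last strong reference is gone
print(probe() is None)      # True in CPython: freed right away by refcounting
```

The point is that del frees memory only if it happens to drop the last reference; any other holder (a container, a cache, a reference cycle) postpones it.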
Ah yes, that’s a classic. The RooAbsData::reduce() method returns a new dataset by raw pointer, and the caller has to delete it manually on the C++ side. PyROOT doesn’t know that it has to do that, because this cannot be inferred automatically from the C++ interface. And we can’t easily change the C++ interface to return smart pointers without breaking backwards compatibility…
The solution is to tell PyROOT that it owns the returned object and needs to delete it:
another = dataset.reduce ( "Mass<100" )
ROOT.SetOwnership(another, True)
Does this work? I’ll also update PyROOT to do this automatically when you create a dataset with reduce().
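To make the ownership idea concrete, here is a toy model in plain Python (no ROOT required). FakeCppHeap, Proxy and the python_owns attribute are hypothetical names, not PyROOT API. Conceptually, ROOT.SetOwnership(obj, True) flips a flag like this on the real proxy (cppyy exposes a similar flag as __python_owns__), so that the C++ object is deleted when the proxy is garbage-collected.

```python
class FakeCppHeap:
    """Hypothetical stand-in for objects allocated with `new` on the C++ side."""

    def __init__(self):
        self.live = set()

    def new(self, name):
        self.live.add(name)
        return name

    def delete(self, name):
        self.live.discard(name)


heap = FakeCppHeap()


class Proxy:
    """Toy model of a Python proxy wrapping a C++ pointer."""

    def __init__(self, name, python_owns):
        self.name = name
        self.python_owns = python_owns

    def __del__(self):
        # Only an owning proxy deletes the C++ object when it is collected.
        if self.python_owns:
            heap.delete(self.name)


# A factory returning a raw pointer: by default the proxy does not own it.
leaked = Proxy(heap.new("reduced_1"), python_owns=False)
del leaked  # Python wrapper is gone, C++ object stays allocated

owned = Proxy(heap.new("reduced_2"), python_owns=False)
owned.python_owns = True  # roughly what ROOT.SetOwnership(obj, True) changes
del owned  # now the C++ object is deleted together with the proxy

print(sorted(heap.live))  # ['reduced_1']
```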
I have several occurrences of this pattern (creating a RooDataSet in a loop via various methods, not only reduce, e.g. for pseudo-experiments).
The ROOT.SetOwnership trick works only for some ROOT (Python?) versions (in my tests I run over the LCG_102 -…-LCG_108-..-dev3 slots, i.e. ROOT 6.26-6.39). I’ve found that explicit instantiation of std::unique_ptr works for almost all ROOT versions:
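import ROOT
from ROOT import std
data_ptr = std.unique_ptr(ROOT.RooAbsData)  # owning smart-pointer type for RooAbsData
...
for .... :
    ds = data_ptr ( dataset.reduce (... ) )  # unique_ptr takes ownership of the reduced dataset
    ...
    del ds  # dropping the unique_ptr should delete the dataset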
This construction works for all ROOT versions except 6.32-6.34.
Of course, I’d like to have a “universal” solution that avoids “if” branches on the ROOT version.
That’s quite surprising! I have no answer to that yet. Can you create an overview of which method works for which ROOT version?
For the unique_ptr: maybe it helps to instantiate the template with the bracket operator, std.unique_ptr[ROOT.RooAbsData], because this is less ambiguous.
I’ve repeated my tests with ROOT.SetOwnership and std::unique_ptr.
Good news: if the dataset inside the loop is created using reduce, both methods work OK for all ROOT versions between 6.26 and 6.39.
However, if the dataset inside the loop is prepared as a “jackknife” sample, namely:
original_dataset = ...
for i in range ( len ( original_dataset ) ) :
    ds1 = original_dataset.reduce ( ROOT.RooFit.EventRange ( ... ))
    ds2 = original_dataset.reduce ( ROOT.RooFit.EventRange ( ... ) )
    ds = <merge ds1 and ds2>
    ...
    del ds1
    del ds2
    del ds
For this scenario, BOTH methods fail for ROOT versions 6.32-6.34, while they are OK for the other ROOT versions. The amount of “memory leak” corresponds to ds (ds1 and ds2 are freed correctly).
Note that a still brutal, but maybe more “standard-looking”, approach could be to call the obj.__destruct__() method, which is available on any Python proxy (see the “Classes” page of the cppyy 3.5.0 documentation). I still believe Jonas’s approach is the correct one, but it won’t be retroactively available to you in older ROOT releases.
I’ve played with several combinations of __destruct__, Delete, and SetOwnership.
As noted earlier, the actual behaviour depends heavily on the ROOT version.
Surprisingly (at least for me), the effect of __destruct__ is not the same as that of Delete.
The solution suggested by Jonas appears to be the most useful; however, I need to add an additional __destruct__ call.
My final solution (more complicated than the toy example above, but still simplified compared to the actual code) is here
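The gist itself is not reproduced here. As a hypothetical sketch of the general pattern discussed (an explicit __destruct__ call on every temporary at the end of each iteration), contextlib.ExitStack can register the calls so they also run if the loop body raises. FakeDataset, merge and destroy_on_exit are illustrative names standing in for RooDataSet proxies and the “<merge ds1 and ds2>” step; the helper checks for __destruct__ by feature detection rather than by ROOT version.

```python
from contextlib import ExitStack


def destroy_on_exit(stack, obj):
    """Register obj.__destruct__() (if the proxy has one) on the ExitStack."""
    destruct = getattr(obj, "__destruct__", None)
    if destruct is not None:
        stack.callback(destruct)
    return obj


class FakeDataset:
    """Hypothetical stand-in for a RooDataSet proxy; records destruction order."""

    destroyed = []

    def __init__(self, name):
        self.name = name

    def __destruct__(self):
        FakeDataset.destroyed.append(self.name)


def merge(a, b):
    # Hypothetical: stands in for "<merge ds1 and ds2>" from the thread.
    return FakeDataset(f"merged({a.name},{b.name})")


for i in range(2):
    with ExitStack() as stack:
        ds1 = destroy_on_exit(stack, FakeDataset(f"first_{i}"))
        ds2 = destroy_on_exit(stack, FakeDataset(f"second_{i}"))
        ds = destroy_on_exit(stack, merge(ds1, ds2))
        # ... fit / process ds here ...
    # The callbacks ran in reverse registration order: ds, ds2, ds1.
    # The names still refer to the proxies, which must not be used any more.

print(FakeDataset.destroyed)
```

Destroying the merged dataset first mirrors the usual rule of releasing derived objects before the objects they were built from.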
I believe you are still mixing two different things in your example. The Pythonization introduced by Jonas ensures that the C++-allocated objects are not leaked by the end of the Python application. But attempts to free memory at specific points of your application, e.g. obj.Delete() or obj.__destruct__(), go beyond standard Python behaviour. In Python, there is no guarantee of when the garbage collector will run, and thus no guarantee of when your memory will be freed. So when you call if delete : result .__destruct__ () in line 90 of your gist, you are explicitly opting into a feature of cppyy, and this is not something you should expect your Python application to handle automatically, whether you are using ROOT or any other Python package.
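One concrete case of “no guarantee of when memory is freed”: objects that are part of a reference cycle are never freed by reference counting alone, only when CPython’s cyclic garbage collector runs, which normally happens based on allocation thresholds. The stdlib-only sketch below (Node is a hypothetical class) disables automatic collection to make that window visible. Presumably a PyROOT proxy caught in such a cycle would likewise keep its C++ object alive until the collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can take part in a reference cycle."""

    def __init__(self):
        self.partner = None


gc.disable()                   # simulate "the cyclic collector has not run yet"
a, b = Node(), Node()
a.partner, b.partner = b, a    # reference cycle: refcounts never reach zero
probe = weakref.ref(a)

del a, b
print(probe() is not None)     # True: unreachable, but not yet freed

gc.collect()                   # explicit collection works even while disabled
print(probe() is None)         # True: the cycle has now been freed
gc.enable()
```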
But in the end I need some way to ensure that, in large loops, the large datasets created internally actually get deleted. In my case both the datasets and the loops are really large; without some explicit action I easily end up using many gigabytes of memory.
Apologies if I gave the impression that your code was somehow wrong. It isn’t wrong per se. I just wanted to point out that deallocating memory in a Python application at a very specific line of code is just not part of the language. In your case you have this need, and you use a public API of a package that lets you accomplish it, so you are deliberately using a feature of that package. Nothing wrong with that.
Perhaps you could try to wrap the parts of the loop that allocate memory in a function scope, and see whether the garbage collector does a better job without you having to call __destruct__ explicitly. This might work because, if a variable in a function scope has no other references in the Python application, it should be deleted once the function returns. But this wouldn’t be any more correct than what you are currently doing.
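A minimal stdlib-only sketch of the function-scope idea, assuming nothing outside the function keeps a reference to the temporaries: when one_iteration returns, its locals are dropped and CPython frees them immediately by reference counting. BigBuffer and one_iteration are hypothetical names. For PyROOT this presumably only helps if the proxy owns its C++ object (e.g. after SetOwnership); otherwise only the Python wrapper is freed.

```python
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large temporary dataset."""

    def __init__(self, size):
        self.data = bytearray(size)


probes = []


def one_iteration(i):
    # Everything large lives only in this function's local variables.
    tmp = BigBuffer(5_000_000)
    probes.append(weakref.ref(tmp))  # weak: does not extend the lifetime
    return len(tmp.data) + i         # return only a small summary value


results = [one_iteration(i) for i in range(3)]
print(results)                            # [5000000, 5000001, 5000002]
print(all(p() is None for p in probes))   # True in CPython
```

Returning only small summary values (fit results, numbers) rather than the datasets themselves is what keeps the large objects from escaping the function scope.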