How to properly clean/delete a RooDataSet in PyROOT?

What is the correct/recommended way to clean/delete a RooDataSet in PyROOT?

I have a use case where a relatively large dataset is created and used inside a loop, and I see that the memory is not freed after the loop.


import ROOT, os, psutil, random 

# inner psutil function
def process_memory():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss

evt     = ROOT.RooRealVar ( 'Evt'    , '#event'        , 0 , 1000000 )
run     = ROOT.RooRealVar ( 'Run'    , '#run'          , 0 , 1000000 )
mass    = ROOT.RooRealVar ( 'Mass'   , 'mass-variable' , 0 , 100     )

varset  = ROOT.RooArgSet  ( evt , run , mass )
dataset = ROOT.RooDataSet ( "dataset", 'Test Data set-0' , varset )  

before_create = process_memory ()
print ( 'MEMORY BEFORE CREATION OF DATASET %.1fMB' % ( before_create / 2**20 ) ) 

NR = 1000
NE = 1000

for r in range ( NR ) :
    run.setVal ( r )
    for e in range ( NE ) :        
        evt .setVal   ( e )        
        mass.setVal   ( random.uniform      ( 0  , 10  ) )
        dataset.add ( varset )

before_loop = process_memory ()
print ( 'DELTA MEMORY (BEFORE LOOP) %.2fMB' % ( ( before_loop - before_create ) / 2**20  ) ) 

NL = 100 
for i in range ( NL ) :

    ## some action to create another large dataset, e.g. reduction
    ## in my concrete case I create the bootstrapped dataset 
    another = dataset.reduce ( "Mass<100" ) 

    ## (1) 
    ## del another       ## No effect at all 

    ## (2) 
    ## another.Delete()  ## it works, but looks a bit brutal 

    ## (3)   -- no effect 
    ## store = another.store()
    ## if store :
    ##    store.reset        ()
    ##    store.resetBuffers ()
    ##    store.resetCache   ()        
    ## another.resetBuffers   ()
    ## another.reset          ()
     
    
after_loop = process_memory ()
print ( 'DELTA MEMORY (AFTER  LOOP) %.2fMB' % ( ( after_loop - before_loop ) / 2**20  ) )
    

 
  • a simple `del another` inside the loop has no effect
  • `another.Delete()` does work, but could it be the recommended way?
  • a manual reset of the store and the dataset does report that `another` has no events anymore, but the memory is not freed.

Dear @ibelyaev ,

Thank you for reaching out to the forum! Indeed I do not expect any del call in Python to have any immediate or predictable effect, that’s just how the CPython garbage collector works. A somewhat more reliable way would be to restrict variables allocating a lot of memory to a free function scope, taking care that those variables are not referenced ever by anything outside that function scope.

For the particular RooFit classes, maybe @jonas will have an idea on how to help more concretely.

Cheers,
Vincenzo

Ah yes, that’s a classic. The RooAbsData::reduce() method returns the dataset by pointer, and it has to be deleted manually on the C++ side. PyROOT doesn’t know that it has to do that, because ownership cannot be deduced automatically from the C++ interface. And we can’t easily change the C++ interface to smart pointers because that would break backwards compatibility…

The solution is to tell PyROOT that it owns the returned object and needs to delete it:

    another = dataset.reduce ( "Mass<100" )
    ROOT.SetOwnership(another, True)

Does this work? I’ll also update PyROOT to do this automatically when you create a dataset with reduce().
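
A minimal sketch of how the ownership transfer could be wrapped so that every call site gets it. The helper name `reduce_owned` and the choice of passing the `ROOT` module in as the first argument are assumptions made for illustration, not part of PyROOT; only `reduce` and `ROOT.SetOwnership` come from the discussion above.

```python
def reduce_owned(root, dataset, *args):
    """Reduce `dataset` and let Python own (and eventually delete) the result.

    `root` is expected to be the imported ROOT module; passing it in
    explicitly is an illustrative choice so this helper does not import ROOT.
    """
    reduced = dataset.reduce(*args)
    # Tell PyROOT that the Python proxy owns the C++ object, so the object
    # is deleted when the proxy itself is released.
    root.SetOwnership(reduced, True)
    return reduced


# Hypothetical usage inside the loop from the first post:
#     import ROOT
#     another = reduce_owned(ROOT, dataset, "Mass<100")
```

The same pattern would apply to any other method that returns a newly allocated dataset by raw pointer.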


Hi @jonas

Thank you for the solution.

I have several occurrences of this pattern (creation of a RooDataSet in a loop via various methods, not only reduce, e.g. for pseudoexperiments).

The ROOT.SetOwnership trick works for some ROOT (Python?) versions. In my tests I ran over the LCG_102 -…-LCG_108-..-dev3 slots (ROOT 6.26-6.39). I’ve found that an explicit instantiation of std::unique_ptr works for almost all ROOT versions:

data_ptr = std.unique_ptr(ROOT.RooAbsData)

...

for ....   :

   ds = data_ptr ( dataset.reduce (... ) ) 
   ...
   del ds 

 

This construction works for all ROOT versions except 6.32-6.34.

Of course I’d like to have some “universal” solution, avoiding “if”-branches on ROOT versions.

That’s quite surprising! I have no answer to that yet. Can you create an overview on which method works for which ROOT version?

For the unique_ptr: maybe it helps to create it with template instantiations using the bracket operator: std.unique_ptr[ROOT.RooAbsData], because this is less ambiguous.

Hi Jonas,

I’ve repeated my tests with ROOT.SetOwnership and std::unique_ptr

  • Good news: if the dataset inside the loop is created using reduce, both methods work OK for all ROOT versions between 6.26 and 6.39
  • If the dataset inside the loop is prepared as a “jackknife”, namely
 original_dataset = ...
 for i in range ( len ( original_dataset ) ) :
       ds1 = original_dataset.reduce ( ROOT.RooFit.EventRange  ( ... ))  
       ds2 = original_dataset.reduce ( ROOT.RooFit.EventRange  ( ... ) ) 
       
       ds = <merge ds1 and ds2>

       ...
       del ds1
       del ds2 
       del ds   

For this scenario, BOTH methods fail for ROOT versions 6.32-6.34, while they are OK for other ROOT versions. The amount of “memory leak” corresponds to ds (ds1 and ds2 are OK).

Thanks for the details! How do you <merge ds1 and ds2> though?

 result = ds1.emptyClone( <unique name> ) 
 result.append ( ds1 )
 result.append ( ds2 )

For ROOT 6.32-6.34 I succeeded only by using the brutal `ds.Delete()`.

Dear @ibelyaev ,

Note that a still brutal but maybe more “standard-looking” approach could be to call the obj.__destruct__() method which is available to any Python proxy, see Classes — cppyy 3.5.0 documentation. I still believe Jonas’s approach is the correct one, but it won’t be automatically available to you retroactively in all older ROOT releases.
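
A minimal sketch of how such an explicit call could be tied to a well-defined point in the code, assuming the object exposes `__destruct__()` as cppyy proxies do. The context manager name `destructing` is hypothetical and introduced only for illustration.

```python
from contextlib import contextmanager


@contextmanager
def destructing(obj):
    """Yield `obj` and explicitly destroy it when the `with` block ends.

    Assumes `obj` provides a `__destruct__()` method (as cppyy proxies do).
    The proxy must not be used after the block has finished.
    """
    try:
        yield obj
    finally:
        # Runs even if the body of the `with` block raises.
        obj.__destruct__()


# Hypothetical usage in the loop from the first post:
#     with destructing(dataset.reduce("Mass<100")) as another:
#         ...  # work with `another`
```

This keeps the explicit destruction in one place instead of scattering it through the loop body.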

Cheers,
Vincenzo

Thank you @vpadulan

obj.__destruct__() looks a bit less brutal than obj.Delete()

I can only agree that Jonas’s approach looks better. (Moreover, one can easily hide SetOwnership in a pythonized reduce.)

In the meantime I’ll try to use some ifs based on the ROOT version.

Dear @vpadulan , @jonas

I’ve played with several combinations of __destruct__, Delete, and SetOwnership.

As noted earlier, the actual behaviour depends heavily on the ROOT version.

Surprisingly (at least for me), the __destruct__ action is not equivalent to the Delete action.

The solution suggested by Jonas appears to be the most useful; however, I need to add an additional __destruct__ call.
My final solution (more complicated than the toy example above, but still simplified compared to the actual code) is here

Dear @ibelyaev ,

I believe you are still mixing two different things in your example. The Pythonization concept introduced by Jonas ensures there is no memory leak of the C++ allocated objects at the end of the Python application. But, attempts of freeing memory at specific points of your applications, e.g. obj.Delete() or obj.__destruct__() are going beyond the standard Python behaviour. In Python, there is no guarantee of when the garbage collector will run and thus there is no guarantee regarding when your memory will be freed. Thus, when you call if delete : result .__destruct__ () in line 90 of your gist, you are arbitrarily using a feature of cppyy, and this is not something you should expect your Python application to handle automatically in any case, whether you are using ROOT or any other Python package.

Cheers,
Vincenzo

Dear @vpadulano

Thank you for explanation.

But in the end I need some way to ensure that, for large loops, the large datasets created internally are somehow deleted. In my case the datasets and loops are really large: without some action I easily reach many gigabytes of memory.

What is the proper solution?

Dear @ibelyaev ,

Apologies if I gave the impression that your code was somehow wrong. It isn’t wrong per se. I just wanted to point out that the need to deallocate memory in a Python application at a very specific line of code is just not part of the language. In your case, you have this need, and you use a public API of a package that allows you to accomplish what you need, thus you are actively using a feature of the package. Nothing wrong with that.

Perhaps you could try to wrap the parts of the loop that allocate memory in a function scope, and see if the garbage collector does a better job without the need to explicitly access the feature of using __destruct__. This might work since if the variable in a function scope has no other references in the Python application then the garbage collector should delete it. But this wouldn’t be any more correct than what you are currently doing.
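
A minimal sketch of the function-scope idea, using a stand-in class instead of a RooFit dataset so that it runs without ROOT. `FakeDataset`, `one_iteration` and the `released` list are hypothetical names for illustration. Note that this only shows when the Python object is released; for a PyROOT proxy the underlying C++ object is deleted at that point only if Python owns it.

```python
import gc
import weakref


class FakeDataset:
    """Hypothetical stand-in for a large dataset (not a RooFit class)."""

    def __init__(self, size):
        self.payload = [0.0] * size


released = []  # indices of iterations whose dataset has been collected


def one_iteration(index):
    # `ds` exists only in this function scope.
    ds = FakeDataset(100_000)
    # Record when the object is collected, for demonstration only.
    weakref.finalize(ds, released.append, index)
    # Return only small derived results, never the dataset itself.
    return len(ds.payload)


sizes = [one_iteration(i) for i in range(3)]
gc.collect()  # optional nudge; CPython usually frees `ds` on return already
print(sizes, released)
```

The key design point is that nothing outside `one_iteration` ever holds a reference to the large object.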

Cheers,
Vincenzo
