Large Tree-based RooDataSet

I am trying to create a rather large RooDataSet. The documentation says that
in this case I need to rely on the TreeDataStore, but it seems to me that this is not the case.
A simple script:

import ROOT

rfile = ROOT.TFile('/tmp/ibelyaev/test.root', 'RECREATE')
# Request the tree-based storage before the dataset is created
ROOT.RooAbsData.setDefaultStorageType(ROOT.RooAbsData.Tree)
vars = [ROOT.RooRealVar('x%d' % i, '', -100, 100) for i in range(10000)]
vset = ROOT.RooArgSet()
for v in vars:
    vset.add(v)
data = ROOT.RooDataSet('data', 'data', vset)

for i in range(10000):
    data.add(vset)
    if 0 == i % 1000:
        # Inspect the store type, the directory of its tree and the file content
        store = data.store()
        tree = store.tree()
        print('%d' % i, type(store), tree.GetDirectory())
        rfile.ls()

The output is:

0 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
1000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
2000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
3000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
4000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
5000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
6000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
7000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
8000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	
9000 <class cppyy.gbl.RooTreeDataStore at 0x556b54c09550> <cppyy.gbl.TDirectory object at 0x(nil)>
TFile**		/tmp/ibelyaev/test.root	
 TFile*		/tmp/ibelyaev/test.root	

As you can see, even for a very large number of entries there is no sign of a file “backup”: the file is essentially empty, and the tree and the whole dataset reside in memory.
Have I misunderstood the documentation?

Dear Vanya,

The internal TTree is not attached to the TFile directory. Therefore, it will only appear in the file if you put the RooDataSet into a RooWorkspace and then write that workspace to the file. I am not even sure whether you can write the RooDataSet to the file without wrapping it in a RooWorkspace.

Is that a problem for your use case?

Cheers,
Jonas

Hi Jonas,
Actually, I am trying to solve a different issue.
But the behaviour you describe is different from what the documentation says.
Should the documentation be fixed?

OK, I see what you mean now. Indeed, there was a change a few years ago that caused the documentation to diverge from the code:

I guess that fixing this ownership problem invalidated this part of the documentation, as you say:

Uses a TTree, which can be file backed if a file is opened before creating the dataset. This significantly reduces the memory pressure, as the baskets of the tree can be written to a file, and only the basket that’s currently being read stays in RAM.

I don’t know what the best way to proceed is. It could be fixing the docs, or it could be making sure that the TTree is managed by the file so that the RooTreeDataStore actually fulfills its purpose.

What do you think? What is your exact use case? Do you have a dataset that is so large that it doesn’t fit in RAM?

The reason why we recommend using the TTree store for large datasets is that otherwise they can’t be written to a ROOT file. It’s not about the memory limitation anymore. Or is memory a limitation in your case?

Thanks for continuing the discussion,
Jonas

Dear @jonas,
I think that first one needs to ensure consistency between the code and the documentation.
Most likely, fixing the documentation is enough for this purpose.

Actually, my case derives from a slightly different issue.
I was trying to find the optimal way to convert a RooDataSet into a TTree
(taking into account that the dataset may be large). Any advice?
While looking for a solution to this
problem, I observed that the code example from the documentation does not produce a TTree in the TFile.

I have suggested changes to the docs. Do you have any comments about them?

I was trying to find the optimal way to convert a RooDataSet into a TTree
(taking into account that the dataset may be large). Any advice?

I mean, the obvious way is to use the tree-based data store, as you already do, and then get the TTree with RooAbsData::tree(). Does this not work?
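
A minimal sketch of that conversion, under stated assumptions rather than as a tested recipe: since the internal TTree is not attached to any file, the sketch clones it and attaches the clone to an output file explicitly. It assumes that RooAbsData::GetClonedTree() is available in your ROOT version and returns a new TTree with a copy of the dataset content. The function name write_dataset_tree and the arguments path and tree_name are hypothetical and used for illustration only.

```python
def write_dataset_tree(data, path, tree_name="data_tree"):
    """Clone the content of a RooDataSet into a TTree stored in a new ROOT file.

    The function name and the `path` / `tree_name` arguments are hypothetical
    names chosen for this sketch.
    """
    import ROOT  # imported lazily: defining the function does not require ROOT

    # Opening the output file makes it the current ROOT directory.
    out = ROOT.TFile.Open(path, "RECREATE")
    if not out or out.IsZombie():
        raise OSError("cannot open %s for writing" % path)

    # Assumption: GetClonedTree() returns a new TTree holding a copy of the
    # dataset content; the dataset itself is left untouched.
    tree = data.GetClonedTree()
    if not tree:
        out.Close()
        raise RuntimeError("the dataset did not provide a TTree")

    tree.SetName(tree_name)
    tree.SetDirectory(out)  # make sure the clone is managed by the output file
    entries = tree.GetEntries()
    tree.Write()
    out.Close()  # closing the file also deletes the tree attached to it
    return entries
```

For the dataset from the first post this would be called as write_dataset_tree(data, path) with an output path of your choice. Note that the clone is a second copy of the data, so for a dataset close to the RAM limit this may not be the cheapest route.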
