Byte count too large

Dear experts,

I use the TRExFitter framework, which is based on RooFit. I am running into a problem that I already faced last December and that went away after switching to a newer ROOT version. When fitting the workspace I immediately get the error “Byte count too large”. However, I cannot find a version of my setup that works anymore. This issue comes from the 1 GB limit on single entities in ROOT I/O (I am still wondering why this is a thing given today's memory availability).

Already after creating the workspace, I cannot find out which part of it is too large, as accessing the RooWorkspace already fails:

root [2] _file0->Map();
20220130/173113 At:100 N=570 TFile
20220131/110411 At:670 N=155 TProcessID
20220131/110432 At:825 N=125232782 RooWorkspace CX = 9.33
20220131/110433 At:125233607 N=13277 StreamerInfo CX = 3.45
20220131/110433 At:125246884 N=415 KeysList
20220131/110433 At:125247299 N=281 FreeSegments
20220131/110433 At:125247580 N=1 END
root [3] combWS->cd();
Error in TExMap::Remove: key 1010819072 not found at 3717802
Warning in TBufferFile::CheckObject: reference to object of unavailable class TObject, offset=1010819072 pointer will be 0
Error in TExMap::Remove: key 81920 not found at 81921

I uploaded the workspace here: CERNBox; maybe that helps. I still don't understand why this workspace has an issue with the 1 GB limit, as the RooWorkspace on disk is smaller than that. Any fast and good solution would be very much appreciated, because this memory problem prevents us from making any progress with the analysis.

Thank you very much!
Best regards,
Philipp

ROOT Version: 6.24.06
Platform: CentOS7

Hi,
This is a hard limit in the ROOT I/O, and at the moment we don't have a solution for it. Your workspace on disk is compressed, so it can have a reduced size there.
A possible solution might be to create separate workspaces for the different pdf components and then combine them later in memory into a larger pdf (see the sketch below).
Another solution is to minimise the amount of information stored in the workspace. Is it all needed? Are you using some custom classes? Are you sure that these classes do not contain some redundant information that could easily be re-computed when reading the workspace?
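A minimal sketch of the splitting idea, assuming the component pdfs (here pdfA and pdfB, hypothetical names) can be imported independently:

// Write each component to its own workspace/file, each below the 1 GB limit:
RooWorkspace wsA("wsA");
wsA.import(pdfA);                 // pdfA: hypothetical component pdf
wsA.writeToFile("wsA.root");

RooWorkspace wsB("wsB");
wsB.import(pdfB);                 // pdfB: hypothetical component pdf
wsB.writeToFile("wsB.root");

// Later, recombine everything in memory (the I/O limit does not apply here):
RooWorkspace wsComb("wsComb");
wsComb.import(*wsA.pdf("pdfA"));
wsComb.import(*wsB.pdf("pdfB"));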

Unfortunately, I cannot read your linked workspace (due to an I/O error).

Cheers

Lorenzo

Hi Lorenzo,
thank you very much for your answer. I have passed the part concerning the workspace handling on to the TRExFitter developers. Yes, the linked workspace cannot be opened properly, but I thought there were some tools to investigate why that is the case. I will keep trying to get a minimal fit setup that still works.
Cheers,
Philipp

Hi,
so I added a somewhat smaller workspace to the same CERNBox location; the broken workspace is called ws_combined_broken and the working one ws_combined. If someone could have a look or has any other idea how to solve this problem, I would really appreciate it.
In terms of splitting and combining the workspaces, I have already done everything that is possible inside TRExFitter.
Thanks and cheers,
Philipp

Hi,
Thank you for the workspace. I can now read the small one. I see it contains a huge number of RooHistFunc and RooDataHist objects. I think there is a lot of overhead in these classes, and with that many of them you can end up reaching the 1 GB limit.
The only workaround I see is to store just the corresponding histograms in a ROOT file and to create the HistFactory model in memory, without saving it to a ROOT file (rough sketch below).
Does it take a long time to create the model?
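A rough sketch of what I mean, assuming the histograms live in a plain ROOT file; all file, histogram, and parameter names here are placeholders:

#include "RooStats/HistFactory/Measurement.h"
#include "RooStats/HistFactory/HistoToWorkspaceFactoryFast.h"
using namespace RooStats::HistFactory;

Measurement meas("meas", "meas");
meas.SetPOI("mu");                        // hypothetical parameter of interest
meas.SetLumi(1.0);
meas.SetLumiRelErr(0.02);

Channel chan("SR");
chan.SetData("data", "histos.root");      // data histogram name, input file

Sample sig("signal", "signal", "histos.root");
sig.AddNormFactor("mu", 1, 0, 10);
chan.AddSample(sig);
meas.AddChannel(chan);

// Build the combined RooWorkspace purely in memory; nothing is written to disk:
RooWorkspace* w = HistoToWorkspaceFactoryFast::MakeCombinedModel(meas);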

Cheers

Lorenzo

Hi Lorenzo,
thank you very much for looking into this. Did I understand correctly that the RooWorkspace cannot get larger than 1 GB?
Cheers,
Philipp

Hi,
The workspace itself can be larger than 1 GB in memory, but then you will not be able to store it in, and read it back from, a file, because of this limitation of the ROOT I/O.
The workspace needs to be written as a single buffer in order to correctly maintain all the object cross-references and avoid duplications.
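If you want to see the uncompressed (streamed) size yourself, something like the following sketch should show it; it serialises the workspace into a single TBufferFile, which is exactly the step where the limit bites (the key name combWS is taken from your printout):

TFile f("ws_combined.root");
auto* ws = f.Get<RooWorkspace>("combWS");
TBufferFile buf(TBuffer::kWrite);
buf.WriteObject(ws);   // for a workspace above the limit, this is where the error shows up
std::cout << "streamed size: " << buf.Length() << " bytes" << std::endl;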
For the next ROOT release we will work on a new implementation of the RooHistFunc used by HistFactory, which will contain only the minimal information.

Cheers
Lorenzo

Hi Lorenzo,
ok, but if I map the broken workspace as I described in one of my earlier messages,

root [2] _file0->Map();
20220130/173113 At:100 N=570 TFile
20220131/110411 At:670 N=155 TProcessID
20220131/110432 At:825 N=125232782 RooWorkspace CX = 9.33
20220131/110433 At:125233607 N=13277 StreamerInfo CX = 3.45

I assume that N gives me the size of the RooWorkspace in bytes. But at 125 MB this is well below the 1 GB limit, or can this also be compressed in some way?

Thank you!
Cheers,
Philipp

Hi,
As you see in the printout, you have 125 MB, but compressed with a factor CX = 9.33.
And 9.33 × 125232782 bytes ≈ 1.17 × 10^9 bytes, which is more than 1 GB!

The fact that the compression factor is so high is a clear indication that a lot of useless information is being stored!

Hi Lorenzo,
ah ok, I understand; I was wondering what this CX value was for. Looking into the workspace, it is not obvious to me which information is redundant. Is there a way to find out which information compresses so heavily?
Cheers,
Philipp

Hi,
I am not sure about this, because the whole workspace is written in a single buffer and it is not split. Maybe @pcanal, our ROOT I/O expert, knows more about this.

Cheers

Lorenzo

Hi,
yes it would be great to get some help here!
Thanks and cheers,
Philipp

Hi,

I may be too unaware of RooWorkspace internals, but hitting this limit has piqued my curiosity. Isn't the general solution in ROOT I/O to use multiple buffers and have objects reference each other via TRef / TRefArray links? For non-Roo TObjects, I have been using this approach to great success in AOD-like ROOT files (a minimal sketch of the pattern is below).

Or is the single-buffer approach in RooWorkspace historically grown (and can't be changed without a complex schema migration / breaking backwards compatibility)?
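For reference, the TRef pattern I mean, as an in-memory illustration with placeholder names:

#include "TNamed.h"
#include "TRef.h"

auto* payload = new TNamed("payload", "a large object");
TRef ref(payload);               // stores only a unique object ID (via TProcessID)
auto* back = static_cast<TNamed*>(ref.GetObject());  // resolves the ID back to the object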

Cheers,
Oliver

This is in 2022's plan of work; it's really a limitation of TBufferFile, which cannot store one object > 1 GB. (@olifre this doesn't involve TTree and TRef here; TTree knows how to split things.)

If you do

root ws_combined.root
root [0] gDebug=7
root [1] combWS->ls()

you will see all the I/O operations that this triggers: it's a lot. The output shows the number of (uncompressed) bytes read for each object / type, and it might help guide where bytes can be saved.

That said, I doubt this will help much; in the end a large workspace will still be > 1 GB, and we need the proper fix by @pcanal, which indeed is major lifting, also conceptual lifting, to make sure old ROOT versions properly choke on a file with a TBufferFile > 1 GB instead of happily reading junk. Just to describe one of the challenges here :slight_smile:
