Filtering branches from a large root file fails in 6.32



ROOT Version: 6.32.10
Platform: OSX
Compiler: 16.4 (apple silicon)


I am trying to filter branches from a large file as part of our open data project.

I get stymied by

“Fatal in TBufferFile::WriteFastArray: Not enough space left in the buffer (1GB limit). -1 elements is greater than the max left of 536870889”

BTW, the file I am testing on is < 1 GB, but I have others that are 25 GB.

I’ve tried T.SetMaxVirtualSize(size=1E9) as that worked in the distant past.

I’ve tried using RDataFrame - same issue.

How does one tell ROOT 6 to read in decent-size chunks and then make it possible to write them out?

Here is a script in python that reproduces the problem. It is reading from the MINERvA public data release.

import ROOT

from ROOT import  TFile, TTree

import os 





#  an example of copytree.C




#  I had trouble getting this to work on the latest versions of root

#  but didn't try hard to diagnose the problem

#  I run on a recent (c. 2024) version of root.

#  From (Rik Gran)

#  Minimally translated to Python by H. Schellman




infile = "root://fndca1.fnal.gov:1095/pnfs/fnal.gov/usr/minerva/persistent/OpenData/MediumEnergy_FHC/Data/Playlist1N/MasterAnaDev_data_AnaTuple_run00020408_Playlist.root"

f = TFile.Open(infile,"READ")




name = f.GetName()

DATA = "Data" in name

MC = "mc" in name

T = f.Get("MasterAnaDev")

if MC: TT = f.Get("Truth")

TM = f.Get("Meta")





#  Needed to overcome some limitation

T.SetMaxVirtualSize(size=1E9)

if MC: TT.SetMaxVirtualSize(size=2E9)

TM.SetMaxVirtualSize(size=1E9)





#T.SetBranchStatus("mc_wgt_*",1)

if MC: TT.SetBranchStatus("mc_wgt_*",1)






#  deactivate select branches

T.SetBranchStatus("lattice*",0)

T.SetBranchStatus("odlattice*",0)

T.SetBranchStatus("recoil_summed_energy*",0)

T.SetBranchStatus("recoil_data_fraction*",0)

T.SetBranchStatus("slice_hit_*",0)

T.SetBranchStatus("recoil*time_limit*",0)

T.SetBranchStatus("event_track_hit*",0) 

T.SetBranchStatus("cluster_*",0)

T.SetBranchStatus("EnergyPoints*",0)

T.SetBranchStatus("VetoWall*",0)

T.SetBranchStatus("part_response*",0)

T.SetBranchStatus("part_response_total_recoil_passive_allNonMuonClusters_id",1)

T.SetBranchStatus("part_response_total_recoil_passive_allNonMuonClusters_od",1)

T.SetBranchStatus("*ichel*",0)

if MC: TT.SetBranchStatus("*ichel*",0)

T.SetBranchStatus("dEdX*",0)

T.SetBranchStatus("ExtraEnergyClusters*",0)




T.SetBranchStatus("prong_part*",0)

T.SetBranchStatus("proton_prong*",0)

T.SetBranchStatus("proton_track*",0)

T.SetBranchStatus("seco_prot*",0)

T.SetBranchStatus("sec_prot*",0)

T.SetBranchStatus("iso_prong*",0)

T.SetBranchStatus("gamma*",0)

T.SetBranchStatus("pi0*",0)

T.SetBranchStatus("disp*",0)

T.SetBranchStatus("blob_nuefuzz*",0)

T.SetBranchStatus("hadron_em*",0)

T.SetBranchStatus("proton_em*",0)

T.SetBranchStatus("phys_energy*",0)

T.SetBranchStatus("recoil_energy_*vtx*",0)

T.SetBranchStatus("clusters_found*",0)

T.SetBranchStatus("number_clusters*",0)

T.SetBranchStatus("shower_*",0)

T.SetBranchStatus("calibE_*",0)

T.SetBranchStatus("hadron_track_*",0)

T.SetBranchStatus("nonvtx_iso*",0)

T.SetBranchStatus("visE_*",0)




T.SetBranchStatus("Signal*",0)

T.SetBranchStatus("ConeEnergyVis",0)

T.SetBranchStatus("ExtraEnergyVis",0)

T.SetBranchStatus("Psi",0)




#  One of the main neutron branches

#  Notice I am adding back branches here in some cases

#  after using a wildcard above

T.SetBranchStatus("MasterAnaDev_Blob*",1)

T.SetBranchStatus("MasterAnaDev_RecoPattern",1)

T.SetBranchStatus("MasterAnaDev_MCEnergyFrac*",0)




T.SetBranchStatus("MasterAnaDev_hadron*",0)

T.SetBranchStatus("MasterAnaDev_sec_prot*",0)

T.SetBranchStatus("MasterAnaDev_pi*",0)

T.SetBranchStatus("MasterAnaDev_prot*",0)





T.SetBranchStatus("MasterAnaDev_sys*",0)




T.SetBranchStatus("numi*",0)

if MC: TT.SetBranchStatus("numi*",0)

T.SetBranchStatus("numi_pot*",1)

if MC: TT.SetBranchStatus("numi_pot*",1)




if MC: T.SetBranchStatus("truth_neutronInelast*",0)

if MC: TT.SetBranchStatus("truth_neutronInelastic*",0)




if MC: T.SetBranchStatus("truth_hadronReweight*",0)

if MC: TT.SetBranchStatus("truth_hadronReweight*",0)




if MC: T.SetBranchStatus("truth_muon_track_cluster*",0)

if MC: TT.SetBranchStatus("truth_muon_track_cluster*",0)




if MC: T.SetBranchStatus("truth_fuzz*",0)

if MC: TT.SetBranchStatus("truth_fuzz*",0)





if MC: T.SetBranchStatus("truth_gamma*",0)

if MC: T.SetBranchStatus("truth_prot*",0)

if MC: T.SetBranchStatus("truth_pi*",0)

if MC: T.SetBranchStatus("truth_muon_off_track*",0)

if MC: TT.SetBranchStatus("truth_gamma*",0)

if MC: TT.SetBranchStatus("truth_prot*",0)

if MC: TT.SetBranchStatus("truth_pi*",0)

if MC: TT.SetBranchStatus("truth_muon_off_track*",0)




T.SetBranchStatus("muon_track_cluster*",0)

T.SetBranchStatus("muon_fuzz_per_plane_r150*",0)





if MC: T.SetBranchStatus("mc_fr*",0)

if MC: TT.SetBranchStatus("mc_fr*",0)




T.SetBranchStatus("muon_thetaX_allNodes",0)

T.SetBranchStatus("muon_thetaY_allNodes",0)

T.SetBranchStatus("muon_theta_allNodes",0)

T.SetBranchStatus("muon_iso_blobs*",0)




file =  TFile("outfile.root","recreate")

tree = T.CloneTree()

if MC: truth = TT.CloneTree()  

meta = TM.CloneTree()




tree.Print()

if MC: truth.Print()

meta.Print()

file.Write(0,ROOT.TObject.kOverwrite)

file.Close()
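
The result of the script above depends on call order: a later SetBranchStatus call overrides an earlier wildcard (for example, part_response* is switched off and two specific branches are switched back on). A minimal sketch of keeping such rules in one ordered list follows. The helper names apply_rules and preview are hypothetical, only a few of the rules are copied over, and preview uses Python's fnmatch purely as an approximation of ROOT's own wildcard matching.

```python
from fnmatch import fnmatchcase

# Ordered (pattern, status) pairs; later entries override earlier ones.
RULES = [
    ("part_response*", 0),
    ("part_response_total_recoil_passive_allNonMuonClusters_id", 1),
    ("part_response_total_recoil_passive_allNonMuonClusters_od", 1),
    ("numi*", 0),
    ("numi_pot*", 1),
]

def apply_rules(tree, rules=RULES):
    """Call SetBranchStatus on a TTree-like object, in list order."""
    for pattern, status in rules:
        tree.SetBranchStatus(pattern, status)

def preview(branch_names, rules=RULES):
    """Approximate which branches stay active, without opening a file.

    Assumes every branch starts active and that fnmatch-style matching
    is close enough to ROOT's wildcard handling for these patterns.
    """
    active = {name: True for name in branch_names}
    for pattern, status in rules:
        for name in branch_names:
            if fnmatchcase(name, pattern):
                active[name] = bool(status)
    return [name for name in branch_names if active[name]]
```

With a real tree, apply_rules(T) would replace the corresponding block of individual calls; preview is only a quick sanity check on a list of branch names.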


@pcanal may help

I’ve been playing around with feeding RDataFrame a list of “columns” to keep and can get it to work with a short list. Problem there is that I have 3 Trees (basically for normalization and cross-checks) and the Snapshot method really does not like writing multiple trees back into the same file.

And of course, once I start trying to do a lot of stuff (some of those branches are vectors with 1000 entries) it is likely to get mad at me again.
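
For the multiple-tree problem, one option that may help is RSnapshotOptions with fMode set to "UPDATE", so that the second and third Snapshot calls append to the file the first one created. The following is a sketch under that assumption, not a tested recipe for these files. The helper names snapshot_plan and snapshot_trees are hypothetical, and the column lists are left to the caller.

```python
def snapshot_plan(tree_names):
    """Pair each tree with the file mode its Snapshot call should use.

    The first tree recreates the output file; the rest update it.
    """
    return [(name, "RECREATE" if i == 0 else "UPDATE")
            for i, name in enumerate(tree_names)]

def snapshot_trees(infile, outfile, columns_by_tree):
    """Write one filtered copy of each tree into the same output file.

    columns_by_tree maps a tree name (e.g. "MasterAnaDev", "Meta")
    to the list of column names to keep for that tree.
    """
    import ROOT  # imported here so snapshot_plan works without ROOT

    for name, mode in snapshot_plan(list(columns_by_tree)):
        opts = ROOT.RDF.RSnapshotOptions()
        opts.fMode = mode  # "UPDATE" keeps trees already in the file
        df = ROOT.RDataFrame(name, infile)
        df.Snapshot(name, outfile, columns_by_tree[name], opts)
```

This only addresses writing several trees to one file; any column that fails to read in CloneTree would presumably still fail here.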

The input is very odd. Many size indices have some or all of their values set to -1, which is ‘impossible’/‘unexpected’. This basically tells the I/O that the array has infinite size!?

I am not sure how this file was produced, but you have to work around this by disabling all the related branches:

# n_indices, n_Signalindices, n_odindices, n_odindices2 and
# n_overlayindices are -1, and thus the branches that use them
# as a size index cannot be read/stored.

T.SetBranchStatus("SignalModInfo", 0)
T.SetBranchStatus("SignalPlaneInfo", 0)
T.SetBranchStatus("SignalStripInfo", 0)
T.SetBranchStatus("SignalEnergyInfo", 0)

T.SetBranchStatus("latticeEnergyIndices", 0)
T.SetBranchStatus("latticeOverlay", 0)
T.SetBranchStatus("latticeodOverlay", 0)
T.SetBranchStatus("latticeodOverlay2", 0)
T.SetBranchStatus("latticeNormEnergySums", 0)
T.SetBranchStatus("latticeRelativeTimes", 0)

T.SetBranchStatus("odlatticeEnergyIndices", 0)
T.SetBranchStatus("odlatticeEnergyIndices2", 0)
T.SetBranchStatus("odlatticeNormEnergySums", 0)
T.SetBranchStatus("odlatticeNormEnergySums2", 0)
T.SetBranchStatus("odlatticeRelativeTimes", 0)
T.SetBranchStatus("odlatticeRelativeTimes2", 0)

T.SetBranchStatus("overlayModInfo", 0)
T.SetBranchStatus("overlayPlaneInfo", 0)
T.SetBranchStatus("overlayStripInfo", 0)
T.SetBranchStatus("overlayEnergyInfo", 0)
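
The list above was written by hand. As a hedged sketch of deriving it instead: each variable-length leaf knows its size leaf through TLeaf::GetLeafCount(), so one can collect the branches sized by a given set of counters. The helper names branches_sized_by and disable are hypothetical, and the sketch assumes a flat ntuple where each leaf sits in its own top-level branch.

```python
def branches_sized_by(tree, counter_names):
    """Return names of branches whose array length comes from a listed counter."""
    wanted = set(counter_names)
    found = []
    for leaf in tree.GetListOfLeaves():
        counter = leaf.GetLeafCount()  # null (falsy) for fixed-size leaves
        if counter and counter.GetName() in wanted:
            found.append(leaf.GetBranch().GetName())
    return sorted(set(found))

def disable(tree, branch_names):
    """Switch off each named branch before cloning the tree."""
    for name in branch_names:
        tree.SetBranchStatus(name, 0)
```

For example, disable(T, branches_sized_by(T, ["n_indices", "n_odindices2"])) before calling CloneTree.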

See:

root [0] TFile *_file0 = TFile::Open("root://fndca1.fnal.gov:1095/pnfs/fnal.gov/usr/minerva/persistent/OpenData/MediumEnergy_FHC/Data/Playlist1N/MasterAnaDev_data_AnaTuple_run00020408_Playlist.root")
(TFile *) 0x137f75f60
root [1] MasterAnaDev->Scan("n_odindices2")
************************
*    Row   * n_odindic *
************************
*        0 *        -1 *
*        1 *        -1 *
*        2 *        -1 *
*        3 *        -1 *
*        4 *        -1 *
....
root [2] MasterAnaDev->Scan("n_Signalindices")
************************
*    Row   * n_Signali *
************************
*        0 *        -1 *
*        1 *        -1 *
*        2 *        -1 *
*        3 *        -1 *
*        4 *        -1 *
....
etc.
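
The same check can be done without reading Scan output by eye. This is a minimal sketch using TTree::GetMinimum, which reads the named column over all entries and so may be slow on the 25 GB files. The helper name negative_counters is hypothetical; the counter names are the ones listed in the comment above.

```python
COUNTERS = [
    "n_indices",
    "n_Signalindices",
    "n_odindices",
    "n_odindices2",
    "n_overlayindices",
]

def negative_counters(tree, names=COUNTERS):
    """Return the size-index columns whose minimum value is below zero."""
    return [name for name in names if tree.GetMinimum(name) < 0]
```

It should be run before any of the counter branches are switched off, since a disabled branch is not read.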

Thanks, I will try that.

Good thing that I made the actual file the example. I will report back to the authors and try to provide a patch for others reading these files as I don’t think we can regenerate them.

Heidi Schellman

Oregon State Physics