Error in <TCollectionLessSTLReader::GetCP()>: Read error in TBranchProxy

FoxWise · October 8, 2021, 5:24pm

I have the following code example, which produces tons of errors:
Error in <TCollectionLessSTLReader::GetCP()>: Read error in TBranchProxy
They begin to pop up at some random event in the middle…

What could be the reason?

import ROOT
ROOT.EnableImplicitMT()

ch = ROOT.TChain("lumical")

ch.Add("/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT.root")
ch.AddFriend("emv=lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMV.root")
ch.AddFriend("emx=lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMX.root")
ch.AddFriend("emy=lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMY.root")
ch.AddFriend("emz=lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMZ.root")

df = ROOT.RDataFrame(ch).Filter(" rdfentry_ < 9340000").Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")

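# "default" is the main tree; the other entries are the friend aliases added above
titles = ["default", "emv", "emx", "emy", "emz"]
histos = [ROOT.TH1D(), ROOT.TH1D(), ROOT.TH1D(), ROOT.TH1D(), ROOT.TH1D()]
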
df = df.Define("y1_default", "mc_cont_posy[layer == 0 && mc_cont_momz < 0.]")
histos[0] = df.Histo1D(("h_default", "default", 32, -90, 35), "y1_default")
for i, t in enumerate(titles[1:]):
    df = df.Define("y1_{}".format(t), "{0}.mc_cont_posy[{0}.layer == 0 && {0}.mc_cont_momz < 0.]".format(t) )
    histos[i+1] = df.Histo1D(("h_{}".format(t), "{}".format(t), 32, -90, 35), "y1_{}".format(t) )

eguiraud · October 8, 2021, 5:29pm

Hi,
there might be some issue with these particular files, can you share one that reproduces the issue? @pcanal do you have any idea what might cause that error?

Cheers,
Enrico

FoxWise · October 8, 2021, 5:30pm

I discovered that it gets further with fewer threads, e.g.
ROOT.EnableImplicitMT(5) gives an error around event 2 million
ROOT.EnableImplicitMT(4) still gives an error, but around event 4 million
ROOT.EnableImplicitMT(3) still gives an error, but around event 8 million
ROOT.EnableImplicitMT(2) still gives an error, but around event 9 million
# ROOT.EnableImplicitMT() (line commented out) makes it to the end

and it feels like the processing slows down as the event number increases…

Could it be something to do with file access by different threads?
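
To compare runs with different thread counts more systematically, the output of each run could be saved to a file and summarized. The following is only a minimal sketch: it assumes the progress lines are the bare entry numbers printed by the `cout<<rdfentry_<<endl` filter and that error lines start with the message quoted above. The helper name `summarize_log` and the sample log are made up for illustration, and with several threads the printed entry numbers are only a rough indication of progress.

```python
ERROR_MARKER = "Error in <TCollectionLessSTLReader::GetCP()>"


def summarize_log(text):
    """Return (error_count, last_progress_before_first_error) for one run's output."""
    error_count = 0
    last_progress = None
    progress_at_first_error = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith(ERROR_MARKER):
            if error_count == 0:
                # Remember how far the run had got when the first error appeared.
                progress_at_first_error = last_progress
            error_count += 1
        elif line.isdigit():
            # Progress lines are the entry numbers printed by the Filter.
            last_progress = int(line)
    return error_count, progress_at_first_error


# Synthetic log for illustration only (not real output from this thread).
sample_log = """0
500000
1000000
Error in <TCollectionLessSTLReader::GetCP()>: Read error in TBranchProxy
Error in <TCollectionLessSTLReader::GetCP()>: Read error in TBranchProxy
1500000
"""

print(summarize_log(sample_log))  # prints (2, 1000000)
```

Running the script once per thread count and feeding each captured output to `summarize_log` would give a small table of "threads vs. first failing region" that can be attached to a bug report.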

eguiraud · October 8, 2021, 5:59pm

It certainly looks related. Can you please open an issue on GitHub (root-project/root), providing some way for us to reproduce the problem?

Cheers,
Enrico

FoxWise · October 8, 2021, 6:00pm

Also, the error does not happen in this variation of the code without friends:

import ROOT
ROOT.EnableImplicitMT()

df = ROOT.RDataFrame("lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT.root").Filter(" rdfentry_ < 9340000").Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")
df_emv = ROOT.RDataFrame("lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMV.root").Filter(" rdfentry_ < 9340000").Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")
df_emx = ROOT.RDataFrame("lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMX.root").Filter(" rdfentry_ < 9340000").Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")
df_emy = ROOT.RDataFrame("lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMY.root").Filter(" rdfentry_ < 9340000").Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")
df_emz = ROOT.RDataFrame("lumical", "/nfs/dust/ilc/user/dudarboh/final_files/FCAL/tb16/e_FTFP_BERT_EMZ.root").Filter(" rdfentry_ < 9340000").Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")

data = [df, df_emv, df_emx, df_emy, df_emz]
titles = ["default", "emv", "emx", "emy", "emz"]
scales = [0., 0., 0., 0., 0.]
histos = [ROOT.TH1D(), ROOT.TH1D(), ROOT.TH1D(), ROOT.TH1D(), ROOT.TH1D()]
colors = [ROOT.kBlack, ROOT.kRed-8, ROOT.kRed+2, ROOT.kGreen+3, ROOT.kBlue]
canvas = ROOT.TCanvas()

# for i, d in enumerate(data):
    # print("n events in", titles[i], d.Count().GetValue() )

for i, (d, t) in enumerate( zip(data, titles) ):
    scales[i] = d.Count()
    d = d.Define("pad1", "mc_cont_posy[layer == 0 && mc_cont_momz < 0.]")
    histos[i] = d.Histo1D(("h_{}".format(t), "{}".format(t), 200, -90, 35),"pad1")

FoxWise · October 8, 2021, 6:01pm

I will try to make an independent reproducer and open an issue.

FoxWise · October 11, 2021, 2:34pm

@eguiraud I am having some trouble…

When I try to create TTrees with RDataFrame and add them as friends to the chain, I encounter:

Error in <AddFriend>: Tree 'test1' has the kEntriesReshuffled bit set, and cannot be used as friend nor can be added as a friend unless the main tree has a TTreeIndex on the friend tree 'test2'. You can also unset the bit manually if you know what you are doing.

I tried something like:

import ROOT
ROOT.EnableImplicitMT()

ROOT.RDataFrame(10000000).Define("x", "gRandom->Rndm()").Snapshot("test1", "test1.root")
ROOT.RDataFrame(10000000).Define("x", "gRandom->Rndm()").Snapshot("test2", "test2.root")

f1 = ROOT.TFile("test1.root")
f2 = ROOT.TFile("test2.root")

t1 = f1.Get("test1")
t2 = f2.Get("test2")

t1.ResetBit(ROOT.TTree.EStatusBits.kEntriesReshuffled)
t2.ResetBit(ROOT.TTree.EStatusBits.kEntriesReshuffled)

ch = ROOT.TChain("test1")
ch.Add("test1.root")
ch.AddFriend("fr=test2", "test2.root")

df = ROOT.RDataFrame(ch).Filter(" rdfentry_ < 999999").Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")

df = df.Define("y", "x*fr.x")
h = df.Histo1D(("h", "", 100, -10, 10), "y")
h.Draw()

but it fails… Any advice on how to compactly add friend trees produced by RDataFrame?

I looked at this, but I find it hard to apply to my case.

eguiraud · October 11, 2021, 2:36pm

I think you’ll have to do this without EnableImplicitMT

To save time you can also Snapshot just one file and then cp it.
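
A minimal sketch of that suggestion, assuming the same tree and file names as in the snippet above: the Snapshot runs without `EnableImplicitMT`, so the written tree should not carry the `kEntriesReshuffled` bit, and the second file is a plain copy of the first. The helpers `friend_spec`, `make_files` and `main` are hypothetical names, and the entry count is reduced to keep the test quick. Because `test2.root` is a copy, the tree inside it is still called `test1`, and `fr.x` equals `x` entry by entry.

```python
import shutil


def friend_spec(alias, tree_name):
    """Build the 'alias=treename' string passed to TChain.AddFriend."""
    return "{}={}".format(alias, tree_name)


def make_files(n_entries=1000000):
    import ROOT  # imported lazily; EnableImplicitMT is deliberately NOT called

    # Single-threaded Snapshot: entries are written in order.
    ROOT.RDataFrame(n_entries).Define("x", "gRandom->Rndm()").Snapshot("test1", "test1.root")
    # Copy the file instead of running a second Snapshot.
    shutil.copy("test1.root", "test2.root")


def main():
    import ROOT

    make_files()
    ch = ROOT.TChain("test1")
    ch.Add("test1.root")
    # The copied file still contains a tree named "test1".
    ch.AddFriend(friend_spec("fr", "test1"), "test2.root")

    df = ROOT.RDataFrame(ch).Define("y", "x*fr.x")
    h = df.Histo1D(("h", "", 100, 0, 1), "y")
    h.Draw()
    return ch, h  # keep the chain alive as long as the histogram is used
```

Calling `main()` on a machine with ROOT installed produces the two files and the histogram; multithreading can then be switched on in a separate script that only reads them.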

FoxWise · October 11, 2021, 5:32pm

I have tried to make a reproducer, but I failed…

Maybe the code above fails because the files themselves are quite large: 34-47 GB. For the reproducer I have so far only tried files of at most 3 GB. Maybe I am missing something, maybe not, but I think this one is quite hard to catch…

Is there any way to check what ROOT does internally while executing my code? Maybe this would help to track it down.

eguiraud · October 12, 2021, 12:10pm

You can try setting ROOT.gDebug to a high-enough value but this is a problem in TTree/TTreeReader internals, we need either a genius insight by @pcanal or a way to reproduce and investigate on our side. Are the files private? If yes, could you maybe be allowed to duplicate just 1% of that data N times such that the dataset size is equal to the original (but the physics content is all redundant), so that the data can then be shared with us? (or maybe the data could simply be shared with me privately under the agreement that I don’t discover any new physics with it? )
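
The "duplicate a small sample N times" idea could look roughly like the sketch below. It is untested against the real files and rests on assumptions: the tree is called `lumical` as in the first post, the file names are placeholders, and the helper names are hypothetical. The sample is written single-threaded (`Range` is not available with implicit multithreading), and the same sample file is then added to a `TChain` several times so that the total entry count matches the original.

```python
def copies_needed(total_entries, sample_entries):
    """Number of copies of the sample needed to reach at least total_entries."""
    if sample_entries <= 0:
        raise ValueError("sample_entries must be positive")
    # Integer ceiling division.
    return (total_entries + sample_entries - 1) // sample_entries


def write_sample(src_file, out_file, sample_entries, tree_name="lumical"):
    import ROOT  # imported lazily; no EnableImplicitMT here

    # Keep only the first sample_entries entries of the original tree.
    ROOT.RDataFrame(tree_name, src_file).Range(0, sample_entries).Snapshot(tree_name, out_file)


def chain_of_copies(sample_file, n_copies, tree_name="lumical"):
    import ROOT

    ch = ROOT.TChain(tree_name)
    for _ in range(n_copies):
        # Adding the same file repeatedly repeats its entries in the chain.
        ch.Add(sample_file)
    return ch
```

For example, `copies_needed(9340000, 93400)` gives 100, i.e. a 1% sample added 100 times. Whether such a dataset still triggers the error is not known; the reproducer in this thread ended up using trimmed copies of the original files instead.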

Cheers,
Enrico

FoxWise · October 12, 2021, 1:13pm

There is no “private” issue with sharing the data, but more of a “size” issue, so I didn’t know where to put it.

But I recently realized I can remove all the unused columns except one and the issue still persists.

So now the total size of the files is down to ~8 GB.

I just need ~1 hour to upload them to the cloud, and then there will be an issue on GitHub with links to the files.
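
For reference, one way to drop unused columns is a single-threaded Snapshot with an explicit column list. This is only a sketch of the general approach, not necessarily how the trimmed files were actually produced; the column name `mc_cont_posy` is taken from the code earlier in the thread, and `trimmed_name` and `trim_file` are hypothetical helpers.

```python
import os


def trimmed_name(path, suffix="_trimmed"):
    """Derive an output file name in the current directory from the input path."""
    base, ext = os.path.splitext(os.path.basename(path))
    return base + suffix + ext


def trim_file(src_file, columns, tree_name="lumical"):
    import ROOT  # imported lazily; no EnableImplicitMT, so entry order is preserved

    # Build the std::vector<std::string> of columns to keep.
    cols = ROOT.std.vector("string")()
    for name in columns:
        cols.push_back(name)

    out_file = trimmed_name(src_file)
    ROOT.RDataFrame(tree_name, src_file).Snapshot(tree_name, out_file, cols)
    return out_file
```

A call such as `trim_file("e_FTFP_BERT_EMV.root", ["mc_cont_posy"])` would write `e_FTFP_BERT_EMV_trimmed.root` containing only that column.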

cheers,
Bohdan

FoxWise · October 13, 2021, 9:13am

Moved to GitHub (github.com/root-project/root):

Error in <TCollectionLessSTLReader::GetCP()> in the multithreaded loop over Friend column with RDataFrame

opened 02:45PM - 12 Oct 21 UTC by dudarboh · label: bug

- [ ] Checked for duplicates

### Describe the bug

Origin of the issue is …from the forum discussion [here](https://root-forum.cern.ch/t/error-in-tcollectionlessstlreader-getcp-read-error-in-tbranchproxy/47178).

During the event loop over the friend `TTree` with `ROOT.EnableImplicitMT()` enabled using `RDataFrame`, at some point many errors pop up:
`Error in <TCollectionLessSTLReader::GetCP()>: Read error in TBranchProxy.`

I couldn't reproduce this behavior with freshly produced ROOT files, so I assume something might have gone wrong with the ROOT file itself during its creation. However, without using a friend `TTree`, or with a friend `TTree` but with `ROOT.EnableImplicitMT()` disabled, the file is analyzed correctly without any errors. This makes me wonder whether this is a bug, or what the explanation for such behavior is.

### Expected behavior

No errors should appear...

### To Reproduce

Run the following code snippet:

```
import ROOT
# Commenting MT line makes this example work
ROOT.EnableImplicitMT()

ch = ROOT.TChain("lumical")
ch.Add("test_default.root")
ch.AddFriend("emv=lumical", "test_emv.root")

# Adding test_emv.root as the only one but not as a friend, also works...
# ch = ROOT.TChain("lumical")
# ch.Add("test_emv.root")

h_emv = ROOT.RDataFrame(ch).Filter("if (rdfentry_ % 500000 == 0){cout<<rdfentry_<<endl;} return true;")\
            .Histo1D(("h_emv", "emv", 100, -100, 100), "emv.mc_cont_posy" )

h_emv.Draw()
input("wait")
```

The links for the ROOT files: https://syncandshare.desy.de/index.php/s/2Lf6469S22sYt3T

### Setup

Tested with: `source /cvmfs/sft.cern.ch/lcg/views/LCG_100rc2/x86_64-centos7-gcc10-opt/setup.sh`

1. ROOT version 6.24/00
2. Operating system Centos7
3. Python 3.8.6

### Additional context

After stopping and killing the job which produces the errors above, restarting the script can sometimes produce the following error:
`Error in <TTreeReaderArrayBase::GetBranchAndLeaf()>: The tree does not have a branch called emv.mc_cont_posy. You could check with TTree::Print() for available branches.`
It disappears on the next relaunch...

system · October 27, 2021, 9:14am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.