Possible issue when cloning trees

Dear experts,

The post is a bit long, since the issue is somewhat convoluted.
I’m having a very weird issue when trying to clone a TTree and modify its content before saving it. The full application is crashing when when I do TTree::Fill() on the clone tree with the following stack trace

The lines below might hint at the cause of the crash. If you see question
marks as part of the stack trace, try to recompile with debugging information
enabled and export CLING_DEBUG=1 environment variable before running.
You may get help by asking at the ROOT forum https://root.cern/forum
preferably using the command (.forum bug) in the ROOT prompt.
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs or (preferably) using the command (.gh bug) in
the ROOT prompt. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  0x00007f5519351c75 in int TStreamerInfo::WriteBufferAux<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libRIO.so
#6  0x00007f55191dcbfc in TStreamerInfoActions::VectorLooper::GenericWrite(TBuffer&, void*, void const*, TStreamerInfoActions::TLoopConfiguration const*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libRIO.so
#7  0x00007f55190cbe34 in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*, void*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libRIO.so
#8  0x00007f551852d12e in TBranchElement::FillLeavesCollectionMember(TBuffer&) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libTree.so
#9  0x00007f5518524a26 in TBranch::FillImpl(ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libTree.so
#10 0x00007f551852ec2d in TBranchElement::FillImpl(ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libTree.so
#11 0x00007f551852e977 in TBranchElement::FillImpl(ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libTree.so
#12 0x00007f551852e977 in TBranchElement::FillImpl(ROOT::Internal::TBranchIMTHelper*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libTree.so
#13 0x00007f55185a4757 in TTree::Fill() () from /cvmfs/sft.cern.ch/lcg/views/LCG_106b/x86_64-el9-gcc13-opt/lib/libTree.so
#14 0x00000000004da26b in NA62Reconstruction::ProcessEventWrapper (this=0x3010ce0, iData=..., currMap=...) at /home/stefan/NA62/MC_Validation/na62fw/NA62Reconstruction/src/NA62Reconstruction.cc:1010
#15 0x00000000004d9cc9 in NA62Reconstruction::NextEvent (this=0x3010ce0) at /home/stefan/NA62/MC_Validation/na62fw/NA62Reconstruction/src/NA62Reconstruction.cc:913
#16 0x000000000045c084 in main (argc=12, argv=0x7fffa2696188) at /home/stefan/NA62/MC_Validation/na62fw/NA62Reconstruction/NA62Reco.cc:199
===========================================================

Out application is huge so I tried composing a MWE, which is attached. However, it does not crash. Still, I noticed something very odd. When reading a cloned tree, sometimes the ClassName() method of some of the read objects is wrong. I’m posting this here as maybe it helps elucidating the issue also of the crash.

The MWE has 2 parts. In the first part, a tree with 6 branches is created. The branches are made from different classes AEvent1...AEvent6, each holding a std::vector<Double_t>. In the second part, this tree is read via TChain, cloned, modified at each entry and finally written to another file. Instructions to run the MWE (cling) and reproduce the class name issue.

$ root -l 
root [0] .L  clone_mwe.cc
root [1] do_stuff() // this creates a TTree called ttree
root [2] copytree3() 

The last command above outputs in my case

Writting 0/100
AEvent1 AEvent2 AEvent3 AEvent4 AEvent5 AEvent6 
AEvent1 AEvent2 AEvent3 AEvent1 AEvent5 AEvent6 
...

Notice how after the first event the name of the 4th class is different. Of course, I cannot be sure this issue causes also the crash. However, in my application, the crash occurs always in an event where the names are wrong.

clone_mwe.cc (7.0 KB)

Any help on this would be greatly appreciated.

Regards,
Stefan,
NA62 software coordinator


Please fill also the fields below. Note that root -b -q will tell you this info, and starting from 6.28/06 upwards, you can call .forum bug from the ROOT prompt to pre-populate a topic.

ROOT Version: 6.32.08 and 6.26.08
Platform: Almalinux 9 and Ubuntu 24.04
Compiler: gcc


Some clarifications: in the MWE I tried to replicate the main steps of how we’re reading/cloning/modifying the trees. I believe we’re reading the original tree in a non-standard way: we’re setting the branch addresses each event, reading into temporary objects that are deleted at the end of the event. The decision for this implementation was driven by the need to have a multithreaded application.

Update: doing
chain->ResetBranchAddresses()
on the original chain/tree at the beginning of each event and then setting the branch addresses seems to solve both issues.

Hello Stefan,

I believe this is exactly what needs to be done. If you bind the branch addresses to temporary objects, the chain will try writing into those when you move to the next event. So you indeed have to reset all branch addresses before deleting those objects.
@pcanal or @vpadulan would be able to confirm this, probably.

Thanks for the answer. The curious thing is that everything works fine without doing chain->ResetBranchAddresses() as long as the input tree is not cloned. To be more specific, commenting the cloning/filling lines in the mwe, but still doing SetBranchAddress on the input tree at the beginning of each event, then deleting the temporary objects into which the entries are read, works perfectly fine, I do not get the weird output.

I will test if ResetBranchAddresses has an impact on multithreaded operation as we’ve implemented it (I expect it is incompatible, judging from the documentation).

It is safe to say that direct manipulation of SetBranchAddress in a multithreaded context is dangerous. RDataFrame internally creates one TTreeReader per thread to avoid surprises in object lifetimes or addresses.

Noted. Thanks again.

I have an update. It seems that doing ResetBranchAddresses() works as long as the input chain contains only one file. I’ve attached an updated version of the mwe. The instructions to reproduce the problem are:

$ root -l 
root [0] .L  clone_mwe.cc
root [1] do_stuff() // this creates a file called anEventTree.root with a TTree called ttree
root [2] .q
$ cp anEventTree.root anEventTree2.root # needed for the second step
$ root -l 
root [0] .L clone_mwe.cc
root [1] copytree3() 

On my machine the above does not crash, but the output tree, instead of having 200 entries per branch has 300 and the last 100 entries (corresponding to the second input file) do not contain the correct values. The difference is clearly seen by commenting the line chain->AddFile(filename2, TTree::kMaxEntries, treename); in the updated mwe and comparing to the result from the above recipe.

clone_mwe.cc (7.2 KB)

Again, any help on this would be greatly appreciated.

I think I managed to reproduce the actual problem we have in NA62. I’m uploading a version of the mwe that leads to a crash in the third event read from the second tree.
clone_mwe.cc (7.2 KB)
The only practical difference is in how the addresses are set

chain->SetBranchAddress(...); // mwe of the current message
chain->GetTree()->SetBranchAddress(); //mwe of previous messages

Your script also crashes for me on ROOT 6.36, so maybe it’s worth to open a bug report.
Maybe related: Mitigate TChain with no files and SetBranchAddress call by vepadulano · Pull Request #19312 · root-project/root · GitHub by @vpadulan recent PR might have maybe changed sth here? @pcanal might also know.

#11 0x00007f0e3820d5b1 in TStreamerInfoActions::WriteBasicType<unsigned int> (buf=..., addr=0x6, config=0x564eadb07dd0) at /opt/root_src/io/io/src/TStreamerInfoActions.cxx:386
#12 0x00007f0e380ccd7f in TStreamerInfoActions::TConfiguredAction::operator() (this=0x564eadb07da0, buffer=..., object=0x6) at /opt/root_src/io/io/inc/TStreamerInfoActions.h:123
#13 0x00007f0e380ca8f9 in TBufferFile::ApplySequence (this=0x564eadafb650, sequence=..., obj=0x6) at /opt/root_src/io/io/src/TBufferFile.cxx:3747
#14 0x00007f0e1d4685fb in TBranchElement::FillLeavesMember (this=0x564eadad1700, b=...) at /opt/root_src/tree/tree/src/TBranchElement.cxx:1764
#15 0x00007f0e1d456441 in TBranch::FillImpl (this=0x564eadad1700, imtHelper=0x0) at /opt/root_src/tree/tree/src/TBranch.cxx:893
#16 0x00007f0e1d466caf in TBranchElement::FillImpl (this=0x564eadad1700, imtHelper=0x0) at /opt/root_src/tree/tree/src/TBranchElement.cxx:1266
#17 0x00007f0e1d466e4c in TBranchElement::FillImpl (this=0x564eadad12e0, imtHelper=0x0) at /opt/root_src/tree/tree/src/TBranchElement.cxx:1291
#18 0x00007f0e1d466e4c in TBranchElement::FillImpl (this=0x564eadad0ec0, imtHelper=0x0) at /opt/root_src/tree/tree/src/TBranchElement.cxx:1291
#19 0x00007f0e1d466e4c in TBranchElement::FillImpl (this=0x564eadaf2060, imtHelper=0x0) at /opt/root_src/tree/tree/src/TBranchElement.cxx:1291
#20 0x00007f0e1d4f5b1b in TTree::Fill (this=0x564ea670b190) at /opt/root_src/tree/tree/src/TTree.cxx:4690

The pattern:

        for (auto &el : bufStuf->evMap)
        {
            chain->SetBranchAddress(branchNamesMap[el.first].Data(), &(el.second));
        }

is unfortunately wrong. SetBranchAddress is recording the address (to a pointer) passed as the 2nd argument literally. In particular here, it record the address of the second data member of the temporary std::pair. Besides that the same address is likely passed for every branch, the real issue is that that address is invalid as soon as the for loop ends.

The following (even-though it looks ineffecient) should work:

        for (auto &el : bufStuf->evMap)
        {
            chain->SetBranchAddress(branchNamesMap[el.first].Data(), &( bufStuf->evMap[el.first] ) );
        }

As a aside, it looks like this 'simplification will also work (set the address outside the loop):

    std::map<Int_t, BaseEvent *> mapToCopy;
    mapToCopy[0] = nullptr; // aevent
    mapToCopy[1] = nullptr; // bevent
    mapToCopy[2] = nullptr; // aevent
    mapToCopy[3] = nullptr; // bevent
    mapToCopy[4] = nullptr; // aevent
    mapToCopy[5] = nullptr; // bevent

    BufferStuff *bufStuf = new BufferStuff();
    bufStuf->evMap = mapToCopy;

    for (auto &el : bufStuf->evMap)
    {
          chain->SetBranchAddress(branchNamesMap[el.first].Data(), &(el.second));
    }

    for (auto i : ROOT::TSeqI(nentries))
    {
        if (i % (nentries / 10) == 0)
        {
            std::cout << "Writting " << i << "/" << nentries << std::endl;
        }

        chain->GetEntry(i);
        for (int i = 0; i < 6; ++i){
            std::cout << bufStuf->evMap[i]->ClassName() << " ";
        }
        std::cout << std::endl;
        (static_cast<AEvent1 *>(bufStuf->evMap[0]))->value.clear();
        (static_cast<AEvent1 *>(bufStuf->evMap[0]))->value.push_back(1.);
        for (std::size_t is = 0; is < (static_cast<AEvent1 *>(bufStuf->evMap[0]))->value.size(); ++is){
            std::cout << (static_cast<AEvent1 *>(bufStuf->evMap[0]))->value.at(is) << std::endl;
        }

        (static_cast<AEvent2 *>(bufStuf->evMap[1]))->value.clear();
        (static_cast<AEvent2 *>(bufStuf->evMap[1]))->value.push_back(2.);
        (static_cast<AEvent3 *>(bufStuf->evMap[2]))->value.clear();
        (static_cast<AEvent3 *>(bufStuf->evMap[2]))->value.push_back(3.);
        (static_cast<AEvent4 *>(bufStuf->evMap[3]))->value.clear();
        (static_cast<AEvent4 *>(bufStuf->evMap[3]))->value.push_back(4.);
        (static_cast<AEvent5 *>(bufStuf->evMap[4]))->value.clear();
        (static_cast<AEvent5 *>(bufStuf->evMap[4]))->value.push_back(5.);
        (static_cast<AEvent6 *>(bufStuf->evMap[5]))->value.clear();
        (static_cast<AEvent6 *>(bufStuf->evMap[5]))->value.push_back(6.);

        newtree->Fill();
    }
    chain->ResetBranchAddresses();
    delete bufStuf;
    bufStuf = nullptr;

As another aside, in the original code (even after fixing the call to SetBranchAddress), the call to ResetBranchAddress is misplaced and instead of being done at the beginning of the loop it must be done before the address are invalidated. i.e, it needs to be like this:

        newtree->Fill();
        chain->ResetBranchAddresses();
        delete bufStuf;
        bufStuf = nullptr;

Thank you for your answer! However, the problem seems to persist. First of all, we have the SetBranchAddress inside the for loop on purpose, to replicate what is done in our framework, where we rely on multithreaded reading of the input tree with a complicated structure.
In any case, doing

for (auto &el : bufStuf->evMap)
{
  chain->SetBranchAddress(branchNamesMap[el.first].Data(), &( bufStuf->evMap[el.first] ) );
}

leads to the same crash.

Also, together with the for loop modified as suggested, I tried to move the ResetBranchAddresses as indicated. Now the code crashes in the third event read. I am attaching the mwe.
clone_mwe.cc (9.8 KB)

Just to clarify, I am of course willing to accept we (i.e. in NA62) are going about this problem the wrong way. It is still puzzling that one can read a TChain, using multithreading, as I described as long as the chain consists of only one file, but trying to read multiple files leads to a crash.

This being said, I would also like to ask in parallel, what would be the proper way to correct this? Assuming, as I wrote above, that the input structure I am reading is quite complex and I want to be able to handle multiple threads.

Humm .. what do you mean by ‘read multi-thread’, in first approximation without proper locks around SetBranchAddress and GetEntry, reading from multiple thread from the same TChain object is not supported (it as internal state that without the lock will of course be confused by the multithread access to the same TChain object).

Now the code crashes in the third event read.

So indeed there is a bug in ResetBranchAddresses, I opened the issue at TTree/TChain ResetBranchAddress does not propagate the reset to the clones · Issue #19402 · root-project/root · GitHub

To work around the issue use both:

   chain->ResetBranchAddresses();
   newtree->ResetBranchAddresses();

Note that until the next full set of SetBranchAddress after these 2 calls the chain and the new tree are no longer properly connected (in you context that should not be an issue).

In example provided I also noticed the 2 accurate warnings:

Warning: Unused class rule: AnEvent
Warning in <BaseEvent>: The data members of BaseEvent will not be stored, because it inherits from TObject but does not have its own ClassDef.

The first one is clearly an artefact of the test. The 2nd may or may not affect your production code.

Thank you very much! Indeed, using both commands and placing them (as indicated) before the address is invalidated solves the issue!

Dear @sghinesc ,

Thank you for reaching out to the forum! While we work on the bug identified by @pcanal, I’m curious to understand your use case better and if RDataFrame could be of help for NA62. We have seen this tool in action in many in-production use cases with great results. If you think it could be interesting, we could even discuss it in a separate meeting, let me know.

Cheers,
Vincenzo