Issue when working with multiple treees and root files, brnach pointers not correctly assigned

In my analysis I need to open a series of root files, each one with several trees. I, therefore, have divided the analysis part into three main functions:
The first function is responsible for opening the consecutive root files one at a time and storing the pointers to a TFile* vector. Then it looks inside the opened file and recovers the required trees which are in their term assigned to a TTree* vector. Branches of each tree type are then accessed by SetBranchAddress, using a vector of TBranch* and appropriate pointer vectors eg std::vector<vector*>.
When each tree is opened, the pointers and branch vectors are resized and new pointers initialized to zero are added at the back of the vector.
Although the process should in effect work, when I try to access the assigned branch pointers after having called the TBranch::GetEntry(), the associated vector pointer remains zero. The system works for the first tree in the array of trees, but not for the rest…
Any ideas?

UPDATE: Sometimes it runs, sometimes it does not though. And before anyone blames Windows, the same behavior is observed when running in the standard LXPLUS setup… This looks like a memory leak…


ROOT Version: 6.28.02 (x64)
Platform: Windows 11
Compiler: MSVC 2022 (17.5.3)


Do you have some kind of reproducer?

On lxplus, run with valgrind:

valgrind --log-file=val.log --suppressions=$ROOTSYS/etc/valgrind-root.supp myexecutable myarguments`

and the file val.log might point to the issue.

Thanks for the suggestion, but Valgrind actually sees 0 bytes reserved in memory and asumes there is no leak (which does makes sens).
In the meantime, one of the times the code actually run I got the following crash message from root :

The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  0x00007f8fef12a134 in TStreamerInfoActions::VectorLooper::ConvertCollectionBasicType<long long, int>::Action(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /usr/lib64/root/libRIO.so
#6  0x00007f8feeffa495 in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) () from /usr/lib64/root/libRIO.so
#7  0x00007f8fe0341239 in TBranchElement::ReadLeavesMember(TBuffer&) () from /usr/lib64/root/libTree.so.6.24
#8  0x00007f8fe033c243 in TBranch::GetEntry(long long, int) () from /usr/lib64/root/libTree.so.6.24
#9  0x00007f8fe034b021 in TBranchElement::GetEntry(long long, int) () from /usr/lib64/root/libTree.so.6.24
#10 0x00007f8fdb11c842 in LGADBase::CombineTrackCory (this=this
entry=0x7f21b50, EvntNmbr=0, evntlast=std::vector of length 14, capacity 14 = {...}, file=0) at /afs/cern.ch/work/e/egkougko/public/LGADUtils/./Root/TrackCombine.cxx:734
#11 0x00007f8fdaf0f1a7 in LGADBase::WriteTestBeamBinary (this=this
entry=0x7f21b50, dir=<optimized out>, name=<optimized out>, ext=ext
entry=0x7f8fdaf122a0 "dat", evt1=0, evt2=500) at /afs/cern.ch/work/e/egkougko/public/LGADUtils/./Root/DataConverters.cxx:1788
#12 0x00007f8fdaf0fe00 in LGADBase::ConvertData (this=0x7f21b50) at /afs/cern.ch/work/e/egkougko/public/LGADUtils/./Root/DataConverters.cxx:93
#13 0x00007f8ff27e2067 in ?? ()
#14 0x00007ffd6785ad00 in ?? ()
#15 0x5d6a7fce5098a700 in ?? ()
#16 0x00007ffd6785ad90 in ?? ()
#17 0x00000000024d9cf0 in ?? ()
#18 0x00007f8fe9f910d0 in ?? () from /usr/lib64/root/libCling.so
#19 0x00007ffd6785b200 in ?? ()
#20 0x00007ffd6785b200 in ?? ()
#21 0x00000000024e9ed0 in ?? ()
#22 0x00000000083b25e8 in ?? ()
#23 0x00007f8fea00a1fe in cling::IncrementalExecutor::executeWrapper(llvm::StringRef, cling::Value*) const () from /usr/lib64/root/libCling.so
#24 0x00007f8fe9f928a0 in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) () from /usr/lib64/root/libCling.so
#25 0x00007f8fe9f93890 in cling::Interpreter::EvaluateInternal(std::string const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) () from /usr/lib64/root/libCling.so
#26 0x00007f8fe9f93b33 in cling::Interpreter::process(std::string const&, cling::Value*, cling::Transaction**, bool) () from /usr/lib64/root/libCling.so
#27 0x00007f8fea066c47 in cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) () from /usr/lib64/root/libCling.so
#28 0x00007f8fe9ec484a in HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) () from /usr/lib64/root/libCling.so
===========================================================

Am I correct by reading these lines that there is a conversion issue from long long to int? And if yes, shouldn’t I get and “Incorrect Pointer Type” Messaga?

The problem is very unlikely to be a leak but rather a problem with using wrong memory locations (uninitialized memory, out of bound access, used after delete etc…)

it is usually not an issue. ROOT support ‘automatic schema evolution’ where you might have store a column in a type (eg int) but want to read it into another type (eg long long) (and in practice you may have some older files with the old type and many newer files with the new type). The conversion is handled by function like the one you see.

The stack trace confirm that it is a memory error of sorts. Valgrind should have reported about it. Can you share the file val.log you obtained?

Does not seem that Valgrind detected anything…
Here are the contents of the log file:

==16829== Memcheck, a memory error detector
==16829== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16829== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==16829== Command: root -l Test.C+
==16829== Parent PID: 23546
==16829== 
==16829== 
==16829== HEAP SUMMARY:
==16829==     in use at exit: 0 bytes in 0 blocks
==16829==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==16829== 
==16829== All heap blocks were freed -- no leaks are possible
==16829== 
==16829== For lists of detected and suppressed errors, rerun with: -s
==16829== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The command root is small wrapper than calls the real ROOT executable named 'root.exe` and valgrind does not follow through a fork. To actually analyze the issue use:

valgrind --suppressions=$ROOTSYS/etc/valgrind-root.supp root.exe -b -l -q Test.C+

val.txt (14.5 KB)

EDIT: Re-ran valoigrind adding the --track-origins argument, attahed the new log

Attached the log file from valgrind. I used a suppression file from root 6.28 while the version in lxplus is root 6.24. The ROOTSYS variable does not seem to be setup in the standard configuration without an LCG script load.

In the meantime, I also printed out the addresses of the vector pointers assigned to the branches of the trees to make sure they propagate correctly and get initialized to 0 when created. This does seem to work but I was wondering, if one does b_branch(of tree1)->GetEntry(entry_1) and then exactly after the same but for another tree, shouldn’t there be a gDiretcotry command to point to the second tree in the meantime?

The part part of the code that seems to fail, or at least not working as it should, id the TBranch->GetEntry(entry) function. It seems to not retrieve the data from the tree although the pointer is correctly set.

one does b_branch(of tree1)->GetEntry(entry_1) and then exactly after the same but for another tree,

No need. Once loaded the TTree knows where to get the data.

The revelant part of the valgrind output is:

==5118== Invalid read of size 8
==5118==    at 0x196252C5: TBranchElement::ReadLeavesMember(TBuffer&) (in /usr/lib64/root/libTree.so.6.24.08)
==5118==    by 0x19620242: TBranch::GetEntry(long long, int) (in /usr/lib64/root/libTree.so.6.24.08)
==5118==    by 0x1962F020: TBranchElement::GetEntry(long long, int) (in /usr/lib64/root/libTree.so.6.24.08)
==5118==    by 0x22E4EDE9: ??? (in /afs/cern.ch/work/e/egkougko/public/LGADUtils/Root/TrackCombine_cxx.so)
==5118==    by 0x5: ???
==5118==  Address 0x17311568 is 8 bytes inside a block of size 16 free'd
==5118==    at 0x4C2B51D: operator delete(void*) (vg_replace_malloc.c:586)
==5118==    by 0x22E5A756: ??? (in /afs/cern.ch/work/e/egkougko/public/LGADUtils/Root/TrackCombine_cxx.so)
==5118==    by 0x2160FDDF: ???
==5118==    by 0x212D744F: ???
==5118==    by 0x2130336F: ???
==5118==    by 0x21CF2B47: ???
==5118==    by 0x2161012F: ???
==5118==    by 0x22E556BA: ??? (in /afs/cern.ch/work/e/egkougko/public/LGADUtils/Root/TrackCombine_cxx.so)
==5118==  Block was alloc'd at
==5118==    at 0x4C2A593: operator new(unsigned long) (vg_replace_malloc.c:344)
==5118==    by 0x22E5A72F: ??? (in /afs/cern.ch/work/e/egkougko/public/LGADUtils/Root/TrackCombine_cxx.so)
==5118==    by 0x2160FDDF: ???
==5118==    by 0x1FBE9B5F: ???
==5118==    by 0x2130336F: ???
==5118==    by 0x1F303D57: ???
==5118==    by 0x2161012F: ???
==5118==    by 0x22E556BA: ??? (in /afs/cern.ch/work/e/egkougko/public/LGADUtils/Root/TrackCombine_cxx.so)

But the essential information is still obscured (what is being new, when is it being deleted) and would become clearer with a debug build of TrackCombine_cxx.so (if it is a user script use root.exe -b -l -q TrackCombine.cxx++kg

My guess (so far) would be that the object that is pointed to by the pointer given to the branch is deleted but the branch is not informed. For example:

datatype *pointer = nullptr;
tree->SetBranchAddress(branchname, &pointer);
tree->GetEntry( entry);
delete pointer; // should then also call pointer = nullptr;
tree->GetEntry( entry + 1); // This will crash

Thanks for the suggestion but I doubt this is the case here. The pointers are actually created sequentially and added at the back of a pointer vector following the process:

std:vector<std::vector<int>*> int_pointers;
if (tree)
{
int_pointers.resize(int_pointers.size()+1);
int_pointers.push_back(new std::vector<int>(int_pointers.size()-1));
int_pointers.at(int_pointers.size()-1) = NULL;
tree->SetBrachAddress(branchname, &int_pointers.at(int_pointers.size()-1), &branch_pointer);
}

This pointer vector is a public class object that is then passed along to the function that recovers the entry from the tree. The hex addresses of the pointers in this pointer vector are correctly assigned from the initialization and do not seem to change within the function that request the entry. The entry is requested as follows:
branch_pointer->GetEntry(entry)
but obviously when trying to access the int_pointers.at(int_pointers.size()-1) the exception is generated.
The hex address of the int_pointers.at(int_pointers.size()-1) is unaltered up to the GetEntry function and equal to the one assigned when calling the SetBaranchAddress.
I will try the kg option when compiling the Trackcombine macro.

You must not resize the “int_pointers” one by one. The addresses of pointers given to SetBrachAddress must remain valid all the time (i.e., the “int_pointers” must not reallocate elements).

1 Like
tree->SetBrachAddress(branchname, &int_pointers.at(int_pointers.size()-1), &branch_pointer);

This give the branch the address of one of the element inside the vector. If at any point after this call the vector needs to change its memory allocation (eg because of a resize), this will become invalid and need to be reset/redone.

The pointers are actually created sequentially

If I understand this correctly this means that the code does (in essence):

 std:vector<std::vector<int>*> int_pointers;
int_pointers.resize(int_pointers.size()+1);
...
tree->SetBrachAddress(branchname_1, &int_pointers.at(int_pointers.size()-1), &branch_pointer);
...
...
int_pointers.resize(int_pointers.size()+1); 
...
tree->SetBrachAddress(branchname_2, &int_pointers.at(int_pointers.size()-1), &branch_pointer);

Where the 2nd resize is fatal !!! The code should be more like:

std:vector<std::vector<int>*> int_pointers;
int_pointers.reserve(total_number_branches);  // Reserve allocates the memory but does not increase the size
...
int_pointers.push_back(...);  // push_back and resize does not lead to reallocation as long as the size is under the reserved size.
tree->SetBrachAddress(branchname_1, &int_pointers.at(int_pointers.size()-1), &branch_pointer);
...
...
int_pointers.push_back(...);
tree->SetBrachAddress(branchname_2, &int_pointers.at(int_pointers.size()-1), &branch_pointer);
...

Cheers,
Philippe.

PS. Independently of this I am confused by the pattern:

int_pointers.resize(int_pointers.size()+1);
int_pointers.push_back(new std::vector<int>(int_pointers.size()-1));
int_pointers.at(int_pointers.size()-1) = NULL;

Do you really mean to ‘add’ 2 entries at every single iteration? If so, why? And why does the size of the ‘other’ vector depends on the number of branches seen so far?

OK thanks, now I understand where the error originates from.
Just to put the code into context:
I need to open several root files each of one of which has several trees but NOT always the same. This is why I need to have a dynamically allocated number of branch pointers. I believed that the most efficient way would be to open the files one by one, test the existence of the trees and asign the pointers. Then move to the next file. To avoid resizing the pointers vector, I would need to add an initial “exploratory” loop where i figure out the size of pointers I need to reserve and a secondary loop where I assign the addresses to the branches.
As a question, I do not do resize but leave the vector as declared and continue doing push_back operations without reserving an initial amount of elements, would this work? Or since I did not do reserve, in every push_back it actually might move the container if it has no more memory at that location to expand it?

If you do not “reserve” the needed (total) number of elements at the beginning then “push_back” will perform reallocations.
If the “capacity” is sufficient then “push_back” will use the available space (i.e., insertions will not trigger reallocation).

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.