For what it’s worth, there seems to be also a dependence on how “similar” the branch names are. For example, if I try reading only two branches with names:
HLT_Mu7_IP4_part0
HLT_Mu7_IP4_part1
I get the crash. While trying with the following branches:
nElectron
Electron_pt
Electron_eta
Electron_deltaEtaSC
Electron_phi
Electron_charge
Electron_dxy
Electron_dz
(same set for Muon_* without the deltaEtaSC)
[
genWeight
LHEWeight_originalXWGTUP
]
The last two branches within [] are the ones I added to trigger the crash. For now I do not have any explanation for this strange behavior.
Keeping the read branch names “clearly different”, I can go up to reading 25 branches without crashes. However, I don’t know the criteria for “clearly different”, since there have been some names I thought are “clearly different” yet it still crashed…
In the code, I don’t perform any string manipulation except duplicate checking before storing them.
It looks like you are using a TChain. The TBranch object are deleted (and created slightly differently) at each file boundary. To capture a TBranch address you need to use:
This I didn’t know about. Does this mean I need to call SetBranchAddress at every file change or is this automatically updated?
Somewhat related, I perform GetEntry calls from the TBranch pointers (at which point the code no longer knows about the source TChain/Tree), is this in any way affected by the above?
And just to confirm, the above capture will also work with TTree, correct?
This I didn’t know about. Does this mean I need to call SetBranchAddress at every file change or is this automatically updated?
The pointer value is automatically updated.
And just to confirm, the above capture will also work with TTree , correct?
yes
Somewhat related, I perform GetEntry calls from the TBranch pointers (at which point the code no longer knows about the source TChain / Tree ), is this in any way affected by the above?
TBranch::GetEntry require the entry number within the current TTree. To work correctly you need to pass the return value of LoadTree.
for(Long64_t overallEntry = 0; overallEntry < chainOrTree->GetEntriesFast(); ++overallEntry) {
Long64_t localEntry = chainOrTree->LoadTree(overallEntry); // for a TTree this is a nop and returns overallEntry.
branch->GetEntry(localEntry);
Ok, let me see if I understood correctly: In both, tree can be either TTree or TChain
Current way:
branch = tree->GetBranch("name");
branch->SetStatus(1);
branch->SetAutoDelete(false);
branch->SetAddress(&obj);
[... much later ...]
branch->GetEntry(n);
is not correct in that it’ll give me problems when the file is changed. Instead I should do:
tree->SetBranchStatus("name", 1);
tree->SetBranchAddress("name", &obj, &branch);
branch->SetAutoDelete(false);
[... much later ...]
branch->GetEntry(n);
Is this correct?
edit: seeing @pcanal’s reply I see there’s also a problem with how I handle the n. This may be an issue, but ok fine I’ll just have the method see the source TChain too.
I found out that there is a related problem. Much later in the code, I read the branches starting from single branches, some of which are counters. Since I read array branches into vectors, I also check that the read counter does not exceed the vector capacity, and if it does, resize the vector as necessary. Of course this means I need to reset the address, since vector.data() has now changed.
However in this case I ran into the same issue in that the branch->SetAddress() is not fully compatible with TChain. I tried updating it to the tree->SetBranchAddress("name", &obj, &branch) syntax, where tree = counter_branch->GetTree() but this does not fix the issue, presumably because GetTree() gives the current tree rather than the original TChain.
Is this understanding correct, and is there a way for me to get the original TChain so I can update its address table, without explicitly introducing the TChain pointer in the block where GetEntry() and vector layout happen?
presumably because GetTree() gives the current tree rather than the original TChain.
Correct.
without explicitly introducing the TChain pointer in the block where GetEntry() and vector layout happen?
I am afraid not, however …
What kind of branches do you have (i.e. can you delegate the resizing to ROOT)?
For a branch that hold a variable you can call GetMaximum on the BranchCount branch (or LeafCount branch) to get the maximum size of the collection. So the test and resizing needs to be done only once per file.
You can either be notified (by attaching an object with a Notify function to the chain via SetNotify) when a file change or you can detect the switch over (check GetTreeNumber() value after a call to LoadTree).
They are floats, ints - plain old data in the informal sense - and arrays of those.
It’s possible to delegate the resizing, but this probably requires breaking the current structure of the code, which I would rather not do.
That said, if I can really avoid running the test every event, this may be a good optimization opportunity. However it’s not obvious how to do this yet, since it seems that I still need to check the result of GetTreeNumber() every event, which in the case of no resizing, is exactly the same cost of one if statement per event.
As for Notify(), I don’t have a good feel of how it works so I can’t say what the cost will be.
Hmm. Do I understand correctly that beyond the fact that the Notifier class needs to inherit from TObject, and that it needs to have the void Notify() signature member method, the layout is arbitrary and that together with the chain->SetNotifier(¬ifier) call, this allows me to execute arbitrary code when, and only when, each time the file in the chain changes?
Very interestingly, the Notify() approach is slower by far:
Processing 9945000 events...
^C
real 20m17.473s
user 19m54.261s
sys 0m18.491s
vs manually resizing in each event, when the counter exceeds the current capacity.
Processing 9945000 events...
Processed 9945000 events!
756406 pass trigger
160573 also contain emu pair
real 2m35.967s
user 2m33.073s
sys 0m0.930s
Perhaps due to the cost of TTree::GetMaximum()? Either way, premature optimization is probably not a good idea, so I’ll settle with the current performance and fill up the rest of the framework, now that the core issue of the thread is solved.
For a give branch that correspond to array, you can call GetBranchCount() and then get its leaf and call GetMaximum on the leaf.
For example, this might work for you:
TLeaf *leafcount = fChain->GetLeaf(name_of_index);
// and use leafcount->GetMaximum();