Data Corruption when Cutting a TTree

I am working on a project where I need to read data from an input tree and decide, event by event, whether to copy each event to an output tree, so I wrote a short C++ program for this. It runs into problems with branches that store an array in each event. The size of the array varies from event to event, but a separate branch stores the array size for each event. So I use a vector to hold the array values of each event temporarily, and I resize the vector after reading the branch that gives the array size. Here is a short example of what I have been doing so far:

```cpp
#include <TFile.h>
#include <TTree.h>
#include <vector>

int main(int argc, char** argv)
{
     TFile* ifile = new TFile(argv[1], "READ", "InputFile");
     TTree* itree = (TTree*) ifile->Get("ExampleTree");
     TFile* ofile = new TFile(argv[2], "RECREATE", "OutputFile");
     TTree* otree = itree->CloneTree(0);   // empty copy of the input structure

     int i;
     int j;
     Int_t ArraySize;

     itree->SetBranchAddress("ArraySize", &ArraySize);

     for(i = 0; i < itree->GetEntries(); i++)
     {
          itree->GetEntry(i);               // first read, to get ArraySize

          std::vector<Double_t> Array;
          Array.resize(ArraySize);
          itree->SetBranchAddress("Array", Array.data());
          itree->GetEntry(i);               // second read, to fill Array
          itree->SetBranchAddress("Array", nullptr);

          Double_t CutCondition = 0;

          for(j = 0; j < ArraySize; j++)
          {
               CutCondition += Array[j];
          }

          if(CutCondition > 5)
               otree->Fill();
     }

     ofile->Write();
     ofile->Close();
     ifile->Close();

     return 0;
}
```

However, when I look at the “Array” branch in the output file, its values are nonsense. What am I doing wrong, and is there a better way to do this? Thanks in advance for any advice.



ROOT Version: 5.34/38
Platform: Not Provided
Compiler: Not Provided


Hi,
if you can update your ROOT version (RDataFrame is available from ROOT 6.14 on), there is indeed a better way, literally in one line of code:

```cpp
ROOT::RDataFrame("ExampleTree", "InputFile").Filter("Sum(Array) > 5").Snapshot("ExampleTree", "OutputFile");
```

More information is available in the RDataFrame documentation.

As for what goes wrong: does only the “Array” branch contain bogus values in the output file, while all other branches are fine? If so, my bet would be that SetBranchAddress(..., nullptr) interferes with the CloneTree. Maybe you can call otree->SetBranchAddress to force the right address for the input branch (not sure, @pcanal can comment with more authority).

Cheers,
Enrico

Yes, setting the branch address to nullptr is disastrous: the address is also propagated to the output tree, so the output tree has nothing valid to write. Just removing that SetBranchAddress call should solve the problem.

I would consider the following ‘improvements’ to the code:

```cpp
     // Replaces the loop in main(); itree and otree are set up as before.
     int i;
     int j;
     Int_t ArraySize;
     TBranch *brArraySize = nullptr;

     itree->SetBranchAddress("ArraySize", &ArraySize, &brArraySize);

     std::vector<Double_t> Array;       // one buffer for the whole loop
     for(i = 0; i < itree->GetEntries(); i++)
     {
          long long localentry = itree->LoadTree(i);
          brArraySize->GetEntry(localentry);   // reads only ArraySize

          // Grow the buffer before "Array" itself is read.
          if (Array.size() < static_cast<std::size_t>(ArraySize)) {
               Array.resize(ArraySize);
               itree->SetBranchAddress("Array", Array.data());
          }
          itree->GetEntry(i);

          Double_t CutCondition = 0;

          for(j = 0; j < ArraySize; j++)
          {
               CutCondition += Array[j];
          }

          if(CutCondition > 5)
               otree->Fill();
     }
     itree->ResetBranchAddresses();
```

Thank you for the response. I tried this and it seems to solve my issue. However, I want to understand how this solution works, and I am a bit unclear on a few things:

  1. What is the purpose of using a TBranch pointer and itree->LoadTree(i), rather than just using the integer ArraySize and calling itree->GetEntry(i)?

  2. Since we no longer set the address of “Array” to nullptr, the branch address stays set to Array.data() from one iteration to the next. How do we make sure that a segfault doesn’t occur when we read the new value of ArraySize? Doing so could also dump the data from the “Array” branch into the vector Array while the vector is still smaller than it should be, since we haven’t resized it yet.

I think I have an idea of what is going on in the code, but want to make sure I fully understand what you did. Thank you for your help.

The LoadTree call is not strictly necessary; it makes the code work with a TChain in addition to a TTree.

Note that TTree::GetEntries returns a long long (aka Long64_t) and thus i should be declared the same.

I used brArraySize->GetEntry rather than itree->GetEntry to avoid reading the whole entry twice.

> how are we making sure that a seg fault doesn’t occur from trying to get the new value of ArraySize

This is another property of calling “just” brArraySize->GetEntry: the array itself is not read until we have had a chance to resize it.
