Filtering out part of an event in RDataFrame

Hello,

My problem is the following: I have a TTree with these branches: Mult, pID[Mult], pEnergy[Mult], pTime[Mult], nID[Mult], nEnergy[Mult], nTime[Mult]. I have defined a TCutG called EnergyCut and would like to keep, for each event, only the array elements i for which EnergyCut->IsInside(pEnergy[i], nEnergy[i]) is true, and save the result into a new TTree.

I could achieve my goal by manually looping over my input tree and selecting only the parts that satisfy the condition. But since I am new to ROOT, I would like to ask what good practice is for such a procedure. After reading about RDataFrame, I had the idea that I could create a custom filter, following another topic, but I am not sure how to proceed or whether it is possible at all:

 ROOT::RDataFrame df("PermutationTree", "generatedTree_with_permutations.root");

 using doubles = ROOT::VecOps::RVec<Double_t>;

 auto cutInside = [](const doubles& pside, const doubles& nside, int n)
 {
     for(auto i = 0; i < n; ++i)
        if (EnergyCut->IsInside(pside[i], nside[i]))
          return ????
 };
 df.Filter(cutInside, {"pEnergy", "nEnergy", "Mult"}).Snapshot("filteredTree", "output.root");

Since I will be using ROOT a lot in the future and will also be working with large TTrees, I would like to know what good practice is for complicated cuts and filters: manual selection while looping, or RDataFrame with Filters?

Thanks a lot.

Best regards,
Yuliia

Hi Yuliia,
RDataFrame is definitely a good choice for complicated cuts on large TTrees. I suggest you take a look at the “crash course” here.

About your problem: Filter selects rows/events of the dataset that satisfy a certain condition. In your case, if I understand correctly, for each row/event you want to Define a new array that contains only some elements of the original array:

// Capture by reference so the lambda also works if EnergyCut is a local pointer
auto pside_selection = [&](const doubles& pside, const doubles& nside, int n)
{
     doubles selected_pside;
     for(auto i = 0; i < n; ++i)
        if (EnergyCut->IsInside(pside[i], nside[i])) // keep only elements inside the cut
           selected_pside.push_back(pside[i]);
     return selected_pside;
};

df.Define("selected_pside", pside_selection, {"pEnergy", "nEnergy", "Mult"}).Snapshot(...);

This might not be exactly what you need but I hope it gives you an idea of what you can do.
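
To make the distinction between the two operations concrete, here is a minimal sketch in plain C++ without ROOT. The Event struct, the passes() threshold, and both function names are made-up stand-ins for illustration, not ROOT API: the first function behaves like a Filter (whole events are kept or dropped), the second like a Define (every event is kept, but with a new, shorter array).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for one TTree entry with a single array branch.
struct Event {
    std::vector<double> energy;
};

// Hypothetical per-element condition, standing in for EnergyCut->IsInside(...).
bool passes(double e) { return e > 10.0; }

// Filter-like: an event is kept or dropped as a whole; its array is untouched.
std::vector<Event> filterEvents(const std::vector<Event>& events) {
    std::vector<Event> kept;
    for (const auto& ev : events)
        if (std::any_of(ev.energy.begin(), ev.energy.end(), passes))
            kept.push_back(ev);
    return kept;
}

// Define-like: every event is kept, but a new array holds only passing elements.
std::vector<Event> selectElements(const std::vector<Event>& events) {
    std::vector<Event> result;
    for (const auto& ev : events) {
        Event selected;
        std::copy_if(ev.energy.begin(), ev.energy.end(),
                     std::back_inserter(selected.energy), passes);
        result.push_back(selected);
    }
    return result;
}
```

With the Define-like version, an event in which nothing passes still produces an entry, just with an empty array.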
Cheers,
Enrico

@eguiraud Thanks a lot for your reply, that’s exactly what I needed!

I expanded your code to fully cover my needs (might be useful for someone):

  using doubles = ROOT::VecOps::RVec<Double_t>;
  auto selection = [&](const doubles& data, const doubles& cond1, const doubles& cond2, const Int_t n)
  {
          std::vector<double> selectedPart;
          for(Int_t i = 0; i < n; ++i)
          {
            if ( energyCut->IsInside(cond1[i], cond2[i]) ) // check if within the Cut
              selectedPart.push_back(data[i]);
          }
          return selectedPart;
  };

  auto selectedData = df.Define("selected_pEnergy", selection, {"pEnergy", "pEnergy", "nEnergy", "Mult"})
                        .Define("selected_nEnergy", selection, {"nEnergy", "pEnergy", "nEnergy", "Mult"})
                        .Define("selected_pTime", selection, {"pTime", "pEnergy", "nEnergy", "Mult"})
                        .Define("selected_nTime", selection, {"nTime", "pEnergy", "nEnergy", "Mult"});

  // Save columns with selected data into TTree
  selectedData.Snapshot("myNewTree", "newfile.root", {"selected_pEnergy", "selected_nEnergy", "selected_pTime", "selected_nTime"});

Now I can think about improving my selection function, but I fully understand how to work with RDataFrame now, so thanks a lot!
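
One possible improvement, as a minimal sketch in plain C++ without ROOT: the code above evaluates the cut four times per element, once for each selected column. The cut could instead be evaluated once per event into a mask that is then reused for every array. RectCut, makeMask and applyMask are hypothetical names for illustration; RectCut is only a rectangular stand-in for TCutG, and the sketch assumes all arrays of an event have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for TCutG: a plain rectangle with an IsInside(x, y) method.
struct RectCut {
    double xmin, xmax, ymin, ymax;
    bool IsInside(double x, double y) const {
        return x >= xmin && x <= xmax && y >= ymin && y <= ymax;
    }
};

// Evaluate the cut once per event: mask[i] is true when (p[i], n[i]) is inside.
std::vector<bool> makeMask(const RectCut& cut,
                           const std::vector<double>& p,
                           const std::vector<double>& n) {
    assert(p.size() == n.size());
    std::vector<bool> mask(p.size());
    for (std::size_t i = 0; i < p.size(); ++i)
        mask[i] = cut.IsInside(p[i], n[i]);
    return mask;
}

// Reuse the same mask for any array of the event (energies, times, ...).
std::vector<double> applyMask(const std::vector<double>& data,
                              const std::vector<bool>& mask) {
    assert(data.size() == mask.size());
    std::vector<double> selected;
    for (std::size_t i = 0; i < data.size(); ++i)
        if (mask[i]) selected.push_back(data[i]);
    return selected;
}
```

In RDataFrame this would presumably become one extra Define column holding the mask, with each selected_* column built from it. RVec can be indexed with a mask of the same length, which may make a separate applyMask unnecessary, but that variant is untested here.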

Best regards,
Yuliia
