Use TTreeFormula to select elements of TTreeReaderArray

Hi,

I’m trying to analyze a root tree using a TTreeReader; the tree is designed to be read by using TTreeReaderArray.
In my case, it is defined as:
TTreeReaderArrayedm4hep::MCParticleData MCParticles(tree_reader,“MCParticles”);

In my analysis, I think I can use TTreeFormula to apply a filter to each element of the TTreeReaderArray. For example, the effect I’d like to obtain is analogous to
if(MCParticles[index].PDG == 211 && MCParticles[index].momentum.z > 5) → do something

Can a TTreeFormula be defined to do this? I’d like to read the selection conditions from an external file and have the flexibility to change them often.

Thanks


Please read tips for efficient and successful posting and posting code

_ROOT Version: 6.30/02
Platform: Not Provided
Compiler: Not Provided


Hi @Simone_Vallarino,

Thank you for your question! I am not sure what exactly you are trying to do but it sounds like you may want to try and use RDataFrame instead in which you can easily Define new variables and Filter through, display the contents as well as snapshot to new root files. For the very beginning you could take a look here (this tutorial is a Python Jupyter Notebook): student-course/course/notebooks/core/03-rdataframe-basics.ipynb at main · root-project/student-course · GitHub and continue via further notebooks in the same repository and also check out those tutorials later (both in Python and C++): ROOT: Dataframe tutorials and/or the full class description here: ROOT: ROOT::RDataFrame Class Reference.

I hope this makes your workflow a bit easier, even if requires a bit of reading up on new stuff.

Cheers,
Marta

Hi @mczurylo,

Thanks for your reply.
Yes, I can do it with the RDataFrame, but I’m still uncertain about how to apply the selection condition to the [i] element of the vector constituting my results.

Let me try to explain myself better. In my root tree, a branch is constituted by a vector of object edm4hep::MCParticleData. It includes all the information about the N particles simulated during the event. I want to fill histograms for each particle only if it survives some selection criteria. How should I define the TTreeFormula to evaluate the condition for each vector element?

Alternatively, how can I do it using RDataFrame?

Bests,
Simone

Hello, @Simone_Vallarino !
Please look into code below with example of RDataFrame use-case. I hope you would find it useful.

Edit: Firstly, I’ve made the code more appropriate to your usecase.
Secondly, it seems that Filter() method of RDataFrame operates on per-event basis. In your case you want to apply some criteria on sub-event basis. Suggested strategy then is to construct new auxiliary columns.

// test_rdataframe.cpp
// imitation of real data structure
namespace edm4hep {

struct MCParticleData
{
    double energy;
};

}
// function to construct mock-up data
auto getFilledDataFrame(int rows, int cols) {
    ROOT::RDataFrame d(rows);

    auto dFilled = d.Define(
        "nparticles",
        [cols]() -> ULong64_t
        {
            return gRandom->Poisson(cols);
        },
        {}
    )
    .Define(
        "particles",
        [](const ULong64_t nparticles) -> ROOT::RVec<edm4hep::MCParticleData>
        {
            ROOT::RVec<edm4hep::MCParticleData> particles(nparticles);
            for (auto &value : particles) {
                value.energy = gRandom->Uniform();
            }
            return particles;
        },
        {"nparticles"}
    );

    return dFilled;
}
// logic for selection criteria
bool applySelectionCriteria(const edm4hep::MCParticleData& particle) {
    return (particle.energy > 0.5);
}

void test_rdataframe() {
    int rows = 100; // number of entries
    int cols = 100;   // mean sample size
    auto dFilled = getFilledDataFrame(rows, cols);
    // if real data available
    // ROOT::RDataFrame dFilled("treeName", "file.root");

    auto dWithSurvived = dFilled.Define(
        "survivedParticles",
        [](const ROOT::RVec<edm4hep::MCParticleData>& particles) -> ROOT::RVec<edm4hep::MCParticleData>
        {
            return particles[ROOT::VecOps::Map(particles, applySelectionCriteria)];
        },
        {"particles"}
    );

    int nbins = 10;
    double h_Emin = 0.0;
    double h_Emax = 1.0;

    auto h = dWithSurvived.Define(
        "survivedParticlesEnergies",
        [](const ROOT::RVec<edm4hep::MCParticleData>& survivedParticles) -> ROOT::RVec<double>
        {
            return ROOT::VecOps::Map(survivedParticles, [](const edm4hep::MCParticleData& particle) { return particle.energy; });
        },
        {"survivedParticles"}
    )
    .Histo1D<ROOT::RVec<double>>(
        {"h", "title", nbins, h_Emin, h_Emax},
        "survivedParticlesEnergies"
    );

    h->DrawClone();
}

Regarding this part of your question, it seems that you would like to have something as convenient as TTree::Draw() call.

myTree->Draw("myVar", "MCParticles[].PDG == 211 && MCParticles[].momentum.z > 5")

How does string with selection criteria become a TTreeFormula? As far as I know, TTree passes arguments to underlying TTreePlayer, which then runs TTreePlayer::Process over all events in tree.
Furthermore, TTreePlayer internally uses an abstract class TSelector which carries all logic to process events, and specialized version TSelectorDraw is used by TTree::Draw(). TTreeFormula is compiled from the string inside TSelectorDraw::CompileVariables method.

So far I am afraid that there is no direct and simple way to use TTreeFormula in an analysis based on simple TTreeReader loop like

   while (myTTreeReader.Next()) {
      // do stuff
   }

It seems that in order to use TTreeFormula you have to develop your own class akin to TSelectorDraw for analysis. There are tutorials h1analysisTreeReader and run__h1analysis to get acquainted with how TSelector works.

However, I would advise to learn RDataFrame, which is more modern and simple way of doing things. On the other hand, it might indeed lack convenience of TTreeFormula and TTree::Draw(), but similar behavior can be achieved.
In the code I’ve provided in my previous message, the following piece which applies selection

    auto dWithSurvived = dFilled.Define(
        "survivedParticles",
        [](const ROOT::RVec<edm4hep::MCParticleData>& particles) -> ROOT::RVec<edm4hep::MCParticleData>
        {
            return particles[ROOT::VecOps::Map(particles, applySelectionCriteria)];
        },
        {"particles"}
    );

can use just-in-time compiled strings instead of lambda function with some preparatory work:

double getEnergy(const edm4hep::MCParticleData& particle) {
    return particle.energy;
}

void test_rdataframe() {
    // ...
    auto dWithSurvived = dFilled.Define(
        "survivedParticles",
        "particles[Map(particles, getEnergy) > 0.5]"
    );
    // ...
}

I must notice that in my snippet stored class is not split, therefore some way to access class members is needed. If it was stored in a TTree with members split in branches, I suppose that even more simple code will suffice:

    auto dWithSurvived = dFilled.Define(
        "survivedParticles.energy",
        "particles.energy[particles.energy > 0.5]"
    );

To sum up, to use TTreeFormula outside of TTree::Draw() you have to set up additional infrastructure to compile expressions, like TSelectorDraw does, which seems to be a cumbersome task.
On the other hand, RDataFrame is more convenient in general and allows JIT compiling of expressions out of the box, but may require setting up auxiliary columns or functions for class members access, if the class was written to TTree without splitting.