Fastest way to loop over TTrees and entries

dfakoudi · May 5, 2023, 11:54am

Hello,
I tried to find this question elsewhere in the forum but found nothing!
I have a lot of files, and I want to loop quickly over the entries after applying a small shift to the pt of each particle.

If I do tree->Draw(), the procedure is considerably faster than writing a for loop and iterating through the events. So I guess there is a much faster way to do it than my newbie way of “for loops”. Any suggestion would be super helpful. I guess the answer will involve TSelector or MakeClass?

Thank you in advance

vpadulan · May 5, 2023, 11:59am

Hi @dfakoudi ,

You should try out RDataFrame!

Without any more context, I guess something like this should give you a good start for what you mentioned in your post:

import ROOT

df = ROOT.RDataFrame("treename", "filename")
df = df.Define("my_pt", "pt + shift")

h = df.Histo1D("my_pt")

h.Draw()

dfakoudi · May 5, 2023, 12:24pm

Thank you very much for your reply,
The thing is, my code is in C++, and I have a vector of histograms that have already been defined and need to be filled.
I am not sure how to fill them with RDataFrame.

Also, I am not sure how to fill a histogram from a computed double rather than from a branch.

Wile_E_Coyote · May 5, 2023, 12:25pm

Attach your current source code for inspection.

BTW, in principle, any automatically generated “analysis skeleton” source file contains some notes on how to make the “for” loop fast (often using TTree::SetBranchStatus).

dfakoudi · May 5, 2023, 12:38pm

Basically, I give the program an array of cuts in eta and pt, and it creates vectors of MC and Data histograms to compare. It then fills them according to the cuts I have specified.

So the main structure of the code is:

void initialize_histos(vector<TH1F*>& vector_hist_data, vector<TH1F*>& vector_hist_MC)
{
  // Just initializing the MC and Data histograms here; I create as many histograms as my cuts require.
}

void load_histogram(TFile* file, vector<TH1F*>& vector_hist_data, vector<TH1F*>& vector_hist_MC)
{
  // Here I take the branches from the tree and so on.
  for(entries){
    pt = pt + shift;
    // Fill every data histogram (subject to that histogram's cuts)
    for (TH1F* h : vector_hist_data) h->Fill(pt);
  }
  // and the same for MC
}


int main(){
  vector<double> eta_cuts;
  vector<double> pt_cuts;

  vector<TH1F*> vector_hist_data;
  vector<TH1F*> vector_hist_MC;

  initialize_histos(vector_hist_data, vector_hist_MC);
  TFile *file=...
  load_histogram(file, vector_hist_data, vector_hist_MC); // the file must be passed as well
}

Wile_E_Coyote · May 5, 2023, 12:39pm

Show what you exactly do for the “pt” branch (I need to know how you “take the branches from the tree and so on”).

dfakoudi · May 5, 2023, 12:54pm

Thanks again, here it is

  float Pos_ID_Pt;
  float Neg_ID_Pt;
  float Pos_ID_Phi;
  float Neg_ID_Phi;
  float Pos_ID_Eta;
  float Neg_ID_Eta;
  float weight;


      for(int i=0; i<=tree->GetEntries();i++){
        tree->GetEntry(i);
        tree -> SetBranchAddress("Pos_ID_Pt",&Pos_ID_Pt);
        tree -> SetBranchAddress("Neg_ID_Pt",&Neg_ID_Pt);
        tree -> SetBranchAddress("Pos_ID_Phi",&Pos_ID_Phi);
        tree -> SetBranchAddress("Neg_ID_Phi",&Neg_ID_Phi);
        tree -> SetBranchAddress("Pos_ID_Eta",&Pos_ID_Eta);
        tree -> SetBranchAddress("Neg_ID_Eta",&Neg_ID_Eta);
        tree -> SetBranchAddress("TotalWeight",&weight);

        Pos_ID_Pt = Pos_ID_Pt + shift_correction; // The shift_correction I take from a function


        TLorentzVector muon_pos; muon_pos.SetPtEtaPhiM(Pos_ID_Pt, Pos_ID_Eta, Pos_ID_Phi, mass);
        TLorentzVector muon_neg; muon_neg.SetPtEtaPhiM(Neg_ID_Pt, Neg_ID_Eta, Neg_ID_Phi, mass);
        TLorentzVector Jpsi; Jpsi = muon_pos + muon_neg;
        
        for(int histogram = 0 ; histogram < vec_hist_data.size(); histogram++) //Looping over vector of histograms
             
             {//Defining here the cuts needed for the specific histogram
              vec_hist_data.at(histogram)->Fill(Jpsi.M(),weight);
             }


}

Wile_E_Coyote · May 5, 2023, 1:14pm

if ((!file) || file->IsZombie()) return; // just a precaution
//
TTree *tree;
file->GetObject("YourDataTreeName", tree);
if (!tree) return; // just a precaution
// tree->SetMakeClass(1); // may be needed
//
// For performance reasons, deactivate (disable) all branches ...
tree->SetBranchStatus("*", 0);
// ... and activate (enable) branches that are used in the "for" loop below
//
tree->SetBranchStatus("Pos_ID_Pt", 1);
tree->SetBranchStatus("Neg_ID_Pt", 1);
tree->SetBranchStatus("Pos_ID_Phi", 1);
tree->SetBranchStatus("Neg_ID_Phi", 1);
tree->SetBranchStatus("Pos_ID_Eta", 1);
tree->SetBranchStatus("Neg_ID_Eta", 1);
tree->SetBranchStatus("TotalWeight", 1);
//
float Pos_ID_Pt;
float Neg_ID_Pt;
float Pos_ID_Phi;
float Neg_ID_Phi;
float Pos_ID_Eta;
float Neg_ID_Eta;
float weight;
// Set the branch addresses once, before the loop (not on every iteration)
tree->SetBranchAddress("Pos_ID_Pt", &Pos_ID_Pt);
tree->SetBranchAddress("Neg_ID_Pt", &Neg_ID_Pt);
tree->SetBranchAddress("Pos_ID_Phi", &Pos_ID_Phi);
tree->SetBranchAddress("Neg_ID_Phi", &Neg_ID_Phi);
tree->SetBranchAddress("Pos_ID_Eta", &Pos_ID_Eta);
tree->SetBranchAddress("Neg_ID_Eta", &Neg_ID_Eta);
tree->SetBranchAddress("TotalWeight", &weight);
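//
// The loop over all entries (valid indices are 0 .. GetEntries() - 1, hence "<" and not "<=")
const Long64_t nentries = tree->GetEntries();
for (Long64_t i = 0; i < nentries; i++) {
  tree->GetEntry(i); // read all enabled branches
  // ...
}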
//
#if 1 /* 0 or 1 */
delete tree; // no longer needed
#else /* 0 or 1 */
tree->ResetBranchAddresses(); // disconnect from local variables
tree->SetBranchStatus("*", 1); // activate (enable) all branches again
#endif /* 0 or 1 */

BTW. I usually prefer to keep everything related to a branch in one line (note: first the “status” then the “address”), e.g.:

float Pos_ID_Pt; tree->SetBranchStatus("Pos_ID_Pt", 1); tree->SetBranchAddress("Pos_ID_Pt", &Pos_ID_Pt);
// ...
float weight; tree->SetBranchStatus("TotalWeight", 1); tree->SetBranchAddress("TotalWeight", &weight);
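For readers who want to see what the loop body from the earlier post computes once the branches are read, here is a minimal ROOT-free sketch of the invariant mass that TLorentzVector::SetPtEtaPhiM followed by M() yields for the two muons. The names FourMomentum, fromPtEtaPhiM and invariantMass are made up for this illustration and are not ROOT API. The sketch assumes a non-negative mass, and it clamps a slightly negative mass squared to zero, whereas TLorentzVector::M() returns a negative number in that case.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for TLorentzVector: Cartesian momentum plus energy.
struct FourMomentum {
    double px;
    double py;
    double pz;
    double e;
};

// Build a four-momentum from (pt, eta, phi, mass), assuming mass >= 0.
// px = pt*cos(phi), py = pt*sin(phi), pz = pt*sinh(eta), E = sqrt(|p|^2 + m^2).
inline FourMomentum fromPtEtaPhiM(double pt, double eta, double phi, double mass) {
    const double px = pt * std::cos(phi);
    const double py = pt * std::sin(phi);
    const double pz = pt * std::sinh(eta);
    const double e = std::sqrt(px * px + py * py + pz * pz + mass * mass);
    return FourMomentum{px, py, pz, e};
}

// Invariant mass of the sum of two four-momenta: sqrt(E^2 - |p|^2).
// Rounding can make E^2 - |p|^2 slightly negative, so it is clamped to zero here.
inline double invariantMass(const FourMomentum& a, const FourMomentum& b) {
    const double px = a.px + b.px;
    const double py = a.py + b.py;
    const double pz = a.pz + b.pz;
    const double e = a.e + b.e;
    const double m2 = e * e - (px * px + py * py + pz * pz);
    return std::sqrt(std::max(0.0, m2));
}
```

In the loop, the shifted pt would go into fromPtEtaPhiM for the positive muon, and the result of invariantMass would be what gets passed to Fill together with the weight. As a sanity check, two massless particles with pt = 1.5, eta = 0 and opposite phi give an invariant mass of 3.0.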

vpadulan · May 5, 2023, 1:54pm

Alternatively, with RDataFrame:


// supposing you have cuts in the form of strings of code
// Or you could store them as vectors of function pointers
// Or as vectors of double and then inject the values in the function passed to Filter
std::vector<std::string> pt_cuts{"pt > 25", "pt < 30" /*, ... */};
std::vector<std::string> eta_cuts{"eta < 3", "abs(eta) < 5" /*, ... */};

// Create an RDataFrame that will process your tree
ROOT::RDataFrame df{"treename","filename.root"};

// As I understand the shift applies to all entries
// So let's do it here at the beginning
auto shifted_pt_df = df.Define("new_pt", "pt + shift");

std::vector<ROOT::RDF::RResultPtr<TH1D>> histos;

// cols_of_my_histos is assumed to be a std::vector<std::string> of column names defined elsewhere

// For all pt cuts, book all the histograms needed
for (const auto &cut : pt_cuts){
    for (const auto &colname: cols_of_my_histos){
        histos.push_back(shifted_pt_df.Filter(cut).Histo1D(colname));
    }
}

// For all eta cuts, book all the histograms needed
for (const auto &cut : eta_cuts){
    for (const auto &colname: cols_of_my_histos){
        histos.push_back(shifted_pt_df.Filter(cut).Histo1D(colname));
    }
}


// No event loop has run so far. You have a vector of RResultPtr<TH1D>, none of which holds a filled histogram yet.

// Now we trigger the execution of the whole event loop.
// For simplicity, I access the first RResultPtr from the vector and ask RDataFrame to compute its value.
// All histograms will be filled together in one go
histos[0].GetValue();


// Now we can draw them
histos[5]->Draw(); // RResultPtr forwards to the TH1D through operator->

// Or store them in a file
TFile f{"myfile.root", "recreate"};
const TH1D &h = histos[12].GetValue(); // Let's retrieve the actual TH1D first
f.WriteObject(&h, "myhistogram"); // WriteObject expects a pointer to the object
f.Close();

More info about the lazy execution employed by RDataFrame and RResultPtr can be found in the ROOT::RDataFrame class reference.
