TTree benchmark

I’m trying to benchmark reading a ROOT file and filling histograms from selected branches. My benchmarks are not making sense, and I would like some expert opinion on how to do it right.

I’m creating a ROOT TTree with 24 columns (named c1,…,c24), reading 4 of them (c1,c2,c3,c4) and filling the corresponding histograms. The TTree is filled with 50M rows of random numbers between 0.0 and 1.0.

When using a simple loop over the TTree entries, I’m able to fill the histograms in 5.37 seconds; with the RDataFrame approach, it takes 7.2 seconds. The code is provided below. Is it expected for RDataFrame to be slower?
I also tried using RNTuple; however, filling 4 histograms from the RNTuple took 21 seconds.
I would appreciate some help in pointing out the mistakes in my code.
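
Since the question is partly about how to benchmark this properly, here is a minimal sketch of a repeated-run timing helper in plain C++. The name `time_runs_ms` is hypothetical (it is not part of ROOT). It assumes the work to measure can be wrapped in a callable, runs it a few times untimed as a warm-up (to absorb file opening, page-cache and jitting effects), and then returns one wall-clock timing per run, measured with the monotonic `std::chrono::steady_clock`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ROOT): runs `work` `warmup` times without
// timing it, then `runs` more times, returning each timed run in milliseconds.
template <typename F>
std::vector<double> time_runs_ms(F &&work, int runs, int warmup = 1)
{
    assert(runs > 0 && warmup >= 0);

    // Untimed warm-up calls absorb one-off costs (file opening, OS page
    // cache, just-in-time compilation) so they do not skew the first sample.
    for (int i = 0; i < warmup; ++i)
        work();

    std::vector<double> timings;
    timings.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        // steady_clock is monotonic, so the measurement is not affected
        // if the system clock is adjusted while the work is running.
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        timings.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return timings;
}
```

For example, `time_runs_ms([] { benchmark_read("output.root", "Tree", 4); }, 5)` would time the TTree function five times after one warm-up call. Note that this times the whole call, including opening the file, rather than only the event loop.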

using ROOT/6.32

Code for reading TTree:

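#include <TFile.h>
#include <TTree.h>
#include <TH1F.h>
#include <chrono>
#include <cstdio>
#include <iostream>

void benchmark_read(const char *root_file, const char *tree_name, int nread){
    TFile* file = TFile::Open(root_file);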
    if (!file || file->IsZombie()) {
        std::cerr << "Error: Could not open file " << root_file << std::endl;
        return;
    }
    // Get the TTree
    TTree* tree = (TTree*)file->Get(tree_name);
    if (!tree) {
        std::cerr << "Error: TTree " << tree_name << " not found in file " << root_file << std::endl;
        return;
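    }
    // The fill loop below reads branch[0]..branch[3], and branch[] holds 128 values.
    if (nread < 4 || nread > 128) {
        std::cerr << "Error: nread must be between 4 and 128" << std::endl;
        return;
    }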

    TH1F* hist1 = new TH1F("hist1", "Histogram for Branch 1", 100, 0, 1);
    TH1F* hist2 = new TH1F("hist2", "Histogram for Branch 2", 100, 0, 1);
    TH1F* hist3 = new TH1F("hist3", "Histogram for Branch 3", 100, 0, 1);
    TH1F* hist4 = new TH1F("hist4", "Histogram for Branch 4", 100, 0, 1);

    tree->SetBranchStatus("*",0);
    char bname[128];
    float branch[128];

    for(int i = 0; i < nread; i++){
        std::snprintf(bname,128,"c%d",i+1);
        tree->SetBranchStatus(bname,1);
        tree->SetBranchAddress(bname,&branch[i]);
    }
    auto start_time = std::chrono::high_resolution_clock::now();
    Long64_t nentries = tree->GetEntries();
    for (Long64_t i = 0; i < nentries; ++i) {
        tree->GetEntry(i);
        hist1->Fill(branch[0]);
        hist2->Fill(branch[1]);
        hist3->Fill(branch[2]);
        hist4->Fill(branch[3]);
    }
    auto end_time = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);

    std::cout << "Execution time: " << duration.count() << " ms" << std::endl;
}

Code for plotting using RDataFrame:

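#include <ROOT/RDataFrame.hxx>
#include <TCanvas.h>
#include <TH1D.h>
#include <chrono>
#include <exception>
#include <iostream>
#include <string>
#include <vector>

void benchmark(){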
    std::string file_name = "../benchmarks/output.root"; // Replace with your ROOT file name
    std::string tree_name = "Tree";     // Replace with your TTree name

    // Initialize the RDataFrame
    try {
        ROOT::RDataFrame df(tree_name, file_name);

        // Define the variables for which you want to create histograms
        std::vector<std::string> variables_to_plot = {"c1", "c2", "c3", "c4"}; // Replace with actual variable names

        // Create histograms for each variable
        std::vector<ROOT::RDF::RResultPtr<TH1D>> histograms;
        for (const auto &var : variables_to_plot) {
            histograms.push_back(df.Histo1D(var));
        }

        // Create a canvas to draw the histograms
        TCanvas canvas("canvas", "Histograms", 800, 600);
        canvas.Divide(2, 2); // Divide the canvas into a 2x2 grid

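        // Draw histograms on the canvas. RDataFrame is lazy: the first Draw()
        // triggers the event loop, so this timing includes the event loop,
        // jitting, and drawing.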
        auto start_time = std::chrono::high_resolution_clock::now();
        for (size_t i = 0; i < histograms.size(); ++i) {
            canvas.cd(i + 1); // Move to the appropriate pad
            histograms[i]->Draw();
            //histograms[i]->Fill();
        }
        auto end_time = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);
        
        std::cout << "Execution time: " << duration.count() << " ms" << std::endl;
        
        // Save the canvas to a file
        canvas.SaveAs("histograms.pdf");
        std::cout << "Histograms saved as histograms.pdf" << std::endl;
    } catch (const std::exception &e) {
        std::cerr << "Error initializing RDataFrame or processing histograms: " << e.what() << std::endl;
    }

    //return 0;
}

Code for filling histograms from the RNTuple:

    // Path to the RNTuple ROOT file and the RNTuple name
    const std::string fileName = "output.root";
    const std::string ntupleName = "TTree";

    // Open the RNtuple file for reading
    auto ntuple = ROOT::Experimental::RNTupleReader::Open(ntupleName, fileName);
    //auto ntuple = RNTupleReader::Open(ntupleName, fileName);

    if (!ntuple) {
        std::cerr << "Error: Unable to open RNtuple file or RNtuple not found!" << std::endl;
        //return 1;
    }

    std::cout << "Successfully opened RNtuple: " << ntupleName << " from file: " << fileName << std::endl;

    // Column to plot
    const std::string columnToPlot = "c1";

    // Check if the column exists
    //if (!ntuple->GetDescriptor().HasField(columnToPlot)) {
    //    std::cerr << "Error: Column '" << columnToPlot << "' not found in the RNtuple!" << std::endl;
        //return 1;
    //}

    // Create a histogram for the column
    int nBins = 100; // Number of bins
    double minValue = 0.0; // Minimum value for the histogram
    double maxValue = 1.0; // Maximum value for the histogram
    TH1F hc1("hist_1", "histo c1", nBins, minValue, maxValue);
    TH1F hc2("hist_2", "histo c2", nBins, minValue, maxValue);
    TH1F hc3("hist_3", "histo c3", nBins, minValue, maxValue);
    TH1F hc4("hist_4", "histo c4", nBins, minValue, maxValue);
    // Fill the histogram with data from the RNtuple column
    auto fc1 = ntuple->GetView<float>("c1");
    auto fc2 = ntuple->GetView<float>("c2");
    auto fc3 = ntuple->GetView<float>("c3");
    auto fc4 = ntuple->GetView<float>("c4");
    auto start_time = std::chrono::high_resolution_clock::now();
    for (auto entryId : ntuple->GetEntryRange()) {
        ntuple->LoadEntry(entryId);
        hc1.Fill(fc1(false));
        hc2.Fill(fc2(false));
        hc3.Fill(fc3(false));
        hc4.Fill(fc4(false));
    }
 
    auto end_time = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);
    std::cout << "Execution time: " << duration.count() << " ms" << std::endl;

Hi,

Thanks for the interesting post.
Could you also share how you compiled the examples, and, if possible, the input file (or the program to create it)?

Regarding the comparison between TTree and RDF, we should consider at least two aspects:

  1. There can be some overhead when the workload is very small, e.g. filling only 4 histograms.
  2. Not only the event loop, but also plotting and some jitting (just-in-time compilation) are being benchmarked. We are discussing how to further reduce jitting (e.g. with type-erased RDF nodes for actions); for the moment, I suggest replacing df.Histo1D(var) with df.Histo1D<float>(var) to minimise the jitting time.

Could you let us know what happens to the timing?

Regarding RNTuple, this measurement is very welcome! We are still in the process of optimising the new format and the related code. So far, the focus has been on large workflows coming from central processing of experiments or sophisticated analyses: simple cases have not had much priority, but this does not mean there is nothing to do about them!
First thing: a lot happened between 6.32 and 6.34, even though only 6 months separated the two releases. It would be interesting to see 6.34 measurements too. In the meantime, to get a more technical perspective, I have put @jblomer and @silverweed in the loop.

Thanks again for sharing these numbers!

Cheers,
D

Hi,

Thank you for the reply. Surprisingly, df.Histo1D<float>() makes it significantly worse.

Here are the benchmarks from 5 consecutive runs:

df.Histo1D<float>():

Execution time: 9975 ms
Execution time: 9925 ms
Execution time: 9879 ms
Execution time: 9923 ms
Execution time: 9906 ms
--------------------------------

df.Histo1D():

Execution time: 7689 ms
Execution time: 7289 ms
Execution time: 7146 ms
Execution time: 7108 ms
Execution time: 7109 ms
---------------------------------
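
The five-run numbers above can be summarised with a few lines of C++. In this sketch, `RunSummary` and `summarise` are illustrative names; it reports the minimum, median and mean. The minimum and median are less affected than the mean by an outlier such as the first df.Histo1D() run (7689 ms), which is consistent with a cold-cache warm-up run, although the thread does not confirm the cause.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative summary of repeated benchmark timings (in milliseconds).
struct RunSummary {
    double min;
    double median;
    double mean;
};

// Takes the timings by value because sorting modifies them.
RunSummary summarise(std::vector<double> t)
{
    assert(!t.empty());
    std::sort(t.begin(), t.end());
    const std::size_t n = t.size();
    // Middle element for odd n, average of the two middle elements for even n.
    const double median =
        (n % 2 == 1) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);
    const double mean =
        std::accumulate(t.begin(), t.end(), 0.0) / static_cast<double>(n);
    return {t.front(), median, mean};
}
```

With the posted numbers, df.Histo1D<float>() gives a median of 9923 ms and df.Histo1D() a median of 7146 ms (means of 9921.6 ms and 7268.2 ms).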

Here is the code that created the TTree file from a CSV file:

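#include <TFile.h>
#include <TTree.h>
#include <Compression.h>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

void csv_to_ttree(const char* csv_file, const char* tree_name = "Tree") {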
    // Open the CSV file
    std::ifstream infile(csv_file);
    if (!infile.is_open()) {
        std::cerr << "Error: Could not open file " << csv_file << std::endl;
        return;
    }

    // Read the header line
    std::string line;
    std::getline(infile, line);
    std::istringstream header_stream(line);
    std::vector<std::string> column_names;
    std::string column;

    while (std::getline(header_stream, column, ',')) {
        column_names.push_back(column);
    }

    const int num_columns = column_names.size();
    int compressionLevel = 4; // Compression level (1 to 9, 4 is default for LZ4)
    int compressionAlgorithm = ROOT::kLZ4;
    
    // Create the ROOT file and TTree
    TFile* root_file = new TFile("output.root", "RECREATE");
    TTree* tree = new TTree(tree_name, "Tree created from CSV");
    root_file->SetCompressionAlgorithm(compressionAlgorithm);
    root_file->SetCompressionLevel(compressionLevel);
    // Create arrays to hold data for each column
    std::vector<float> branch_data(num_columns, 0.0);
    std::vector<float*> branch_pointers;

    for (int i = 0; i < num_columns; ++i) {
        branch_pointers.push_back(&branch_data[i]);
        tree->Branch(column_names[i].c_str(), branch_pointers[i], (column_names[i] + "/F").c_str());
    }
    std::chrono::duration<double> total_time(0);
    // Read the data lines and fill the TTree
    int counter = 0;
    while (std::getline(infile, line)) {
        std::istringstream line_stream(line);
        std::string value;
        int column_index = 0;

        // Guard against rows with more fields than the header declares.
        while (column_index < num_columns && std::getline(line_stream, value, ',')) {
            branch_data[column_index] = std::stof(value);
            ++column_index;
        }
        auto start_time = std::chrono::high_resolution_clock::now();
        tree->Fill();
        auto   end_time = std::chrono::high_resolution_clock::now();
        total_time += (end_time - start_time);
        counter++;
        if(counter%100000==0) printf(" processed %d\n",counter);
    }


    // Write the TTree to the file

    tree->Write();

    root_file->Close();

    std::cout << "TTree created and saved to output.root" << std::endl;
    std::cout << "Total time spent in TTree::Fill(): " << total_time.count() << " seconds" << std::endl;

}
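
The CSV file itself, or the program that produced it, was not posted. A minimal sketch of a generator for an equivalent input (24 float columns named c1 to c24, values drawn uniformly between 0 and 1) might look like the following; `write_random_csv` and its default seed are hypothetical, and the values will not match the original file.

```cpp
#include <cassert>
#include <limits>
#include <ostream>
#include <random>
#include <sstream>  // std::ostringstream, handy for small in-memory checks

// Hypothetical generator (the original CSV producer was not posted): writes a
// header "c1,...,cN" followed by `nrows` rows of N floats drawn with
// std::uniform_real_distribution<float>(0, 1). The default seed is arbitrary.
void write_random_csv(std::ostream &out, int ncols, long long nrows,
                      unsigned seed = 12345)
{
    assert(ncols > 0 && nrows >= 0);

    // Header line: c1,c2,...,cN
    for (int c = 1; c <= ncols; ++c)
        out << 'c' << c << (c < ncols ? ',' : '\n');

    // Enough significant digits for every float to survive a text round trip
    // (e.g. through std::stof in csv_to_ttree).
    out.precision(std::numeric_limits<float>::max_digits10);

    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    for (long long r = 0; r < nrows; ++r)
        for (int c = 1; c <= ncols; ++c)
            out << uniform(gen) << (c < ncols ? ',' : '\n');
}
```

Calling `write_random_csv(file, 24, 50000000)` on a `std::ofstream` would produce a file with the same shape as the one described at the top of the thread, ready to be converted with csv_to_ttree.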

I’m also not convinced that I’m creating the RNTuple file correctly, so, just in case it helps, I’m including the RNTuple creation code below. I will also try to benchmark this with 6.34.

void script(){
  const std::string fileName = "example_rntuple.root";
  const std::string ntupleName = "MyNtuple";
  
  // Create the output file and RNtuple writer
  auto model = ROOT::Experimental::RNTupleModel::Create();

  auto c1 = model->MakeField<float>("c1");
  auto c2 = model->MakeField<float>("c2");
  auto c3 = model->MakeField<float>("c3");
  auto c4 = model->MakeField<float>("c4");
  auto c5 = model->MakeField<float>("c5");
  auto c6 = model->MakeField<float>("c6");
  auto c7 = model->MakeField<float>("c7");
  auto c8 = model->MakeField<float>("c8");
  auto c9 = model->MakeField<float>("c9");
  auto c10 = model->MakeField<float>("c10");
  auto c11 = model->MakeField<float>("c11");
  auto c12 = model->MakeField<float>("c12");
  auto c13 = model->MakeField<float>("c13");
  auto c14 = model->MakeField<float>("c14");
  auto c15 = model->MakeField<float>("c15");
  auto c16 = model->MakeField<float>("c16");
  auto c17 = model->MakeField<float>("c17");
  auto c18 = model->MakeField<float>("c18");
  auto c19 = model->MakeField<float>("c19");
  auto c20 = model->MakeField<float>("c20");
  auto c21 = model->MakeField<float>("c21");
  auto c22 = model->MakeField<float>("c22");
  auto c23 = model->MakeField<float>("c23");
  auto c24 = model->MakeField<float>("c24");
  //auto writer = ROOT::Experimental::RNTupleWriter::Recreate(ntupleName, fileName, std::move(model));
  auto writer = ROOT::Experimental::RNTupleWriter::Recreate(std::move(model), "TTree", "output.root");

  TRandom3 rand;

  for(int row = 0; row < 50000000; row++){
    *c1 = rand.Uniform(0.0,1.0);
    *c2 = rand.Uniform(0.0,1.0);
    *c3 = rand.Uniform(0.0,1.0);
    *c4 = rand.Uniform(0.0,1.0);
    *c5 = rand.Uniform(0.0,1.0);
    *c6 = rand.Uniform(0.0,1.0);
    *c7 = rand.Uniform(0.0,1.0);
    *c8 = rand.Uniform(0.0,1.0);
    *c9 = rand.Uniform(0.0,1.0);
    *c10 = rand.Uniform(0.0,1.0);
    *c11 = rand.Uniform(0.0,1.0);
    *c12 = rand.Uniform(0.0,1.0);
    *c13 = rand.Uniform(0.0,1.0);
    *c14 = rand.Uniform(0.0,1.0);
    *c15 = rand.Uniform(0.0,1.0);
    *c16 = rand.Uniform(0.0,1.0);
    *c17 = rand.Uniform(0.0,1.0);
    *c18 = rand.Uniform(0.0,1.0);
    *c19 = rand.Uniform(0.0,1.0);
    *c20 = rand.Uniform(0.0,1.0);
    *c21 = rand.Uniform(0.0,1.0);
    *c22 = rand.Uniform(0.0,1.0);
    *c23 = rand.Uniform(0.0,1.0);
    *c24 = rand.Uniform(0.0,1.0);
    writer->Fill();
  }

  // No explicit close is needed: the writer commits the dataset when it is destroyed at the end of this scope.
}

On RNTuple, the issue in the read code is the line ntuple->LoadEntry(entryId);. This will load the entire row (all columns). When using views, you won’t need to (should not) call LoadEntry().
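
To put the LoadEntry() point in numbers: with 50M entries and 24 float fields, loading the full entry means deserialising all 24 columns, while the histograms only need c1 to c4. The sketch below does the arithmetic at compile time, assuming 4-byte floats and ignoring compression and page overhead; `float_payload_bytes` is an illustrative name. Random floats compress poorly, so the on-disk sizes are presumably of the same order, but that is an assumption.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(float) == 4, "assumes 4-byte IEEE floats");

// Illustrative helper: uncompressed payload, in bytes, of `ncols` float
// columns over `nrows` entries (compression and page headers ignored).
constexpr std::uint64_t float_payload_bytes(std::uint64_t nrows,
                                            std::uint64_t ncols)
{
    return nrows * ncols * sizeof(float);
}

// LoadEntry() fills the full default entry: all 24 fields.
constexpr std::uint64_t kAllColumns = float_payload_bytes(50000000, 24);
// The four views only need c1..c4.
constexpr std::uint64_t kFourColumns = float_payload_bytes(50000000, 4);

static_assert(kAllColumns == 4800000000ULL, "24 columns: 4.8 GB");
static_assert(kFourColumns == 800000000ULL, "4 columns: 0.8 GB");
static_assert(kAllColumns / kFourColumns == 6, "6x more data to deserialise");
```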

Hello @gavalian,
aside from what @jblomer said about LoadEntry, I’m not sure why you’re calling the RNTupleViews with a false argument: you probably want to pass them entryId to get the id-th value out of them (false will be automatically converted to 0, so you’re always filling your histograms with the first entry).
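
The effect described above can be reproduced without ROOT. The `MockView` class below is a hypothetical stand-in that mimics only the call syntax of an RNTuple view (call it with an entry index, get the value back); it is not ROOT's RNTupleView. Because `bool` converts implicitly to an integer, `view(false)` silently means `view(0)`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in, NOT ROOT's RNTupleView: it only mimics the call
// syntax view(entryIndex) -> value used in the thread.
class MockView {
public:
    explicit MockView(std::vector<float> values) : fValues(std::move(values)) {}

    // Returns the value stored at `entry` (bounds-checked).
    const float &operator()(std::uint64_t entry) const { return fValues.at(entry); }

private:
    std::vector<float> fValues;
};
```

In the original loop, `hc1.Fill(fc1(false))` therefore filled hc1 with the value of entry 0 on every iteration; passing the loop index instead reads each entry in turn.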

Hi,

Thanks. Could you put the ROOT file (or the CSV, which will be larger) somewhere?
Could you please share how you compiled the programs you are running? Are they macros?

Cheers,
D

My apologies,
I just started using RNTuple and apparently copied code from a bad example. After fixing the code, the execution time dropped to a more reasonable 3.7 seconds.

However I’m still puzzled by the TTree DataFrame results vs pure iteration through TTree.

The fixed code below:

  auto ch1 = ntuple->GetView<float>("c1");
  auto ch2 = ntuple->GetView<float>("c2");
  auto ch3 = ntuple->GetView<float>("c3");
  auto ch4 = ntuple->GetView<float>("c4");
  auto start_time = std::chrono::high_resolution_clock::now();
  for (std::size_t i = 0; i < ntuple->GetNEntries(); ++i) {
    h1->Fill(ch1(i));
    h2->Fill(ch2(i));
    h3->Fill(ch3(i));
    h4->Fill(ch4(i));
  }

I had been running these benchmarks as interpreted scripts. Once I compiled them, everything started to make sense.
Now I get 7.09 s for RDataFrame and 5.11 s with df.Histo1D<float>(var);
with RNTuple, the compiled code takes 1.85 s for the same task.

This is indeed a significant improvement of reader. Thank you for your help @jblomer, @Danilo and @silverweed.


Curious thing: I decided to try the same exercise with RNTuple in version 6.34, since it was mentioned that RNTuple has improved there. However, I found that 6.34.02 performs the same task more slowly than 6.32.08. If this is of any interest, someone can try it:

Here are the times for reading 4 branches and filling histograms, with compiled code and as an interpreted macro (Cling):
6.32: 1.87 s (compiled), 3.7 s (interpreted)
6.34: 2.62 s (compiled), 5.28 s (interpreted)
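
The relative slow-down between the two releases can be computed directly from these timings. The helper name `slowdown` is illustrative; it simply returns measured/reference minus 1.

```cpp
#include <cassert>

// Illustrative helper: relative slow-down of `measured` with respect to
// `reference` (0.40 means "40% slower"; negative values mean faster).
constexpr double slowdown(double reference, double measured)
{
    return measured / reference - 1.0;
}
```

With these numbers, 6.34 is about 40% slower for the compiled code (2.62 s vs 1.87 s) and about 43% slower for the macro (5.28 s vs 3.7 s), so the regression looks similar in both modes.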

Also, 6.34 was unable to open the RNTuple generated with the 6.32 code base; the file had to be regenerated with 6.34. The error message was:

libc++abi: terminating due to uncaught exception of type ROOT::Experimental::RException: no RNTuple named 'TTree' in file 'output.root' (unchecked RResult access!)


This is intentional. The RNTuple version in v6.32 was experimental and is not compatible with the production version released in v6.34.

The slowdown is not expected, though, and we need to investigate further.

Just to confirm: did you compile the program that you executed? With what flags?

Best,
D

The first column in the benchmarks is for the compiled code, which was built with a simple command line (the same in both cases):

g++ -O2 -o rntuple rntuple.cc `root-config --cflags --libs` -lROOTNtuple
