Loop with RDataFrame

Hi there.
I have a ROOT file with a branch named ADC, which holds a std::vector<int>.
I have successfully written code using RDataFrame that draws the histogram of the first element of ADC (that is, ADC[0]) over all events.
But it fails when I try to do the same for ADC[1], ADC[2] and so on in a for loop.
I would appreciate it if you could help me solve the problem.
Cheers

my program

using ints = ROOT::RVec<int>;

void myMacro()
{
    auto fileName = "rootfile.root";
    auto treeName = "tree";
    TString pdfname = "histogram.pdf",
            pdf = "pdf";

    auto c = new TCanvas();
    c->SetLogy();
    c->Print(pdfname + "[", pdf);

    ROOT::RDataFrame df(treeName, fileName, {"ADC"});

    // this works fine :)
    {
        // Draw the histogram of first element of branch `ADC'
        auto first_elm = [](ints v) { return v.at(0); };
        auto histo = df.Define("ADC0", first_elm, {"ADC"})
                         .Histo1D({"histo", "title;ADC(ch);Counts", 2048, 0, 4096}, "ADC0");
        histo->Draw();
        c->Print(pdfname, pdf);
    }

    // this doesn't work fine :(
    /*
    for (int i = 0; i < 30; ++i)
    {
        // Draw the histogram of i-th element of branch `ADC'
        auto ith = [](int i, ints v) { return v.at(i); };
        auto histo = df.Define("ADCi", ith, {i, "ADC"})
                         .Histo1D({"histo", "title;ADC(ch);Counts", 2048, 0, 4096}, "ADCi");
        histo->Draw();
        c->Print(pdfname, pdf);
    }
    */

    c->Print(pdfname + "]", pdf);
}

error message

myMacro.cpp:41:46: error: non-constant-expression cannot be narrowed from type 'int' to 'std::__1::vector<std::__1::basic_string<char>,
      std::__1::allocator<std::__1::basic_string<char> > >::size_type' (aka 'unsigned long') in initializer list [-Wc++11-narrowing]
        auto histo = df.Define("ADCi", ith, {i, "ADC"})
                                             ^
myMacro.cpp:41:46: note: insert an explicit cast to silence this issue
        auto histo = df.Define("ADCi", ith, {i, "ADC"})
                                             ^
                                             static_cast<size_type>( )

environment

❯ uname -a
Darwin ##############.ac.jp 20.4.0 Darwin Kernel Version 20.4.0: Fri Mar  5 01:14:02 PST 2021; root:xnu-7195.101.1~3/RELEASE_ARM64_T8101 arm64
❯ root --version
ROOT Version: 6.22/08
Built for macosxarm64 on Mar 10 2021, 14:20:04
From tags/v6-22-08@v6-22-08

I am not sure about this call: df.Define("ADCi", ith, {i, "ADC"}) (see the documentation of ROOT::RDataFrame::Define()), but I might be wrong. @eguiraud will know for sure.

Hi @haltack,
the third argument of Define is a list of column names, given as strings; you cannot pass values of arbitrary types there. To pass non-column values into your expressions, you can use C++11 lambda captures:

    for (int i = 0; i < 30; ++i)
    {
        // Draw the histogram of i-th element of branch `ADC'
        auto ith = [i](ints v) { return v.at(i); };
        auto histo = df.Define("ADCi", ith, {"ADC"})
                         .Histo1D({"histo", "title;ADC(ch);Counts", 2048, 0, 4096}, "ADCi");
        histo->Draw();
        c->Print(pdfname, pdf);
    }
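
To see the capture mechanism in isolation, here is a minimal sketch that compiles without ROOT. It assumes std::vector<int> as a stand-in for ROOT::RVec<int>; make_ith and pick_all are hypothetical helper names used only for illustration. The point is that the index lives inside the lambda, so the callable takes only the column value, which is the shape Define expects.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for ROOT::RVec<int>, so that the sketch compiles without ROOT.
using ints = std::vector<int>;

// Hypothetical helper: returns a callable that takes only the column value.
// The index is stored inside the lambda through the capture list [i].
std::function<int(const ints &)> make_ith(std::size_t i)
{
    return [i](const ints &v) { return v.at(i); };
}

// Apply one selector per index to a single event.
std::vector<int> pick_all(const ints &event)
{
    std::vector<int> out;
    for (std::size_t i = 0; i < event.size(); ++i) {
        auto ith = make_ith(i); // each iteration captures its own copy of i
        out.push_back(ith(event));
    }
    assert(out.size() == event.size());
    return out;
}
```

For an event {10, 20, 30}, make_ith(1) returns 20 when called on it, and pick_all reproduces the event element by element.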

Unrelated, but to avoid unnecessary copies the lambda's parameter type ints should be RVec<int>& or const RVec<int>& (i.e. pass by reference rather than by value).
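
The cost of pass-by-copy can be made visible with a small ROOT-free sketch. CountingVec is a hypothetical stand-in for a column value that counts its own copy constructions; it is not a ROOT type, and the two helper functions are named only for this illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column value; counts how often it is copied.
struct CountingVec {
    std::vector<int> data;
    static int copies; // number of copy constructions so far
    explicit CountingVec(std::vector<int> d) : data(std::move(d)) {}
    CountingVec(const CountingVec &other) : data(other.data) { ++copies; }
};
int CountingVec::copies = 0;

// Call a by-value lambda once per "event" and report how many copies were made.
int copies_by_value(int n_events)
{
    const CountingVec column(std::vector<int>{1, 2, 3});
    auto first = [](CountingVec v) { return v.data.at(0); };
    CountingVec::copies = 0;
    int sum = 0;
    for (int e = 0; e < n_events; ++e)
        sum += first(column);
    assert(sum == n_events); // every call returned the first element, 1
    return CountingVec::copies;
}

// Same loop, but the lambda takes its argument by const reference.
int copies_by_reference(int n_events)
{
    const CountingVec column(std::vector<int>{1, 2, 3});
    auto first = [](const CountingVec &v) { return v.data.at(0); };
    CountingVec::copies = 0;
    int sum = 0;
    for (int e = 0; e < n_events; ++e)
        sum += first(column);
    assert(sum == n_events);
    return CountingVec::copies;
}
```

In this sketch the by-value version copies the vector once per call, while the by-reference version makes no copies at all.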

Finally, if you call histo->Draw directly inside the loop, RDataFrame is forced to read all data and fill the histogram 30 times, once per call! This is a better setup (might have typos in it, but it should convey the idea):

    std::vector<ROOT::RDF::RResultPtr<TH1D>> histos;
    for (int i = 0; i < 30; ++i)
    {
        // Book the filling of the histogram of i-th element of branch `ADC'
        auto ith = [i](ints v) { return v.at(i); };
        auto histo = df.Define("ADCi", ith, {"ADC"})
                         .Histo1D({"histo", "title;ADC(ch);Counts", 2048, 0, 4096}, "ADCi");
        histos.push_back(histo);
    }

    for (auto &h : histos) {
        h->Draw();
        c->Print(pdfname, pdf);
    }

Here we book the filling of all histograms first, and then when we first access one of the results RDataFrame can run a single loop over the data that fills all booked histograms. See the RDF user guide for more details.
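
The book-first, run-once idea can be illustrated with a toy model. This is only a sketch of the concept and not how RDataFrame is implemented: ToyFrame, Book, Get and loops are hypothetical names, and a std::map from value to count stands in for a histogram.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using ints = std::vector<int>;

// Toy model only: this is NOT RDataFrame's implementation.
class ToyFrame {
public:
    explicit ToyFrame(std::vector<ints> events) : events_(std::move(events)) {}

    // Book a "histogram" (value -> count) of the i-th element. No data is read yet.
    std::size_t Book(std::size_t i)
    {
        indices_.push_back(i);
        results_.emplace_back();
        ran_ = false;
        return indices_.size() - 1; // handle used to fetch the result later
    }

    // Access a result. The event loop runs only if results are not up to date.
    const std::map<int, int> &Get(std::size_t handle)
    {
        if (!ran_)
            Run();
        return results_.at(handle);
    }

    // Number of passes over the data so far.
    int loops() const { return n_loops_; }

private:
    // One pass over all events fills every booked result.
    void Run()
    {
        for (auto &r : results_)
            r.clear();
        for (const ints &ev : events_)
            for (std::size_t k = 0; k < indices_.size(); ++k)
                ++results_[k][ev.at(indices_[k])];
        ++n_loops_;
        ran_ = true;
    }

    std::vector<ints> events_;
    std::vector<std::size_t> indices_;
    std::vector<std::map<int, int>> results_;
    bool ran_ = false;
    int n_loops_ = 0;
};
```

Booking two results and then reading both triggers a single pass over the events in this model, whereas reading each result right after booking it would trigger one pass per result.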

Cheers,
Enrico

P.S.
Also, as you are using macros rather than compiled C++ programs: make sure to run the macro as root mymacro.C+ or, at the ROOT prompt, load it with .L macro.C+ (note the trailing +) so that it is compiled with optimizations for the best performance. Otherwise the ROOT interpreter runs everything without optimizations (we are looking into improving the situation).

@bellenot @eguiraud

Thank you for your help!
@eguiraud's code works completely fine when executed as root myMacro.cpp.

However, root myMacro.cpp+ doesn’t work as well. It prints quite a large amount of stderr output (more than 1,000 lines!). I’m not sure, but the cause seems to be in the library-linking step. I attach the stderr output just in case.

Cheers

log.txt (115.0 KB)

These are all warnings, no errors, and then the program actually runs.

These warnings are something that came up with the very latest Mac upgrade ~10 days ago and it’s something I’m working on fixing, actually – but as annoying as they are, you can ignore them.

As you see at the bottom you actually get a .pdf out.

Super annoying that these come up in user code that uses 6.22 by the way, argh. I’ll try to fix them soon and then we’ll have to publish a patch release that you can upgrade to.
