Filling an RDataFrame from a GetRandom() method that returns a std::vector

Dear ROOT experts,

I have a class TRestComponent with a method std::vector<Double_t> GetRandom() and a data member std::vector<std::string> fVariables. The vector returned by GetRandom() always has the same size as fVariables.

I want to implement a method that populates an RDataFrame with the results of GetRandom(), where component i of the returned vector is stored in a column named fVariables[i].

After some discussion, ChatGPT agreed that the following should do the job:

ROOT::RDF::RNode TRestComponent::GetMonteCarloDataFrame(Int_t N) {
    // Create a RDataFrame with the specified columns
    ROOT::RDataFrame df(N);

    // Function to fill the RDataFrame using GetRandom method
    auto fillDataFrame = [this,&df](unsigned int idx) {
        auto randomValues = GetRandom();

        // Check if the size of randomValues matches the size of columnNames
        if (randomValues.size() != fVariables.size()) {
            throw std::runtime_error("Mismatch in sizes of fVariables and randomValues");
        }

        // Fill the RDataFrame with values from GetRandom
        for (size_t i = 0; i < randomValues.size(); ++i) {
            df.Define(fVariables[i], randomValues[i]);
        }
    };

    // Apply the fillDataFrame function to each entry in the RDataFrame
    df.Foreach(fillDataFrame, {"idx"});

    // Return the RNode (RDataFrame)
    return df;
}

However, I get the following compilation error:

[  1%] Building CXX object source/framework/CMakeFiles/RestFramework.dir/sensitivity/src/TRestComponent.cxx.o
In file included from /nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDataFrame.hxx:20,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/core/inc/TRestDataSet.h:28,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/sensitivity/inc/TRestComponent.h:29,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/sensitivity/src/TRestComponent.cxx:41:
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx: In instantiation of ‘ROOT::RDF::RInterface<T, V> ROOT::RDF::RInterface<T, V>::Define(std::string_view, F, const ColumnNames_t&) [with F = double; typename std::enable_if<(! std::is_convertible<F, std::basic_string<char> >::value), int>::type <anonymous> = 0; Proxied = ROOT::Detail::RDF::RLoopManager; DataSource = void; std::string_view = std::basic_string_view<char>; ROOT::RDF::ColumnNames_t = std::vector<std::basic_string<char> >]’:
/afs/desy.de/user/j/jgalan/rest-framework/source/framework/sensitivity/src/TRestComponent.cxx:307:44:   required from here
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx:394:111: error: no matching function for call to ‘ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager>::DefineImpl<double, ROOT::Detail::RDF::CustomColExtraArgs::None>(std::string_view&, std::remove_reference<double&>::type, const ColumnNames_t&, const char [7])’
  394 |       return DefineImpl<F, RDFDetail::CustomColExtraArgs::None>(name, std::move(expression), columns, "Define");
      |                                                                                                               ^
In file included from /nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDataFrame.hxx:20,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/core/inc/TRestDataSet.h:28,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/sensitivity/inc/TRestComponent.h:29,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/sensitivity/src/TRestComponent.cxx:41:
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx:3173:4: note: candidate: ‘template<class F, class DefineType, class RetType> std::enable_if_t<std::is_default_constructible<RetType>::value, ROOT::RDF::RInterface<T, V> > ROOT::RDF::RInterface<T, V>::DefineImpl(std::string_view, F&&, const ColumnNames_t&, const string&) [with F = F; DefineType = DefineType; RetType = RetType; Proxied = ROOT::Detail::RDF::RLoopManager; DataSource = void]’
 3173 |    DefineImpl(std::string_view name, F &&expression, const ColumnNames_t &columns, const std::string &where)
      |    ^~~~~~~~~~
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx:3173:4: note:   template argument deduction/substitution failed:
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx:3171:47: error: no type named ‘ret_type’ in ‘struct ROOT::Detail::CallableTraitsImpl<double, false>’
 3171 |    template <typename F, typename DefineType, typename RetType = typename TTraits::CallableTraits<F>::ret_type>
      |                                               ^~~~~~~~
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx:3224:4: note: candidate: ‘template<class F, class DefineType, class RetType, bool IsFStringConv, bool IsRetTypeDefConstr> std::enable_if_t<((! IsFStringConv) && (! IsRetTypeDefConstr)), ROOT::RDF::RInterface<T, V> > ROOT::RDF::RInterface<T, V>::DefineImpl(std::string_view, F, const ColumnNames_t&) [with F = F; DefineType = DefineType; RetType = RetType; bool IsFStringConv = IsFStringConv; bool IsRetTypeDefConstr = IsRetTypeDefConstr; Proxied = ROOT::Detail::RDF::RLoopManager; DataSource = void]’
 3224 |    DefineImpl(std::string_view, F, const ColumnNames_t &)
      |    ^~~~~~~~~~
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx:3224:4: note:   template argument deduction/substitution failed:
In file included from /nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDataFrame.hxx:20,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/core/inc/TRestDataSet.h:28,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/sensitivity/inc/TRestComponent.h:29,
                 from /afs/desy.de/user/j/jgalan/rest-framework/source/framework/sensitivity/src/TRestComponent.cxx:41:
/nfs/dust/iaxo/group/software/root/6.26.02/install/include/ROOT/RDF/RInterface.hxx:394:111: note:   candidate expects 3 arguments, 4 provided
  394 |       return DefineImpl<F, RDFDetail::CustomColExtraArgs::None>(name, std::move(expression), columns, "Define");
      |                                                                                                               ^
make[2]: *** [source/framework/CMakeFiles/RestFramework.dir/sensitivity/src/TRestComponent.cxx.o] Error 1
make[1]: *** [source/framework/CMakeFiles/RestFramework.dir/all] Error 2
make: *** [all] Error 2

Any guesses on how to solve this issue?
Thank you!

Hello,

Focusing only on the problematic line for now: the second argument of Define must be a callable, not a value. Therefore:

df.Define(fVariables[i], [&randomValues, &i] (){ return randomValues[i];});

Cheers,
D
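The compilation error in the first post comes from this rule: DefineImpl deduces the new column's type from TTraits::CallableTraits<F>::ret_type, and for F = double that member does not exist. The sketch below reproduces the same constraint in plain C++, using std::is_invocable_v (C++17) as a simplified stand-in for ROOT's internal callable traits. IsDefineExpression and ValueViaLambda are hypothetical names, not ROOT API.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical, simplified stand-in for the check RDataFrame applies to the
// second argument of Define(name, expression) when no input columns are given:
// the expression must be callable with no arguments.
template <typename F>
constexpr bool IsDefineExpression = std::is_invocable_v<F>;

// A plain value is rejected, like df.Define(fVariables[i], randomValues[i]).
static_assert(!IsDefineExpression<double>, "a double is not callable");

// Wrapping the value in a lambda gives an acceptable expression.
inline double ValueViaLambda(const std::vector<double>& values, std::size_t i) {
    assert(i < values.size());
    auto expression = [&values, i]() { return values[i]; };
    static_assert(IsDefineExpression<decltype(expression)>, "a lambda is callable");
    return expression();
}
```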

OK, thanks! That solved the compilation issue. However, the code does not do what I expected.

When I run it, the returned df contains zero columns, even though the class data member std::vector<std::string> fVariables actually holds 3 elements.

In a nutshell, I need this code to add X entries to the dataframe, with the N components returned by GetRandom() stored in N columns whose names are given by fVariables.

Thanks for the help!

This is the latest version of the code:

ROOT::RDF::RNode TRestComponent::GetMonteCarloDataFrame(Int_t N) {
    // Create a RDataFrame with the specified columns
    ROOT::RDataFrame df(N);

    // Function to fill the RDataFrame using GetRandom method
    auto fillDataFrame = [this,&df]() {
        auto randomValues = GetRandom();

        // Check if the size of randomValues matches the size of columnNames
        if (randomValues.size() != fVariables.size()) {
            throw std::runtime_error("Mismatch in sizes of fVariables and randomValues");
        }

        // Fill the RDataFrame with values from GetRandom
        for (size_t i = 0; i < randomValues.size(); ++i) {
            df.Define(fVariables[i], [&randomValues, &i] (){ return randomValues[i];});
        }
    };

    // Apply the fillDataFrame function to each entry in the RDataFrame
    df.Foreach(fillDataFrame);

    return df;
}

OK, I think it is clear that the previous code is wrong. I have seen examples of defining columns with lambda functions, but my case is different: I need to define N columns whose values for each entry are correlated through a single call to GetRandom(), which returns N values.
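The zero-column result reported above is consistent with how Define works: it does not modify the node it is called on, it returns a new node that carries the extra column, so calling df.Define(...) and discarding the result leaves df unchanged. A minimal plain-C++ sketch of that behaviour, using a hypothetical ToyNode class (not part of ROOT, and without any event loop) in place of an RDataFrame node:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node, NOT ROOT's API: it only records column definitions.
// Like RDataFrame's Define, ToyNode::Define leaves *this untouched and returns
// a new node that knows about one more column.
class ToyNode {
public:
    using Expression = std::function<double()>;

    ToyNode Define(const std::string& name, Expression expression) const {
        assert(!name.empty() && "column name must not be empty");
        ToyNode next = *this;  // copy the definitions collected so far
        next.columns_.emplace_back(name, std::move(expression));
        return next;           // the caller has to keep (or reassign) this
    }

    std::size_t NumColumns() const { return columns_.size(); }

private:
    std::vector<std::pair<std::string, Expression>> columns_;
};
```

With RDataFrame the same rule applies: keep the returned node (auto df1 = df.Define(...)) or reassign a ROOT::RDF::RNode (df = df.Define(...)), as the later versions of the code in this thread do.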

Dear @Javier_Galan,

I don’t think the code you show does what you expect. The Foreach operation triggers the event loop immediately (a so-called “instant action”). You cannot nest Define calls inside the event loop: the computation graph is already being executed, so you cannot create new nodes on the fly like that. Something like the following should work for your use case (it might still need a bit of adjusting):


auto df1 = df.Define("randomValues", [](){return GetRandom();});

auto apply_defines = [](ROOT::RDF::RNode df){
    for(size_t i = 0; i < fVariables.size(); ++i){
        df = df.Define(fVariables[i], [](const ROOT::RVecF &values){ return values[i];}, {"randomValues"});
    }
    return df;
};

auto df2 = apply_defines(df1);

// Do what you need with the defined columns

In practice, this calls GetRandom() once per event and stores the result in a column. You can then call Define once per variable name and pick the corresponding random value by index. Note that I used const ROOT::RVecF & as the type of the value returned by GetRandom(); I think that fits the context of your snippet.

Cheers,
Vincenzo
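The pattern suggested above (one column holding the whole vector, then one Define per variable reading a single component) can be modelled in plain C++ to check its key property: every component of an entry comes from the same GetRandom() call. A minimal sketch, assuming a hypothetical BuildRows helper and a deterministic stand-in for GetRandom() so the correlation is easy to verify; it does not use ROOT and does not reproduce RDataFrame's lazy evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Plain-C++ model of the two-step pattern (not ROOT code): per entry, one call
// produces the whole vector (the "randomValues" column); each named column then
// reads a single component of that same vector, so components stay correlated.
using Row = std::map<std::string, double>;

std::vector<Row> BuildRows(int nEntries,
                           const std::vector<std::string>& variables,
                           const std::function<std::vector<double>()>& getRandom) {
    std::vector<Row> rows;
    for (int entry = 0; entry < nEntries; ++entry) {
        const std::vector<double> values = getRandom();  // exactly one call per entry
        assert(values.size() == variables.size());       // sizes must match
        Row row;
        for (std::size_t i = 0; i < variables.size(); ++i)
            row[variables[i]] = values[i];               // column variables[i] <- component i
        rows.push_back(row);
    }
    return rows;
}
```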

Dear @vpadulan, thanks for the detailed reply!

I managed to compile it after a few additions: I had to add the captures [this] and [&i] to the lambda functions inside my class method so that they could access my class members and the loop index.

By the way, GetRandom returns a std::vector<Double_t>. Would you recommend changing it to RVecD?

My method looks as follows now:

ROOT::RDF::RNode TRestComponent::GetMonteCarloDataFrame(Int_t N) {
    // Create a RDataFrame with the specified columns
    ROOT::RDF::RNode df = ROOT::RDataFrame(N);

    auto df1 = df.Define("randomValues", [this](){return GetRandom();});

    auto apply_defines = [this](ROOT::RDF::RNode df) {
        for (size_t i = 0; i < fVariables.size(); ++i) {
            df = df.Define(fVariables[i], [&i](const std::vector<double> &values) { return values[i]; }, {"randomValues"});
        }
        return df;
    };

    auto df2 = apply_defines(df1);

    std::cout << df2.GetColumnNames().size() << std::endl;
    std::cout << *df2.Count() << std::endl;

    for (const auto& x : df2.GetColumnNames()) std::cout << x << std::endl;

    // Return the RNode (RDataFrame)
    return df2;
}

When I execute it, I get the following segmentation fault:

root [1] comp.Initialize()
-- Info : Generating N-dim histogram for Flat6
root [2] auto df = comp.GetMonteCarloDataFrame(10)
4
10
final_energy
final_posX
final_posY
randomValues
(ROOT::RDF::RInterface<ROOT::Detail::RDF::RNodeBase, void> &) @0x7f114d32de28
root [3] df.Display({"final_posX", "final_posY"} )->Print()

 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f114c7f960c in waitpid () from /lib64/libc.so.6
#1  0x00007f114c776f62 in do_system () from /lib64/libc.so.6
#2  0x00007f114fb38295 in TUnixSystem::StackTrace() () from /nfs/dust/iaxo/group/software/root/6.26.02/install/lib/libCore.so
#3  0x00007f114fb355b5 in TUnixSystem::DispatchSignals(ESignals) () from /nfs/dust/iaxo/group/software/root/6.26.02/install/lib/libCore.so
#4  <signal handler called>
#5  0x00007f1150081fae in ROOT::Detail::RDF::RDefine<TRestComponent::GetMonteCarloDataFrame(int)::{lambda(ROOT::RDF::RInterface<ROOT::Detail::RDF::RNodeBase, void>)#2}::operator()(ROOT::RDF::RInterface<ROOT::Detail::RDF::RNodeBase, void>) const::{lambda(std::vector<double, std::allocator<double> > const&)#1}, ROOT::Detail::RDF::CustomColExtraArgs::None>::Update(unsigned int, long long) () from /nfs/dust/iaxo/group/software/rest/beta/lib/libRestFramework.so
#6  0x00007f115003da00 in ROOT::Internal::RDF::RDefineReader::GetImpl(long long) () from /nfs/dust/iaxo/group/software/rest/beta/lib/libRestFramework.so
#7  0x00007f114033f017 in ?? ()
#8  0x00000000079384d0 in ?? ()
#9  0x00007f1140342be7 in ?? ()
#10 0x0000000007d2a608 in ?? ()
#11 0x0000000000000000 in ?? ()
===========================================================


For further information: if I only access the randomValues column, no problem appears. The crash only happens when I try to access final_posX or one of the other columns.

(ROOT::RDF::RInterface<ROOT::Detail::RDF::RNodeBase, void> &) @0x7f4561e8ce28
root [3] df.Display({"randomValues"} )->Print()
+-----+-------------+
| Row | Rndm        |
+-----+-------------+
| 0   | 9.8325820   |
|     | 9.4565236   |
|     | 4.4736005   |
+-----+-------------+
| 1   | -5.3030053  |
|     | -6.6085046  |
|     | 5.8721527   |
+-----+-------------+
| 2   | 0.94799060  |
|     | -9.8480112  |
|     | 4.3293183   |
+-----+-------------+
| 3   | -3.6391194  |
|     | 1.3039344   |
|     | 3.5842862   |
+-----+-------------+
| 4   | -0.52153720 |
|     | 1.0443335   |
|     | 2.6065952   |
+-----+-------------+
root [4] df.Display({"final_posX"} )->Print()


 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f456135860c in waitpid () from /lib64/libc.so.6
#1  0x00007f45612d5f62 in do_system () from /lib64/libc.so.6
#2  0x00007f4564697295 in TUnixSystem::StackTrace() () from /nfs/dust/iaxo/group/software/root/6.26.02/install/lib/libCore.so
#3  0x00007f45646945b5 in TUnixSystem::DispatchSignals(ESignals) () from /nfs/dust/iaxo/group/software/root/6.26.02/install/lib/libCore.so
#4  <signal handler called>
#5  0x00007f4564be0e2e in ROOT::Detail::RDF::RDefine<TRestComponent::GetMonteCarloDataFrame(int)::{lambda(std::vector<double, std::allocator<double> > const&)#2}, ROOT::Detail::RDF::CustomColExtraArgs::None>::Update(unsigned int, long long) () from /nfs/dust/iaxo/group/software/rest/beta/lib/libRestFramework.so
#6  0x00007f4564b9c880 in ROOT::Internal::RDF::RDefineReader::GetImpl(long long) () from /nfs/dust/iaxo/group/software/rest/beta/lib/libRestFramework.so
#7  0x00007f4553e677a7 in ?? ()
#8  0x0000000006a81d40 in ?? ()
#9  0x00007f4553e69e02 in ?? ()
#10 0x0000000000000000 in ?? ()
===========================================================


Dear @Javier_Galan,

Without further context, my guess is that something about the data type of the values in the final_posX column is upsetting the RDF event loop, although the column name suggests a simple type, so I am not sure. If the definition of the column is short and self-contained, you could paste it here and we could look at it together. Otherwise, we could try to get a more informative stack trace, which would require building ROOT with debug symbols (e.g. with cmake -DCMAKE_BUILD_TYPE=Debug).

Cheers,
Vincenzo

It is finally working for me with the following implementation:

ROOT::RDF::RNode TRestComponent::GetMonteCarloDataFrame(Int_t N) {
    ROOT::RDF::RNode df = ROOT::RDataFrame(N);

    // Function to fill the RDataFrame using GetRandom method
    auto fillRndm = [&]() {
        ROOT::RVecD randomValues = GetRandom();
        return randomValues;
    };
    df = df.Define("Rndm", fillRndm);

    // Creating dedicated columns for each GetRandom component
    for (size_t i = 0; i < fVariables.size(); ++i) {
        auto varName = fVariables[i];
        auto FillRand = [i](const ROOT::RVecD& randomValues) { return randomValues[i]; };
        df = df.Define(varName, FillRand, {"Rndm"});
    }

    return df;
}

Thanks!
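One difference between this working version and the one that crashed is how the inner lambda captures the loop index: [i] (by value) here, [&i] (by reference) before. RDataFrame only calls Define expressions when the event loop runs, for example when Display(...)->Print() is executed, and by then the loop variable i no longer exists; reading it through a reference is undefined behaviour, which is a plausible cause of the crashes inside the Update method of the corresponding RDefine node seen in the stack traces. A minimal plain-C++ sketch of the safe version (MakeExtractors and Extractor are hypothetical names, not ROOT API):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Extractor = std::function<double(const std::vector<double>&)>;

// Builds one extractor per component. As with RDataFrame's Define, the
// closures are only called later, after this loop and this function are done.
std::vector<Extractor> MakeExtractors(std::size_t n) {
    std::vector<Extractor> extractors;
    for (std::size_t i = 0; i < n; ++i) {
        // [i] copies the current index into the closure, so it stays valid.
        // [&i] would store a reference to the loop variable, which is destroyed
        // when the loop ends; calling such a closure later is undefined behaviour.
        extractors.push_back([i](const std::vector<double>& values) {
            assert(i < values.size());
            return values[i];
        });
    }
    return extractors;
}
```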

Dear @Javier_Galan,

I am glad to hear that you reached a working solution. Out of curiosity: looking at your latest post, it is not immediately clear to me which difference with respect to my suggestion made the application work. Is it the explicit conversion to ROOT::RVecD (in which case I would like to understand what the previous data type of that column was), or something else that I am missing? Thanks a lot!
Cheers,
Vincenzo

I replaced every occurrence of ROOT::RVecD in my working code with std::vector<double>, and the code still ran smoothly.

It took me some time to reproduce the error!

Actually, I think the problem in the code I posted was caused by the following line:

If I replace that line with

    df = df.Define("randomValues", [this](){return GetRandom();});

and then I call

auto df2 = apply_defines(df);

That solves the problem. Does that make sense?

Working version:

Non-working version:

I would like to understand what is going on there, so that I don't make the same mistake again.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.