Question about Aliases in RDataFrame

I have a question about the behavior of aliases in RDataFrame. Naively, I would have expected that an alias is really just a different name for a column or, in our case, a branch. However, the following minimal reproducer suggests that this is not the case:

We store vector<ObjectID> in some of our branches, where ObjectID looks like the following:

struct ObjectID {
  ObjectID() = default;
  ObjectID(int c, int i) : collID(c), index(i) {}
  int collID{};
  int index{};
};

Consider a tree that has been filled with vector<ObjectID> in the following way:

import ROOT
ROOT.gInterpreter.GenerateDictionary("ObjectID;std::vector<ObjectID>", "ObjectID.h")
ROOT.gInterpreter.Declare('''
#include "ObjectID.h"

void fillTree(const char *tn, const char *fn) {
  ROOT::RDataFrame df(10);

  df.Define("ObjectIDs", []() {
      constexpr auto N = 3;
      std::vector<ObjectID> ids;
      ids.reserve(N);
      for (int i = 0; i < N; ++i) {
        ids.emplace_back(i, i*i);
      }

      return ids;
    }).Snapshot<std::vector<ObjectID>>(tn, fn, {"ObjectIDs"});
}
''')


ROOT.fillTree("example_tree", "example_df.root")

With this, the following works:

df = ROOT.ROOT.RDataFrame("example_tree", "example_df.root")
df.Define("idx", "ObjectIDs.index")

However, if I first make an alias and then try to do the same:

df = ROOT.ROOT.RDataFrame("example_tree", "example_df.root")
df.Alias("ids", "ObjectIDs").Define("idx", "ids.index")

This breaks down with a slightly lengthy error message (see below). The gist of the error seems to be that in the first case RDataFrame somehow manages to resolve the (sub)branches as RVec<int>, but as soon as an alias is involved, the column is seemingly accessed as an RVec<ObjectID>, which obviously does not have an index field. I suppose this is expected behavior? Is it in any way possible to make aliases behave more like aliases in this case?

Full error message:

input_line_67:2:67: error: no member named 'index' in 'ROOT::VecOps::RVec'
auto lambda0 = [](ROOT::VecOps::RVec& var0){return var0.index
~~~~ ^
In file included from /tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/input_line_9:15:
In file included from /tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/etc/dictpch/allHeaders.h:806:
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1452:97: error: no member named 'value' in 'ROOT::Internal::VecOps::RVecInlineStorageSize'
class R__CLING_PTRCHECK(off) RVec : public RVecN<T, Internal::VecOps::RVecInlineStorageSize::value> {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
input_line_71:2:66: note: in instantiation of template class 'ROOT::VecOps::RVec' requested here
auto lambda0 = [](ROOT::VecOps::RVec& var0){return var0.index
^
In file included from /tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/input_line_9:15:
In file included from /tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/etc/dictpch/allHeaders.h:806:
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1453:76: error: no member named 'value' in 'ROOT::Internal::VecOps::RVecInlineStorageSize'
using SuperClass = RVecN<T, Internal::VecOps::RVecInlineStorageSize::value>;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1522:38: error: no function template matches function template specialization 'IsSmall'
friend bool ROOT::Detail::VecOps::IsSmall(const RVec &v);
^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1089:6: note: candidate template ignored: failed template argument deduction
bool IsSmall(const ROOT::VecOps::RVec &v)
^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1522:38: error: friend declaration of 'IsSmall' does not match any declaration in namespace 'ROOT::Detail::VecOps'
friend bool ROOT::Detail::VecOps::IsSmall(const RVec &v);
^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1089:6: note: candidate template ignored: failed template argument deduction
bool IsSmall(const ROOT::VecOps::RVec &v)
^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1524:38: error: no function template matches function template specialization 'IsAdopting'
friend bool ROOT::Detail::VecOps::IsAdopting(const RVec &v);
^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1095:6: note: candidate template ignored: failed template argument deduction
bool IsAdopting(const ROOT::VecOps::RVec &v)
^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1524:38: error: friend declaration of 'IsAdopting' does not match any declaration in namespace 'ROOT::Detail::VecOps'
friend bool ROOT::Detail::VecOps::IsAdopting(const RVec &v);
^
/tmp/tmadlener/spack-stage/spack-stage-root-6.26.04-mxi2b3znosaooegjxfathocnqbawwvm2/spack-build-mxi2b3z/include/ROOT/RVec.hxx:1095:6: note: candidate template ignored: failed template argument deduction
bool IsAdopting(const ROOT::VecOps::RVec &v)
^
Traceback (most recent call last):
  File "reproducer.py", line 32, in
    df.Alias("ids", "ObjectIDs").Define("idx", "ids.index").Display(["idx"]).Print()
cppyy.gbl.std.runtime_error: Template method resolution failed:
ROOT::RDF::RInterface ROOT::RDF::RInterface::Define(basic_string_view<char,char_traits > name, basic_string_view<char,char_traits > expression) =>
runtime_error:
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate the cause of the crash
All RDF objects that have not run an event loop yet should be considered in an invalid state.

ROOT::RDF::RInterfaceROOT::Detail::RDF::RLoopManager,void ROOT::RDF::RInterfaceROOT::Detail::RDF::RLoopManager,void::Define(basic_string_view<char,char_traits > name, basic_string_view<char,char_traits > expression) =>
runtime_error:
RDataFrame: An error occurred during just-in-time compilation. The lines above might indicate the cause of the crash
All RDF objects that have not run an event loop yet should be considered in an invalid state.


Hi @tmadlener ,

thank you for your report, the problem is indeed what you hint at – we decide which columns are used in the expression and substitute them with placeholder variable names before aliases are resolved. I can reproduce the problem and I agree we can/should do better, I’ll look into this as soon as possible.
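The ordering problem described here can be illustrated with a toy model in plain Python. This is only a sketch and not ROOT's actual implementation: the column table and the helpers `resolve_aliases`, `find_used_columns`, `column_types_late_alias` and `column_types_early_alias` are all hypothetical, chosen to mimic the symptom (the alias matching the whole `RVec<ObjectID>` column instead of the `ObjectIDs.index` sub-branch).

```python
import re

# Hypothetical column table: dotted sub-branch names are columns in their own right.
COLUMNS = {
    "ObjectIDs": "RVec<ObjectID>",
    "ObjectIDs.index": "RVec<int>",
    "ObjectIDs.collID": "RVec<int>",
}
ALIASES = {"ids": "ObjectIDs"}


def resolve_aliases(expr, aliases):
    """Textually replace every alias with the column name it refers to."""
    for alias, target in aliases.items():
        expr = re.sub(rf"\b{re.escape(alias)}\b", target, expr)
    return expr


def find_used_columns(expr, known_names):
    """Return the known names used in expr, preferring the longest dotted match."""
    used = []
    for token in re.findall(r"[A-Za-z_][\w.]*", expr):
        # Try "a.b.c", then "a.b", then "a" until a known name matches.
        parts = token.split(".")
        for n in range(len(parts), 0, -1):
            candidate = ".".join(parts[:n])
            if candidate in known_names:
                used.append(candidate)
                break
    return used


def column_types_late_alias(expr):
    """Match columns first, resolve aliases afterwards (mimics the reported symptom)."""
    names = set(COLUMNS) | set(ALIASES)
    used = find_used_columns(expr, names)
    return [COLUMNS[ALIASES.get(name, name)] for name in used]


def column_types_early_alias(expr):
    """Resolve aliases first, then match columns (what the question expected)."""
    used = find_used_columns(resolve_aliases(expr, ALIASES), set(COLUMNS))
    return [COLUMNS[name] for name in used]


late = column_types_late_alias("ids.index")
early = column_types_early_alias("ids.index")
print(late, early)
```

In this model, matching first only ever sees the name `ids`, so the expression is compiled against the full object column; resolving the alias first lets the longest-match step find the dotted sub-branch name.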

Cheers,
Enrico


Hi Enrico,

I have another question that I think is closely related to this one (mainly because the error is very similar).
The issue is with indexing into a Branch with sub-branches. I can index on the sub-branches directly, but indexing on the “top-level” branch first and then trying to access the sub-branches breaks again.

Adding a function to simply generate a random index to the reproducer above like

ROOT.gInterpreter.Declare('''
ROOT::VecOps::RVec<int> randomSelect(int n) {
  ROOT::VecOps::RVec<int> select;
  select.reserve(n);

  auto rng = TRandom3(0);

  for (int i = 0; i < n; ++i) {
    select.emplace_back(int(rng.Uniform() < 0.5));
  }
  return select;
}
''')

and then adapting the reproducer to

ROOT.fillTree("example_tree", "example_df.root")

df = ROOT.ROOT.RDataFrame("example_tree", "example_df.root")
df = df.Define("sel", "randomSelect(ObjectIDs.size())")

Selecting on the sub branches with this index works:

df.Define("sel_idx", "ObjectIDs.index[sel]").Display(["sel_idx"]).Print()

However, similar to the original question, indexing first and then trying to access the sub branches breaks again

df.Define("sel_ids", "ObjectIDs[sel]").Define("sel_idx", "sel_ids.index").Display(["sel_idx"]).Print()

Again, the error is indicating that in this case it cannot see through to the sub-branches, but instead tries to work on the RVec<ObjectID> directly

input_line_84:2:67: error: no member named 'index' in 'ROOT::VecOps::RVec<ObjectID>'
auto lambda3 = [](ROOT::VecOps::RVec<ObjectID>& var0){return var0.index
                                                             ~~~~ ^
input_line_88:2:67: error: no member named 'index' in 'ROOT::VecOps::RVec<ObjectID>'
auto lambda3 = [](ROOT::VecOps::RVec<ObjectID>& var0){return var0.index
                                                             ~~~~ ^
input_line_89:2:67: error: no member named 'index' in 'ROOT::VecOps::RVec<ObjectID>'
auto lambda3 = [](ROOT::VecOps::RVec<ObjectID>& var0){return var0.index
                                                             ~~~~ ^

All of this can technically be worked around by writing some C++ functions and then calling them from the RDataFrame, but it would be nice if it just worked “out of the box”.

Cheers,
Thomas

Hi @tmadlener ,

this is a different problem (unfortunately :smiley: ). In your reproducer, in RDF ObjectIDs is of type RVec<ObjectID>, so sel_ids is also of type RVec<ObjectID> and that type does not have an index data member.

That’s very different from accessing branch ObjectIDs.index, which in RDF is of type RVec<int> and therefore has an operator[].

I hope that clarifies things.
Cheers,
Enrico

BTW the original issue is now tracked as "[DF] Bad interaction between `Alias` and TTree sub-branches" (Issue #11207 in root-project/root on GitHub); work in progress :smiley: .


Ah, I see. Thanks for the clarification. Then I was just conflating these two things on my end, because they seemed to be very similar.

For the second case, I suppose there is no easy way for RDataFrame to access public data members of PODs the way I have tried to do it above? That would simplify a few things for me, and I would not have to write as many C++ wrappers :wink:

Cheers,
Thomas

As above, the way to do it is with ObjectIDs.index[sel] (so accessing the array of data members) rather than ObjectIDs[sel].index (which accesses the array of objects, selects some elements, and then calls .index on the array of selected objects). You can also write the latter as Map(ObjectIDs, [](const auto &obj) { return obj.index; }).
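The difference between the two access patterns can be sketched in plain Python, without ROOT. Everything here is a stand-in and an assumption for illustration: the `ObjectID` dataclass mirrors the struct from the reproducer, `mask_select` is a hypothetical helper imitating mask-style indexing on an RVec, and a fixed mask replaces `randomSelect` so the result is reproducible. The per-element list comprehension plays the role of a Map applied to the already selected objects.

```python
from dataclasses import dataclass


@dataclass
class ObjectID:
    collID: int = 0
    index: int = 0


def mask_select(values, mask):
    """Keep the elements whose mask entry is non-zero (imitates v[mask] on an RVec)."""
    return [v for v, keep in zip(values, mask) if keep]


# Same content as the reproducer: three objects (i, i*i) per event.
object_ids = [ObjectID(i, i * i) for i in range(3)]

# What the sub-branch "ObjectIDs.index" holds: a flat array of ints.
index_column = [obj.index for obj in object_ids]

sel = [1, 0, 1]  # fixed mask instead of randomSelect, for reproducibility

# "ObjectIDs.index[sel]": select on the array of data members.
sel_idx_from_members = mask_select(index_column, sel)

# "ObjectIDs[sel]" followed by a per-element map over the selected objects.
sel_ids = mask_select(object_ids, sel)
sel_idx_from_objects = [obj.index for obj in sel_ids]

print(sel_idx_from_members, sel_idx_from_objects)
```

Both routes give the same values; the point is that the data member lives on each element, so the array of selected objects needs an explicit per-element step, while the sub-branch column is already an array of ints.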

Cheers,
Enrico

The original issue is now fixed in master and v6-26-00-patches (aka future 6.26.08).

Thank you for the report!
Enrico
