RDataFrame: Define: order of columns

ROOT Version: 6.32.08
Platform: linuxx8664gcc
Compiler: c++ (Ubuntu 13.3.0-6ubuntu2~24.04)


Hello,

RDataFrame doesn’t seem to preserve the order of Define() calls.

Here is an MWE (print.C):

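// Define two int columns in the given order, then print the dataframe summary.
void print(const std::string& name_1, const std::string& name_2) {
    ROOT::RDataFrame empty(0);  // empty dataframe with 0 entries
    auto rdf = empty.Define(name_1, [](){return 0;})
        .Define(name_2, [](){return 0;});
    rdf.Describe().Print();
}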

Test 1: a b

$ root -q "print.C(\"a\", \"b\")"
   ------------------------------------------------------------------
  | Welcome to ROOT 6.32.08                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jan 24 2025, 09:37:12                 |
  | From tags/v6-32-08@v6-32-08                                      |
  | With c++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0                   |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------


Processing print.C("a", "b")...
Empty dataframe filling 0 rows

Property                Value
--------                -----
Columns in total            2
Columns from defines        2
Event loops run             0
Processing slots            1

Column  Type    Origin
------  ----    ------
a       int     Define
b       int     Define

Test 2: b a

$ root -q "print.C(\"b\", \"a\")"
   ------------------------------------------------------------------
  | Welcome to ROOT 6.32.08                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jan 24 2025, 09:37:12                 |
  | From tags/v6-32-08@v6-32-08                                      |
  | With c++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0                   |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------


Processing print.C("b", "a")...
Empty dataframe filling 0 rows

Property                Value
--------                -----
Columns in total            2
Columns from defines        2
Event loops run             0
Processing slots            1

Column  Type    Origin
------  ----    ------
a       int     Define
b       int     Define

How can I make it keep the order of the Define() calls?

Best Regards,

Hello @Salomon!

RDataFrame doesn’t track the order of Defines at all. This follows from how values are passed between the nodes of the computation graph.
When you Define a column, you assign a column name to a function that produces a value. Defining it doesn’t mean that the function will actually run, though. For the function to run, the column has to be used by an action.
When you use the columns, you typically do rdf.Action(xxx, {"a", "b"}); (Action and xxx are placeholders here). Since you pass the columns by name, it doesn’t matter in which order they were defined: RDataFrame will evaluate a first and b second to supply values to this action. If you do rdf.Action(xxx, {"b", "a"});, they are evaluated in the opposite order, irrespective of how they were defined.
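
To make the idea concrete, here is a toy model in plain C++. It is only a sketch of the behaviour described above and not RDataFrame's actual implementation: ToyFrame, its Define and Action methods, and the evaluated log are all made-up names for illustration. Columns are stored by name, nothing runs at Define time, and an action evaluates the columns in the order in which it asks for them.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only (hypothetical names): NOT how RDataFrame is implemented.
struct ToyFrame {
    std::map<std::string, std::function<int()>> columns;  // name -> value producer
    std::vector<std::string> evaluated;                   // log of evaluation order

    // Register a producer under a name. The producer is not called here.
    ToyFrame& Define(const std::string& name, std::function<int()> f) {
        columns[name] = std::move(f);
        return *this;
    }

    // "Action": evaluate the requested columns in the order they are requested.
    std::vector<int> Action(const std::vector<std::string>& names) {
        std::vector<int> values;
        for (const auto& n : names) {
            evaluated.push_back(n);
            values.push_back(columns.at(n)());  // throws if the column is unknown
        }
        return values;
    }
};

// Define "b" before "a", then request them as {"a", "b"}.
void toy_demo() {
    ToyFrame df;
    df.Define("b", [] { return 2; }).Define("a", [] { return 1; });
    df.Action({"a", "b"});
    for (const auto& n : df.evaluated) std::cout << n << ' ';
    std::cout << '\n';
}
```

In this sketch the definition order is never stored anywhere, so it cannot influence the order in which the producers run.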

When you use Describe, you get an alphabetical list of columns, which is independent of how you defined them.
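
If you need the definition order for your own bookkeeping, one option is to record the column names yourself as you define them. Below is a minimal sketch in plain C++ (no ROOT calls, so it compiles on its own). DefineLog is a hypothetical helper, not part of ROOT, and the RDataFrame usage in the comment is an assumption about how you might wire it in.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: remembers column names in the order they were defined.
struct DefineLog {
    std::vector<std::string> names;  // definition order

    // Record a name and hand it back, so the call can be used inline, e.g.
    //   empty.Define(log.Add(name_1), [](){return 0;})   // assumed usage
    std::string Add(const std::string& name) {
        names.push_back(name);
        return name;
    }

    // The same names sorted alphabetically, for comparison with Describe().
    std::vector<std::string> Alphabetical() const {
        std::vector<std::string> sorted = names;
        std::sort(sorted.begin(), sorted.end());
        return sorted;
    }
};
```

After defining the columns, log.names holds them in the order of the Define() calls, independent of what Describe() prints.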