Taking branches by reference with RDataFrame

I’ve been taking branches by reference (const struct_type&) in my Define's and Filter's. As part of trying to write out my output tree using Book (as I unwrap the data into many columns, and write multiple trees with different selections), I discovered that if I give Book explicit template arguments, and some of the column types are references, I get a compile error about creating a pointer to a reference.

Is it not actually possible to take branches by reference? In that case, what is happening with Define and Filter, where I don’t give the functions explicit template arguments?

I can’t leave the template arguments implicit with Book since it then tries to instantiate my Exec template with an empty argument list.

It’s perfectly fine to take column contents by reference in Filters and Defines. It should be ok with Book too.

Can you provide the smallest reproducer so we can debug it? Note that you can use std::decay_t to convert a template type that’s been inferred as a reference to its corresponding non-reference unqualified type.

Inference doesn’t work at all, probably because my Exec function is a template. At compile time it tries to instantiate with Args = {}.

The location of the pointer error is in RDF code, not my code.

I’ll see what I can do about a minimal working example.

Hi Beojan,
Book does not infer the type of the column arguments from the signature of Exec: it takes them as template parameters. If you don’t pass any template parameter to Book like in e.g. df.Book<double, int>(...), it will invoke Exec with no parameters.

The location of the pointer error is in RDF code, not my code.

Yes, that’s where the check happens, or the usage of the mismatched type.

I’ll see what I can do about a minimal working example.

A not working example would be perfect. With a little reproducer of the problem it will be easier to spot the bug.


Before I can do that, I have a different issue. InitTask gets called with a null pointer (which is fine, since I don’t do anything with the TTreeReader), then immediately after each of these calls, I get the error:

Error in <TTreeReaderValue::Get()>: Value reader not properly initialized, did you remember to call TTreeReader.Set(Next)Entry()?

Uhm that’s weird: if you are reading data from a ROOT file, the TTreeReader* passed to InitTask should not be null. If you are not, there should not be TTreeReaderValues involved.

I’m afraid I can’t help more without some code that I can check/run/debug.

(if you just copy-paste one of the action helpers that are present in our tutorials, e.g. in $ROOTSYS/tutorials/dataframe/df022_useKahan.C and use that helper, does everything work fine? If not there is some other problem that is not related to action helprs)

Some of the branches in question are from the ROOT file, and some are Defined, so maybe that’s causing an issue?

Though there are intermediate Filters so I don’t know how that would interact.

EDIT: Probably not it. I tried with only one branch, which is straight from the ROOT file and I get the same error.

If there is a ROOT file involved in the event loop, there should be a TTreeReader.
Something is wrong, but I can’t tell what.

Cache, maybe? I cache a limited set of branches just before calling Book.
I tried calling Book on the frame before the cache, and it works, so it looks like there’s a problem running custom actions on cached dataframes.

EDIT: Yes, this is definitely the cause.

Ah, ok. Cache returns a completely new RDataFrame which does not read from the ROOT file anymore (it reads from the in-memory caches). So that explains the TTreeReader nullptr.

It looks like we have a bug in Book'ed actions that depend on Cache'd dataframes. I opened a jira ticket, you can follow development on this problem there.

Regarding your original issue about column type inference, the quickest way to make progress would be to have a minimal reproducer.

Thank you for reporting this problem with Cache+Book!

P.S. what ROOT version are you on?

6.16, though ideally I need it to work with 6.14 (LCG 94) as well.

6.14 is very old in RDF years, I am not even sure if it has Book.
But I’ll definitely backport the fix to v6.16.

This should fix the problem with Cache+Book.

OK, here’s the example: https://gist.github.com/beojan/271882f753c3183eaf9d40c89c221cc0

If you turn the reference on line 45 in main.cpp into a value, it compiles and runs.

NB: You need fmt installed to run this.

Hi Beojan,
the template parameters passed to Book should correspond to the type of the column values, and the column has type std::vector<int>. We do not do an implicit decay of the template parameter type.
So std::vector<int> is indeed the correct type to pass there.

OutputTree can keep taking the values by const reference though, as you saw.


I see. Does this mean the column values may get copied when being passed to OutputTree, or do they all get converted to forwarding references internally?

Also, maybe this could be clarified in the Book documentation?

The documentation also doesn’t mention that the GetActionName function is mandatory.

RDF internals keep copies to a minimum (zero, in most cases).
If those values come from a TTree, what is passed to the function is *readerValue where readerValue is a TTreeReaderValue<std::vector<int>>.
If the column comes from a Define or a RDataSource, its values are stored somewhere, and what’s passed to the user-defined function is *valuePtr where valuePtr contains the address of the relevant value.

A copy-assignment is performed when a Define'd column is evaluated, to store in in RDF’s internal variable:

definedValue = userFunction(args...)

Thanks for the help. I would expect even that copy would be elided in most cases, assuming C++ 17.

It’s a copy-assignment, not a copy-construction, so I don’t think copy elision kicks in, but move semantics do, so you could get a move-assignment if the return type of userFunction allows it.

About the docs: I agree they are not up to date (PRs are welcome :slight_smile:) – I’ll add it to my to-do list.