RDataFrame with custom object branch


Can I use RDataFrame to interact with objects I’ve put in a branch of a TTree?

Suppose I have a class (Event) deriving from TObject, and a branch (“eventObj”) in the tree holding an instance of this class. How can I, for example, do a lambda function filter using this branch? I want something like:

auto cut1 = [](const Event* ev) { ... do stuff with ev; }

But when I try this I get ‘CreateProxy’ errors about the dictionary not existing, even though I know the class works fine with, e.g., tree->Scan("eventObj->blah()").



Can I use RDataFrame to interact with objects I’ve put in a branch of a TTree?

Yes! For non-trivial classes, ROOT dictionaries must be available so that ROOT knows how to correctly read and write the class. If tree->Scan does not complain, dictionaries must be there somewhere, and you have to load them into your compiled code.

A small self-contained example of RDataFrame reading/writing a custom type:

// Event.h
#ifndef EVENT_H
#define EVENT_H

#include <TObject.h>

class Event : public TObject {
   int x = 42;

public:
   Event() {}
   int GetX() const { return x; }
   ClassDef(Event, 0);
};

#endif

// main.cxx
#include "Event.h"
#include <ROOT/RDataFrame.hxx>
#include <iostream>

int main() {
   // writing an Event: 3 entries, each with a default-constructed Event
   ROOT::RDataFrame(3)
      .Define("event", [] { return Event(); })
      .Snapshot("tree", "f.root");

   // reading an Event: count the entries that pass the filter
   const auto entries = ROOT::RDataFrame("tree", "f.root")
      .Filter([] (const Event& evt) { return evt.GetX() == 42; }, {"event"})
      .Count()
      .GetValue();

   std::cout << entries << std::endl; // will print 3

   return 0;
}

$ rootcling event_dict.cxx Event.h # create dictionaries for class Event
$ g++ -o main main.cxx event_dict.cxx $(root-config --libs --cflags) # compile (passing the dictionaries)
$ ./main # prints 3


Thanks. Does doing this sort of thing not work from the ROOT (cling) prompt then? Must everything be precompiled?

I was showing the complicated case; it’s actually easier from the prompt. I had to add include guards to Event.h above, and then:

$ root -l
root [0] .L Event.h+
Info in <TUnixSystem::ACLiC>: creating shared library /tmp/./Event_h.so
root [1] .L main.cxx
root [2] main()
(int) 0

.L Event.h+ creates the dictionaries and the interpreter looks for dictionaries in the current directory, so it finds them later when executing main().

My point is: RDataFrame works fine with custom objects, but dictionaries need to be available as per the error message. If TTree::Scan does not error out, that makes me think the dictionaries are there, somewhere, and you have to have the application pick them up.


OK, I must still be doing something wrong here. Can you see why I have trouble with this example: https://cernbox.cern.ch/index.php/s/1PDYLcjDp35XCO6

I just want a lambda function that will take in that example root class (a RooFitResult in this case) …

Hi @will_cern,
thank you for the reproducer.


bool isNull(void* p) { return p==nullptr; }

@moneta or @pcanal can comment with more authority but I don’t think TTreeFormula supports computations with arbitrary types (such as void*, for example).

About using RDataFrame, the problem is that RDF really does not have good support for nullptr values. If you remove the second entry, the one with the nullptr, this works fine:

d.Foreach([](RooFitResult& b) { std::cout << b.GetName() << std::endl; }, {"fr"});

If you had another column that indicated presence/absence of a RooFitResult, you could work around this limitation of RDF by prepending a Filter([](bool isNull) { return !isNull; }, {"is_roofitresult_null"}) to the Foreach, so that the Foreach only processes events with non-null RooFitResults.
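The filter-before-foreach workaround described above can be modelled in plain C++, without ROOT, to show why the action never sees a null object. This is only a sketch under stated assumptions: FitResult, Entry, foreachNonNull and collectNames are hypothetical names invented for illustration and are not ROOT or RooFit API. The isNull flag plays the role of the "is_roofitresult_null" column.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for RooFitResult; only the name is modelled.
struct FitResult {
   std::string name;
   const std::string& GetName() const { return name; }
};

// One "tree entry": the object pointer may be null, and a companion
// flag (like the "is_roofitresult_null" column) records that fact.
struct Entry {
   const FitResult* fr; // may be nullptr
   bool isNull;         // companion presence/absence flag
};

// Apply the flag-based filter first, then run the action only on the
// surviving entries, so the action never dereferences a null pointer.
template <typename Action>
void foreachNonNull(const std::vector<Entry>& entries, Action action) {
   for (const Entry& e : entries) {
      if (e.isNull) continue;  // the "Filter" step
      assert(e.fr != nullptr); // flag and pointer must agree
      action(*e.fr);           // the "Foreach" step
   }
}

// Example action: collect the names of all non-null results.
inline std::vector<std::string> collectNames(const std::vector<Entry>& entries) {
   std::vector<std::string> names;
   foreachNonNull(entries, [&names](const FitResult& r) { names.push_back(r.GetName()); });
   return names;
}
```

The point of the design is that the null check lives in one place, ahead of the action, so the action itself can take its argument by reference, just like the Foreach lambda above.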


Ah thanks I think that might have been how I was getting stuck. I think that’s this problem solved.

I’m also assuming that all of this will be considered thread safe (i.e. that concurrent calls to my lambda will get references to different objects here).

I’m trying to learn what I can do with RDataFrame, coming from extensive advanced use of TTree::Draw/Scan trickery a few years ago. I feel like at some point I used to be able to call methods on objects as part of a Draw/Scan, but it’s been a while and I seem to have forgotten. But at least RDataFrame can use these lambdas OK then.

Will probably be back soon with more questions… thanks

Your lambdas must be safe to call concurrently when multi-threading is activated – but indeed, they will definitely get inputs corresponding to different events when called concurrently.
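As a rough illustration of what "safe to call concurrently" means, here is a plain C++ sketch that does not use ROOT. The Event struct, processConcurrently and countMatching are hypothetical names made up for this example, and the way events are split across threads is an assumption for illustration, not how RDataFrame schedules work. Each call gets a different event, but the counter the lambda captures is shared between threads, so it is a std::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for an event object read from a branch.
struct Event {
   int x = 42;
   int GetX() const { return x; }
};

// Call fn on every event from nThreads worker threads. Each thread
// handles a disjoint subset, so concurrent calls see different events.
template <typename F>
void processConcurrently(const std::vector<Event>& events, unsigned nThreads, F fn) {
   assert(nThreads > 0 && "need at least one worker thread");
   std::vector<std::thread> workers;
   for (unsigned t = 0; t < nThreads; ++t) {
      workers.emplace_back([&events, &fn, t, nThreads] {
         for (std::size_t i = t; i < events.size(); i += nThreads)
            fn(events[i]);
      });
   }
   for (auto& w : workers) w.join();
}

// Count events with x == 42. The inputs differ per call, but the
// counter is shared state, so it is atomic to avoid a data race.
inline int countMatching(const std::vector<Event>& events, unsigned nThreads) {
   std::atomic<int> nPassed{0};
   processConcurrently(events, nThreads, [&nPassed](const Event& ev) {
      if (ev.GetX() == 42) nPassed.fetch_add(1, std::memory_order_relaxed);
   });
   return nPassed.load();
}
```

A lambda that only reads its argument and returns a value needs no such care. The atomic is needed only because this lambda writes to something it captured by reference.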

Please do! Actually, this is too good of a thread for it to stay in the Newbie section and get deleted in a couple of weeks. Would you be ok with promoting it to the ROOT section?

be my guest


That is correct. The arguments to functions called by TTree::Draw can only be simple numeric types. (But you can call member functions of complex objects as long as they take no arguments or only simple numerical arguments.)

