RDataFrame: friend-tree branches with identical names but different types

Hi all,

The simplest example I could create that reproduces the problem is the following:

import ROOT

vectors = '''
#include "Math/Vector3D.h"
#include "Math/Vector4D.h"
#include "TFile.h"
#include "TTree.h"
#include <vector>
using namespace ROOT::Math;
using namespace ROOT::VecOps;

RVec <double> getArrX(const RVec<XYZTVector>& vec){
    auto getItemX = [](const XYZTVector& item) { return item.X(); };
    return Map(vec, getItemX);
}


'''

ROOT.gInterpreter.Declare(vectors)

if __name__ == "__main__":
  ROOT.RDataFrame(1).Define("vec", "XYZVector(10, 10, 10)").Snapshot("Particle", "f1.root")
  ROOT.RDataFrame(1).Define("vec", "XYZVector(20, 20, 20)").Snapshot("Cluster", "f2.root")
  ROOT.RDataFrame(1).Define("vec", "std::vector<XYZTVector>{XYZTVector(30, 30, 30, 30)}").Snapshot("Vertex", "f3.root")

  ch1 = ROOT.TChain("Particle")
  ch1.Add("f1.root")
  ch2 = ROOT.TChain("Cluster")
  ch2.Add("f2.root")
  ch3 = ROOT.TChain("Vertex")
  ch3.Add("f3.root")

  ch1.AddFriend(ch2, "cluster")
  ch1.AddFriend(ch3, "vertex")

  df = ROOT.RDataFrame(ch1)
  print(df.Define("particle_z", "vec.Z()").Histo1D("particle_z").GetMean())
  print(df.Define("cluster_z", "cluster.vec.Z()").Histo1D("cluster_z").GetMean())
  print(df.Define("vertex_z", "getArrX(vertex.vec)").Histo1D("vertex_z").GetMean())

I would expect it to output:

10
20
30

However, the following error occurs:

$ python test.py 
10.0
20.0
input_line_99:2:142: error: no matching function for call to 'getArrX'
auto lambda4 = [](ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag>& var0){return getArrX(var0)
                                                                                                                                             ^~~~~~~
input_line_42:10:15: note: candidate function not viable: no known conversion from 'ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>, ROOT::Math::DefaultCoordinateSystemTag>' (aka
      'DisplacementVector3D<Cartesian3D<double>, ROOT::Math::DefaultCoordinateSystemTag>') to 'const RVec<ROOT::Math::XYZTVector>' (aka 'const RVec<LorentzVector<PxPyPzE4D<double> > >') for 1st argument
RVec <double> getArrX(const RVec<XYZTVector>& vec){
              ^

What I have tried so far:

  1. Commenting out the “Cluster” friend tree gives the correct output:
$ python test.py
10.0
30.0

so this has something to do with the identical branch names.

  2. I have tried a simpler example with int and vector<int> instead of XYZVector and vector<XYZTVector>, and everything seems to work.

I have spent 3 days trying to understand what is wrong.
Why does this happen even though the tree names, tree aliases and file names are all different?

cheers,
Bohdan

ROOT Version: 6.22/00
Platform: CentOS 7.9.2009
Python: 3.7.6


Hi @FoxWise,
looks like a bug, sorry about that! Somehow the presence of an identically-named branch, for this particular combination of types, makes RDataFrame think that vertex.vec is a ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag> instead of an RVec<ROOT::Math::XYZTVector>.

Unfortunately it looks like we don’t have a test for this particular scenario.

  1. what does tree->Print() say about the type of vertex.vec in tree “Vertex”?
  2. what does df.GetColumnType("vertex.vec") return with and without adding “cluster” as a friend?
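
For the second check, something along these lines might be convenient. It is only a rough sketch: the helper names column_types, changed_columns and open_with_friends are made up for illustration, and the file, tree and alias names are taken from the repro above. It uses nothing beyond TChain.Add, TChain.AddFriend and RDataFrame.GetColumnType.

```python
def column_types(df, columns):
    """Map each column name to the type string reported by df.GetColumnType."""
    return {name: str(df.GetColumnType(name)) for name in columns}


def changed_columns(types_a, types_b):
    """Return the sorted names of shared columns whose reported types differ."""
    shared = set(types_a) & set(types_b)
    return sorted(name for name in shared if types_a[name] != types_b[name])


def open_with_friends(with_cluster):
    """Build the Particle chain from the repro, with or without the Cluster friend."""
    import ROOT  # imported here so the two helpers above also work without ROOT

    main = ROOT.TChain("Particle")
    main.Add("f1.root")

    # (tree name, file name, alias) for each friend, in the order used by the repro.
    specs = [("Vertex", "f3.root", "vertex")]
    if with_cluster:
        specs.insert(0, ("Cluster", "f2.root", "cluster"))

    keep_alive = [main]
    for tree_name, file_name, alias in specs:
        friend = ROOT.TChain(tree_name)
        friend.Add(file_name)
        main.AddFriend(friend, alias)
        keep_alive.append(friend)

    # The chains are returned as well, so the caller can hold on to them
    # for as long as the dataframe is in use.
    return ROOT.RDataFrame(main), keep_alive
```

Calling open_with_friends(True) and open_with_friends(False), passing each dataframe to column_types(df, ["vec", "vertex.vec"]) and comparing the two results with changed_columns would list any column whose reported type depends on the extra friend.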

Cheers,
Enrico

Hi @eguiraud,

  1. Here is the output. I am not sure how to interpret it, so I will leave that to you, sorry.
root [0] TFile file("f3.root")
(TFile &) Name: f3.root Title: 

root [1] TTree* t = (TTree*)file.Get("Vertex")
(TTree *) 0x3eed090
root [2] t->Print()
******************************************************************************
*Tree    :Vertex    : Vertex                                                 *
*Entries :        1 : Total =            4106 bytes  File  Size =       1292 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Br    0 :vec       : Int_t vec_                                             *
*Entries :        1 : Total  Size=       3744 bytes  File Size  =         88 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :vec.fCoordinates.fX : Double_t fX[vec_]                            *
*Entries :        1 : Total  Size=        699 bytes  File Size  =        108 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :vec.fCoordinates.fY : Double_t fY[vec_]                            *
*Entries :        1 : Total  Size=        699 bytes  File Size  =        108 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    3 :vec.fCoordinates.fZ : Double_t fZ[vec_]                            *
*Entries :        1 : Total  Size=        699 bytes  File Size  =        108 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    4 :vec.fCoordinates.fT : Double_t fT[vec_]                            *
*Entries :        1 : Total  Size=        699 bytes  File Size  =        108 *
*Baskets :        1 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

  2. df.GetColumnType("vertex.vec") output with “cluster” tree:
ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag>
  3. df.GetColumnType("vertex.vec") output without “cluster” tree:
ROOT::VecOps::RVec<ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> >>

Yep, 2. and 3. clearly show that the addition of the friend confuses RDF about the type of the branch. Really bizarre bug :confused:

Can you please open a GitHub issue with your self-contained repro? I will take a look as soon as possible.

Cheers,
Enrico

Have you tried adding the friend with an alias?
I.e. AddFriend("myaliasFriend = ActualTTreeName", "file.root")
I do it this way when I have branches with the same name.
RDataFrame should then, in principle, recognize
"myaliasFriend.branch" as a unique column.
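
To see which columns RDataFrame actually exposes under a given alias, a small helper like the one below could be used. This is only a sketch: columns_for_alias is a made-up name, and it assumes that friend columns show up in GetColumnNames() with the "alias." prefix.

```python
def columns_for_alias(df, alias):
    """Return the sorted column names that df reports under the given friend alias."""
    prefix = alias + "."
    # str() is applied because the names may come back as C++ string objects.
    names = (str(name) for name in df.GetColumnNames())
    return sorted(name for name in names if name.startswith(prefix))
```

With an RDataFrame built on a chain that has such a friend, columns_for_alias(df, "myaliasFriend") would return the columns that belong to that friend only.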

Hi @RENATO_QUAGLIANI

I have just tried to replace

  ch1 = ROOT.TChain("Particle")
  ch1.Add("f1.root")
  ch2 = ROOT.TChain("Cluster")
  ch2.Add("f2.root")
  ch3 = ROOT.TChain("Vertex")
  ch3.Add("f3.root")

  ch1.AddFriend(ch2, "cluster")
  ch1.AddFriend(ch3, "vertex")

with the following, as you proposed:

  ch1 = ROOT.TChain("Particle")
  ch1.Add("f1.root")
  ch1.AddFriend("cluster = Cluster", "f2.root")
  ch1.AddFriend("vertex = Vertex", "f3.root")

Unfortunately, the same error still persists.

But thanks for your comment!
I was unaware of this AddFriend syntax. It makes the code much shorter and nicer to read :slightly_smiling_face:

cheers,
Bohdan

This is now https://github.com/root-project/root/issues/6944 (thanks @FoxWise for reporting). Let’s continue the discussion there; I’ll take a look as soon as possible, probably at the end of this week.

Cheers,
Enrico
