AsNumpy not working with array branches?

Dear experts, I have a simple Python script where I do:

import ROOT as r
from ROOT import RDataFrame as RDF
import numpy as np
df = RDF("toyTracks","ALLMERGED_fullP.root")
df_process = df.AsNumpy(columns=["txO", "tyO", "p","qop" ,"nHits","x_SciFis","y_SciFis" , "z_SciFis"])

When I look at the TFile content and Show() the branches, I have

root [1] toyTracks->Show(1)
======> EVENT:1
 eta             = 1.6
 phi             = -3.14159
 pt              = 1058.12
 txO             = -0.420952
 tyO             = -5.15517e-17
 p               = 2727.27
 qop             = 0.000366667
 mass            = 493.68
 nHits           = 12
 x_SciFis        = -2101.21,
                  -2101.47, -2101.6, -2101.61, -2099.06, -2098.36,
                  -2097.59, -2096.75, -2089.67, -2088.46, -2087.21,
                  -2085.93
 y_SciFis        = 0.554104,
                  0.56029, 0.566658, 0.573208, 0.622267, 0.630302,
                  0.638538, 0.646973, 0.70872, 0.718357, 0.728107,
                  0.737964
 z_SciFis        = 7826,
                  7896, 7966, 8036, 8508, 8578,
                  8648, 8718, 9193, 9263, 9333,
                  9403
 tx_SciFis       = -0.0047049,
                  -0.00281031, -0.00102809, 0.000646468, 0.00952067, 0.0105347,
                  0.011484, 0.0123734, 0.0170837, 0.017614, 0.0181096,
                  0.0185732
 ty_SciFis       = 8.70786e-05,
                  8.96707e-05, 9.22753e-05, 9.48698e-05, 0.000113353, 0.000116228,
                  0.000119085, 0.000121901, 0.000136828, 0.000138495, 0.000140063,
                  0.000141537

and

*............................................................................*
*Br    8 :nHits     : nHits/I                                                *
*Entries :  3842054 : Total  Size=   15380165 bytes  File Size  =      88173 *
*Baskets :      120 : Basket Size=     605184 bytes  Compression= 174.40     *
*............................................................................*
*Br    9 :xSciFi    : x_SciFis[nHits]/F                                      *
*Entries :  3842054 : Total  Size=  199926290 bytes  File Size  =   87312718 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.29     *
*............................................................................*
*Br   10 :ySciFi    : y_SciFis[nHits]/F                                      *
*Entries :  3842054 : Total  Size=  199926290 bytes  File Size  =   79835379 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.50     *
*............................................................................*
*Br   11 :zSciFi    : z_SciFis[nHits]/F                                      *
*Entries :  3842054 : Total  Size=  199926290 bytes  File Size  =    6827258 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=  29.28     *
*............................................................................*
*Br   12 :txSciFi   : tx_SciFis[nHits]/F                                     *
*Entries :  3842054 : Total  Size=  199927606 bytes  File Size  =   87489485 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.28     *
*............................................................................*
*Br   13 :tySciFi   : ty_SciFis[nHits]/F                                     *
*Entries :  3842054 : Total  Size=  199927606 bytes  File Size  =   74680435 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.68     *
*............................................................................*

I generated those tuples using the approach suggested in
https://root.cern.ch/how/how-write-ttree-python

Thanks for any advice.
Renato


ROOT Version: 6.18/04
Built for macosx64 on Sep 26 2019, 09:45:39
From tags/v6-18-04@v6-18-04


Hi,
what’s the problem exactly (e.g. error message, expected vs obtained output)?

Cheers,
Enrico

This is what Python tells me, @eguiraud:

    result_ptrs[column] = _root.ROOT.Internal.RDF.RDataFrameTake(column_type)(df_rnode, column)
Exception: ROOT::RDF::RResultPtr<vector<ROOT::VecOps::RVec<float> > > ROOT::Internal::RDF::RDataFrameTake<ROOT::VecOps::RVec<float> >(ROOT::RDF::RInterface<ROOT::Detail::RDF::RNodeBase,void> df, basic_string_view<char,char_traits<char> > column) =>
    Unknown column: x_SciFis (C++ exception of type runtime_error)

My code:

df = RDF( "toyTracks", "reduced.root")
df_numpy = df.Range(100).AsNumpy( columns = ["nHits","x_SciFis"]  )

The tuple is in the attachment:
reduced.root (13.3 KB)

@eguiraud do you need any extra input?

Hi Renato,
I will be off for some time. Probably @swunsch or @Axel or @etejedor can help (maybe when CHEP is over).

Can you do anything with column x_SciFis? The error is “unknown column”, so it seems like RDF does not see/recognize that column/branch.

Cheers,
Enrico

No, I cannot do anything with column x_SciFis except
TTree::Draw("x_SciFis[0]") etc.; nothing works with RDataFrame.

In the past I saw that storing branches as vector<float> worked fine and RDataFrame could read them back; here I have float[12]. At ntuple production time I used float[nHits], with nHits also saved as a branch, as suggested in https://root.cern.ch/how/how-write-ttree-python

RDataFrame supports float[N], so there might be something fishy going on with this file (e.g. some edge case or combination of features that we do not handle correctly in RDataFrame).

Since you shared the offending file above, I think it’s on the ROOT team to investigate further.

Cheers,
Enrico

I don’t know why I have this nasty problem, but I made a workaround in the tuple production, shown below.
I would recommend this method on the webpage for storing arrays. In the end the overall API is exactly the same, and AsNumpy works fine:

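	# Assumes f (the output TFile), ttree (the TTree) and z_SciFiModules are defined earlier.
	z_HITS = ROOT.std.vector('float')(12)

	ttree._z_HITS = z_HITS  # keep a Python reference to the vector alongside the tree
	ttree.Branch("z_HITS", z_HITS)  # instead of a "z_HITS[12]/F" leaflist

	for i in range(100):
		z_HITS.clear()  # drop the hits of the previous entry
		for z in z_SciFiModules:
			z_HITS.push_back(z)
		ttree.Fill()
	f.Write()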

How did you write them out before? Could you point us to the code?

Edit: OK, sorry, you already did. Thanks!

So, I think the problem comes up because you named the branch differently from the leaf. Note the missing underscore in the branch name txSciFi compared to the leaflist tx_SciFis[nHits]/F:

*............................................................................*
*Br   12 :txSciFi   : tx_SciFis[nHits]/F                                     *
*Entries :      100 : Total  Size=       5877 bytes  File Size  =       1242 *
*Baskets :        1 : Basket Size=   25600000 bytes  Compression=   4.26     *
*............................................................................*

I’ll investigate further and make a proper bug report out of it.

Edit: Actually, the branch is there; however, because of the leaflist, the column names differ. See the following code and its output.

import ROOT
df = ROOT.RDataFrame("toyTracks", "reduced.root")
print(df.GetColumnNames())
{ "eta", "phi", "pt", "txO", "tyO", "p", "qop", "mass", "nHits", "xSciFi.x_SciFis", "ySciFi.y_SciFis", "zSciFi.z_SciFis", "txSciFi.tx_SciFis", "tySciFi.ty_SciFis" }

Still, it is a bug that the different naming breaks your workflow: it segfaults if you pass xSciFi.x_SciFis to AsNumpy.
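
To make the naming pattern in the GetColumnNames() output above easier to work with, here is a minimal sketch that maps a bare leaf name to the full column name RDataFrame reports. The helper name resolve_column is hypothetical, not part of ROOT, and the matching rule (exact name, or "branch.leaf" suffix) is only inferred from the list above. It helps to diagnose the "Unknown column" error, but it does not avoid the segfault in AsNumpy.

```python
def resolve_column(column_names, leaf_name):
    """Return the single column equal to leaf_name or ending in '.<leaf_name>'.

    Hypothetical helper: the rule is inferred from the column list printed
    above, not taken from ROOT's implementation.
    """
    matches = [
        name for name in column_names
        if name == leaf_name or name.endswith("." + leaf_name)
    ]
    if len(matches) != 1:
        raise KeyError(f"{leaf_name!r} matches {len(matches)} columns: {matches}")
    return matches[0]


# Column names copied from the GetColumnNames() output above.
columns = [
    "eta", "phi", "pt", "txO", "tyO", "p", "qop", "mass", "nHits",
    "xSciFi.x_SciFis", "ySciFi.y_SciFis", "zSciFi.z_SciFis",
    "txSciFi.tx_SciFis", "tySciFi.ty_SciFis",
]

print(resolve_column(columns, "x_SciFis"))
print(resolve_column(columns, "nHits"))
```

With a real data frame, the list could presumably be built with [str(c) for c in df.GetColumnNames()].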

There we go: https://sft.its.cern.ch/jira/browse/ROOT-10397

Let’s move the discussion to JIRA and follow up the issue there.

Many thanks Renato for reporting!

Hi @swunsch, yes, I had some typos in the tuple production:

tree.Branch("x_SciFi", x_SciFis,"x_SciFis[nLay]")  

Though, I tried to correct the names and I still don’t get AsNumpy working; this is why I moved to vectors.
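
A naming mismatch like the one in the Branch call above can be caught before the tree is filled. The sketch below is plain Python with hypothetical helper names (parse_leaf, names_match). It assumes a leaflist with a single leaf of the form name[count]/Type and does not handle several leaves separated by ":". It only checks the naming; as noted above, fixing the names alone did not make AsNumpy work with this ROOT version.

```python
import re

# Single-leaf leaflist: name, optional [count], optional /Type,
# e.g. "x_SciFis[nHits]/F". Multi-leaf lists ("a/F:b/I") are not handled.
_LEAF = re.compile(r"^(?P<name>\w+)(?:\[(?P<count>\w+)\])?(?:/(?P<type>\w+))?$")


def parse_leaf(leaf_spec):
    """Split a single-leaf leaflist into (name, count, type); missing parts are None."""
    match = _LEAF.match(leaf_spec.strip())
    if match is None:
        raise ValueError(f"unsupported leaflist: {leaf_spec!r}")
    return match.group("name"), match.group("count"), match.group("type")


def names_match(branch_name, leaf_spec):
    """True if the branch name equals the leaf name declared in the leaflist."""
    leaf_name, _, _ = parse_leaf(leaf_spec)
    return branch_name == leaf_name


# The call quoted above: branch "x_SciFi" but leaf "x_SciFis".
print(names_match("x_SciFi", "x_SciFis[nLay]"))
# A consistent pair, as for the nHits branch in the tree printout.
print(names_match("nHits", "nHits/I"))
```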
