AsNumpy not working with array branches?

RENATO_QUAGLIANI · October 30, 2019, 12:31pm

Dear experts, I have a simple Python script where I do:

import ROOT as r
from ROOT import RDataFrame as RDF
import numpy as np
df = RDF("toyTracks","ALLMERGED_fullP.root")
df_process = df.AsNumpy(columns=["txO", "tyO", "p", "qop", "nHits", "x_SciFis", "y_SciFis", "z_SciFis"])

When I look at the TFile content and call Show() on the tree, I get:

root [1] toyTracks->Show(1)
======> EVENT:1
 eta             = 1.6
 phi             = -3.14159
 pt              = 1058.12
 txO             = -0.420952
 tyO             = -5.15517e-17
 p               = 2727.27
 qop             = 0.000366667
 mass            = 493.68
 nHits           = 12
 x_SciFis        = -2101.21,
                  -2101.47, -2101.6, -2101.61, -2099.06, -2098.36,
                  -2097.59, -2096.75, -2089.67, -2088.46, -2087.21,
                  -2085.93
 y_SciFis        = 0.554104,
                  0.56029, 0.566658, 0.573208, 0.622267, 0.630302,
                  0.638538, 0.646973, 0.70872, 0.718357, 0.728107,
                  0.737964
 z_SciFis        = 7826,
                  7896, 7966, 8036, 8508, 8578,
                  8648, 8718, 9193, 9263, 9333,
                  9403
 tx_SciFis       = -0.0047049,
                  -0.00281031, -0.00102809, 0.000646468, 0.00952067, 0.0105347,
                  0.011484, 0.0123734, 0.0170837, 0.017614, 0.0181096,
                  0.0185732
 ty_SciFis       = 8.70786e-05,
                  8.96707e-05, 9.22753e-05, 9.48698e-05, 0.000113353, 0.000116228,
                  0.000119085, 0.000121901, 0.000136828, 0.000138495, 0.000140063,
                  0.000141537

and the branch listing shows:

*............................................................................*
*Br    8 :nHits     : nHits/I                                                *
*Entries :  3842054 : Total  Size=   15380165 bytes  File Size  =      88173 *
*Baskets :      120 : Basket Size=     605184 bytes  Compression= 174.40     *
*............................................................................*
*Br    9 :xSciFi    : x_SciFis[nHits]/F                                      *
*Entries :  3842054 : Total  Size=  199926290 bytes  File Size  =   87312718 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.29     *
*............................................................................*
*Br   10 :ySciFi    : y_SciFis[nHits]/F                                      *
*Entries :  3842054 : Total  Size=  199926290 bytes  File Size  =   79835379 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.50     *
*............................................................................*
*Br   11 :zSciFi    : z_SciFis[nHits]/F                                      *
*Entries :  3842054 : Total  Size=  199926290 bytes  File Size  =    6827258 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=  29.28     *
*............................................................................*
*Br   12 :txSciFi   : tx_SciFis[nHits]/F                                     *
*Entries :  3842054 : Total  Size=  199927606 bytes  File Size  =   87489485 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.28     *
*............................................................................*
*Br   13 :tySciFi   : ty_SciFis[nHits]/F                                     *
*Entries :  3842054 : Total  Size=  199927606 bytes  File Size  =   74680435 *
*Baskets :     1312 : Basket Size=   25600000 bytes  Compression=   2.68     *
*............................................................................*

I generated these tuples using the approach suggested in
https://root.cern.ch/how/how-write-ttree-python

Thanks for any advice.
Renato

ROOT Version: 6.18/04
Built for macosx64 on Sep 26 2019, 09:45:39
From tags/v6-18-04@v6-18-04

eguiraud · October 30, 2019, 12:59pm

Hi,
what’s the problem exactly (e.g. error message, expected vs obtained output)?

Cheers,
Enrico

RENATO_QUAGLIANI · November 1, 2019, 8:56am

@eguiraud, this is what Python tells me:

    result_ptrs[column] = _root.ROOT.Internal.RDF.RDataFrameTake(column_type)(df_rnode, column)
Exception: ROOT::RDF::RResultPtr<vector<ROOT::VecOps::RVec<float> > > ROOT::Internal::RDF::RDataFrameTake<ROOT::VecOps::RVec<float> >(ROOT::RDF::RInterface<ROOT::Detail::RDF::RNodeBase,void> df, basic_string_view<char,char_traits<char> > column) =>
    Unknown column: x_SciFis (C++ exception of type runtime_error)

My code:

df = RDF( "toyTracks", "reduced.root")
df_numpy = df.Range(100).AsNumpy(columns=["nHits", "x_SciFis"])

The tuple is attached:
reduced.root (13.3 KB)

RENATO_QUAGLIANI · November 4, 2019, 11:14am

@eguiraud do you need any extra input?

eguiraud · November 4, 2019, 12:01pm

Hi Renato,
I will be off for some time. Probably @swunsch or @Axel or @etejedor can help (maybe when CHEP is over).

Can you do anything with column x_SciFis? The error is “unknown column”, so it seems like RDF does not see/recognize that column/branch.
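A quick way to answer that question is to compare the columns passed to AsNumpy against the names RDataFrame actually reports. The sketch below is pure Python; `missing_columns` is a hypothetical helper, and `available` is assumed to be the output of `df.GetColumnNames()` converted to Python strings (the ROOT calls are shown only in comments, not executed):

```python
def missing_columns(requested, available):
    """Return the requested column names that RDataFrame does not know,
    preserving the order in which they were requested."""
    known = set(available)
    return [name for name in requested if name not in known]


# Usage with ROOT (not executed here):
#   import ROOT
#   df = ROOT.RDataFrame("toyTracks", "reduced.root")
#   available = [str(c) for c in df.GetColumnNames()]
#   print(missing_columns(["nHits", "x_SciFis"], available))

available = ["eta", "phi", "nHits", "xSciFi.x_SciFis"]
print(missing_columns(["nHits", "x_SciFis"], available))  # ['x_SciFis']
```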

Cheers,
Enrico

RENATO_QUAGLIANI · November 4, 2019, 12:05pm

No, I cannot do anything with column x_SciFis except
TTree::Draw("x_SciFis[0]") and similar, but nothing with RDataFrame.

In the past I saw that branches stored as vector<float> worked fine and RDataFrame could read them back; here I have float[12]. At ntuple production time I used float[nHits], with nHits also saved as a branch, as suggested in https://root.cern.ch/how/how-write-ttree-python

eguiraud · November 4, 2019, 12:09pm

RDataFrame supports float[N], so there might be something fishy going on with this file (e.g. some edge case or combination of features that we do not handle correctly in RDataFrame).

Since you shared the offending file above, I think it’s on the ROOT team to investigate further.

Cheers,
Enrico

RENATO_QUAGLIANI · November 4, 2019, 9:13pm

I don't know why I have this nasty problem, but I made a workaround in the tuple production.
I would recommend this method on the web page for storing arrays. In the end the overall API is exactly the same, and AsNumpy works fine.

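	# ttree, f and z_SciFiModules come from the surrounding production script
	# one std::vector<float> buffer, reused for every entry
	z_HITS = ROOT.std.vector('float')(12)

	ttree._z_HITS = z_HITS  # keep a Python reference alive as long as the tree
	ttree.Branch("z_HITS", z_HITS)  # no leaflist such as "z_HITS[12]/F" needed

	for i in range(100):
		z_HITS.clear()  # drop the previous entry's values
		for z in z_SciFiModules:
			z_HITS.push_back(z)
		ttree.Fill()
	f.Write()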
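With std::vector<float> branches AsNumpy works, but for an array-valued column it most likely returns a 1D NumPy array with dtype=object, holding one C++ vector (or RVec) per event, rather than a rectangular 2D array. That is our reading of how AsNumpy treats non-fundamental column types, so check it on your ROOT version. If a 2D array is more convenient, a sketch like the following pads the rows; `to_2d` is a hypothetical helper, and NaN as padding value is an arbitrary choice:

```python
import numpy as np


def to_2d(column, fill_value=np.nan, dtype=np.float32):
    """Turn a 1D object array whose elements are per-event sequences
    into a 2D array, padding shorter rows with `fill_value`."""
    rows = [np.asarray(row, dtype=dtype) for row in column]
    width = max((len(r) for r in rows), default=0)
    out = np.full((len(rows), width), fill_value, dtype=dtype)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out


# Stand-in for df.AsNumpy(["z_HITS"])["z_HITS"]: an object array of
# variable-length rows (plain lists here instead of ROOT vectors).
column = np.empty(2, dtype=object)
column[0] = [7826.0, 7896.0, 7966.0]
column[1] = [7826.0, 7896.0]
print(to_2d(column))
```

With a fixed 12 hits per track, as in this thread, no padding is actually needed and the result is simply an (nEvents, 12) array.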

swunsch · November 5, 2019, 12:34am

How did you write them out before? Could you point us to the code?

Edit: OK, sorry, you already did. Thanks!

swunsch · November 5, 2019, 2:55am

So, I think the problem comes up because you named the branch differently from the leaf. Note that the branch name txSciFi lacks the underscore (and the trailing s) of the leaf name in the leaflist tx_SciFis[nHits]/F:

*............................................................................*
*Br   12 :txSciFi   : tx_SciFis[nHits]/F                                     *
*Entries :      100 : Total  Size=       5877 bytes  File Size  =       1242 *
*Baskets :        1 : Basket Size=   25600000 bytes  Compression=   4.26     *
*............................................................................*

I’ll investigate further and make a proper bug report out of it.

Edit: Actually, the branch is there; however, because of the leaflist, the column name differs. See the following code:

import ROOT
df = ROOT.RDataFrame("toyTracks", "reduced.root")
print(df.GetColumnNames())
{ "eta", "phi", "pt", "txO", "tyO", "p", "qop", "mass", "nHits", "xSciFi.x_SciFis", "ySciFi.y_SciFis", "zSciFi.z_SciFis", "txSciFi.tx_SciFis", "tySciFi.ty_SciFis" }

Still, it is a bug that the different naming breaks your workflow, because AsNumpy segfaults if you pass it xSciFi.x_SciFis.
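Given that naming scheme, one can at least find out which column a given leaf ended up in. A minimal pure-Python sketch; `leaf_columns` is a hypothetical helper and `column_names` is assumed to be the output of `df.GetColumnNames()` converted to Python strings. As reported above, passing the dotted name to AsNumpy still crashed in ROOT 6.18, so this only helps diagnose the mismatch:

```python
def leaf_columns(column_names, leaf):
    """Return the RDataFrame column names whose leaf part matches `leaf`.

    Branches whose leaflist names the leaf differently from the branch
    appear as "branch.leaf", e.g. "xSciFi.x_SciFis"."""
    return [name for name in column_names
            if name == leaf or name.endswith("." + leaf)]


names = ["eta", "nHits", "xSciFi.x_SciFis", "ySciFi.y_SciFis"]
print(leaf_columns(names, "x_SciFis"))  # ['xSciFi.x_SciFis']
```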

swunsch · November 5, 2019, 3:21am

There we go: https://sft.its.cern.ch/jira/browse/ROOT-10397

Let’s move the discussion to JIRA and follow up the issue there.

Many thanks Renato for reporting!

RENATO_QUAGLIANI · November 5, 2019, 7:35am

Hi @swunsch, yes, I had some typos in the tuple production:

tree.Branch("x_SciFi", x_SciFis,"x_SciFis[nLay]")

Though, I tried correcting the names and I still don't get AsNumpy working; this is why I moved to vectors.
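One way to avoid a branch/leaf mismatch like "x_SciFi" vs "x_SciFis" in the line above, when writing with leaflists, is to derive both from a single name. This is a sketch under assumptions: `add_array_branch` is a hypothetical helper, and with real ROOT `buffer` would typically be a float32 NumPy array sized for the maximum number of hits. `FakeTree` stands in for ROOT.TTree so the snippet runs without ROOT. Whether a consistently named float[nHits] branch then reads correctly through AsNumpy in 6.18 is not established in this thread:

```python
def add_array_branch(tree, name, buffer, size_leaf, type_code="F"):
    """Create a variable-length array branch whose branch name and leaf
    name are identical, e.g. name="x_SciFis", size_leaf="nHits" gives
    the leaflist "x_SciFis[nHits]/F"."""
    leaflist = "{0}[{1}]/{2}".format(name, size_leaf, type_code)
    return tree.Branch(name, buffer, leaflist)


class FakeTree:
    """Stand-in for ROOT.TTree that only records Branch() calls."""

    def __init__(self):
        self.calls = []

    def Branch(self, *args):
        self.calls.append(args)


# With ROOT, something like: buf = np.zeros(12, dtype=np.float32)
tree = FakeTree()
add_array_branch(tree, "x_SciFis", buffer=None, size_leaf="nHits")
print(tree.calls)  # [('x_SciFis', None, 'x_SciFis[nHits]/F')]
```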

system · November 19, 2019, 7:35am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.