CSV to RDataFrame

b_h · September 6, 2018, 7:38am

Hello,

I tried to read in a CSV file using: auto tdf = ROOT::RDF::MakeCsvDataFrame("data.csv"); but it did not work. I looked around and it seems that RDataFrame cannot read data in exponential format from a CSV file, i.e. 1.3e6, 1.4e4,1e4 is read in as three strings instead of three doubles.

Is that right and is someone working on this issue?

Thanks a lot!

ROOT Version: 6.14.04
Platform: Linux
Compiler: gcc

Danilo · September 6, 2018, 8:19am

Hi,

this must be a glitch in the type inference mechanism. We will fix it. Can you share the input file so we can be sure your use case is addressed?

Thanks for reporting.

Cheers,
D

b_h · September 6, 2018, 10:25am

The first lines of my input file are:

percentage,press_uncalib,tracking_calib,dumb_calib
0.0,2.11e-8,8.59e-7,1.3e-5
1.0,3.63e-7,1.96e-5,0.9e.5
5.0,1.01e-6,5.62e-5,1.0e-5
10.0,1.82e-6,1.06e-4,1.2e-5

Danilo · September 6, 2018, 7:16pm

Hi,

while we implement and backport the fix, you can convert the csv to an equivalent one which will be processed by the csv source.
The python snippet to do this is the following:

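import sys

# Read the whole CSV file given as the first command-line argument.
with open(sys.argv[1]) as f:
    lines = f.readlines()
# The first line is the header: pass it through unchanged.
print(lines[0], end="")
for line in lines[1:]:
    # Split the data line into its scientific-notation fields.
    sci_numbers = line.rstrip("\n").split(",")
    # Rewrite each field in fixed-point notation with 8 decimals.
    numbers = ["%.8f" % float(x) for x in sci_numbers]
    print(",".join(numbers))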

The invocation is

python convert.py myoriginal.csv > converted.csv
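
One caveat of the "%.8f" format is that it keeps only eight decimals, so a value such as 2.11e-8 comes out as 0.00000002. A possible variant, sketched here with Python's standard decimal module, keeps every significant digit. The helper names to_fixed_point and convert_line are made up for this illustration, and the sketch assumes every field is a valid number:

```python
from decimal import Decimal


def to_fixed_point(field):
    """Return `field` in plain decimal notation, keeping all its digits."""
    text = format(Decimal(field.strip()), "f")
    # Keep a decimal point so the value still looks like a double
    # rather than an integer (e.g. 1e4 becomes 10000.0).
    if "." not in text:
        text += ".0"
    return text


def convert_line(line):
    """Convert all comma-separated fields of one data line."""
    fields = line.rstrip("\n").split(",")
    return ",".join(to_fixed_point(f) for f in fields)


# First data line of the sample file posted earlier in this thread.
print(convert_line("0.0,2.11e-8,8.59e-7,1.3e-5"))
# prints 0.0,0.0000000211,0.000000859,0.000013
```

Decimal parses the text directly instead of going through a binary float, which is why no digits are lost or invented.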

The csv you pasted in the previous post contains perhaps a typo, i.e. 0.9e.5: could it be 0.9e-5?
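
A field like 0.9e.5 would also make float() raise a ValueError in the conversion script. As a rough sketch, such fields could be located before converting; the function name find_bad_fields is hypothetical and the sample lines are the ones pasted earlier in this thread:

```python
def find_bad_fields(lines):
    """Return (line_number, column_name, text) for each field float() rejects."""
    header = lines[0].rstrip("\n").split(",")
    bad = []
    # Data starts on line 2 of the file, right after the header.
    for number, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\n").split(",")
        for name, text in zip(header, fields):
            try:
                float(text)
            except ValueError:
                bad.append((number, name, text))
    return bad


sample = [
    "percentage,press_uncalib,tracking_calib,dumb_calib\n",
    "0.0,2.11e-8,8.59e-7,1.3e-5\n",
    "1.0,3.63e-7,1.96e-5,0.9e.5\n",
    "5.0,1.01e-6,5.62e-5,1.0e-5\n",
]
print(find_bad_fields(sample))
# prints [(3, 'dumb_calib', '0.9e.5')]
```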

I hope this unblocks you.

Cheers,
D

etejedor · September 7, 2018, 7:26am

Hi @Danilo,

Just to add to Danilo's complete answer: I confirm that we currently do not support exponent syntax for the double type. For the record, these are the current regexes:

"^[-+]?[0-9]+\\.[0-9]*$"
"^[-+]?[0-9]*\\.[0-9]+$"

Enric
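
For anyone who wants to try the two regexes quoted above, here is a small sketch using Python's re module. The third pattern, EXPONENT, is only an assumption about what an exponent-aware pattern could look like; it is not the fix that went into ROOT, and the function names are made up for this example:

```python
import re

# The two patterns quoted above, written as Python raw strings.
CURRENT = [
    re.compile(r"^[-+]?[0-9]+\.[0-9]*$"),
    re.compile(r"^[-+]?[0-9]*\.[0-9]+$"),
]

# Hypothetical extra pattern: a mantissa followed by a mandatory exponent.
EXPONENT = re.compile(r"^[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)[eE][-+]?[0-9]+$")


def matches_current(text):
    """True if `text` matches one of the two quoted patterns."""
    return any(p.match(text) for p in CURRENT)


def matches_extended(text):
    """True if `text` matches a quoted pattern or the hypothetical one."""
    return matches_current(text) or EXPONENT.match(text) is not None


for value in ["0.0", "1.3e6", "1e4", "0.9e.5", "10"]:
    print(value, matches_current(value), matches_extended(value))
```

With the quoted patterns alone, only 0.0 is recognised as a double. The extra pattern additionally accepts 1.3e6 and 1e4, while the mistyped 0.9e.5 and the plain integer 10 are still rejected.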

b_h · September 7, 2018, 12:20pm

Thank you very much for your python script!
And indeed there was a typo in the specific file excerpt I chose as an example for you.

Thanks a lot for improving the ROOT experience!
