CSV to RDataFrame

Hello,

I tried to read in a csv file using: auto tdf = ROOT::RDF::MakeCsvDataFrame("data.csv"); but it did not work. I looked around and it seems that RDataFrame cannot parse exponential-formatted data from a csv file, i.e. a line such as 1.3e6,1.4e4,1e4 is read in as three strings instead of three doubles.

Is that right and is someone working on this issue?

Thanks a lot!


ROOT Version: 6.14.04
Platform: Linux
Compiler: gcc


Hi,

this must be a glitch in the type inference mechanism. We will fix it. Can you share the input file, so we can be sure your use case is addressed?

Thanks for reporting.

Cheers,
D

The first lines of my input file are:

percentage,press_uncalib,tracking_calib,dumb_calib
0.0,2.11e-8,8.59e-7,1.3e-5
1.0,3.63e-7,1.96e-5,0.9e.5
5.0,1.01e-6,5.62e-5,1.0e-5
10.0,1.82e-6,1.06e-4,1.2e-5

Hi,

while we implement and backport the fix, you can convert the csv into an equivalent file that the csv data source can process.
The Python snippet to do this is the following:

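import sys

# Read the whole csv given as the first command-line argument.
with open(sys.argv[1]) as f:
    lines = f.read().splitlines()
# The header line is copied unchanged.
print(lines[0])
for line in lines[1:]:
    # Rewrite every scientific-notation field in fixed-point notation.
    sci_numbers = line.split(',')
    numbers = ["%.8f" % float(x) for x in sci_numbers]
    print(",".join(numbers))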

The invocation is

python convert.py myoriginal.csv > converted.csv
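
One caveat with the fixed "%.8f" format: very small values lose most of their digits (2.11e-8 becomes 0.00000002). If that matters for your data, a possible alternative is sketched below. It assumes Python's standard decimal module is acceptable; the helper name to_fixed is hypothetical and not part of the script above. A ".0" is appended when the result has no decimal point, so the field still looks like a double rather than an integer.

```python
from decimal import Decimal

def to_fixed(field):
    """Rewrite a numeric string in fixed-point notation without dropping digits.

    Hypothetical helper: Decimal keeps the digits exactly as written, and the
    'f' format prints them without an exponent.
    """
    text = format(Decimal(field), "f")
    # Add a decimal point if there is none, so the value still reads as a double.
    if "." not in text:
        text += ".0"
    return text

for field in ["2.11e-8", "8.59e-7", "1.3e6", "10.0"]:
    print(field, "->", to_fixed(field))
```

To use it in the conversion script, the "%.8f" % float(x) expression could be replaced by to_fixed(x).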

The csv you pasted in the previous post may contain a typo, i.e. 0.9e.5: could it be 0.9e-5?
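
Typos like this make float() fail during the conversion, so it can help to scan the file for them first. The following is only a minimal sketch: the function name find_bad_fields is hypothetical, and it assumes a plain comma-separated file with a single header line and no quoted fields.

```python
def find_bad_fields(csv_text):
    """Return (line_number, field) pairs for data fields that float() rejects."""
    bad = []
    lines = csv_text.splitlines()
    # Line 1 is the header, so data lines are numbered from 2.
    for lineno, line in enumerate(lines[1:], start=2):
        for field in line.split(","):
            try:
                float(field)
            except ValueError:
                bad.append((lineno, field))
    return bad

# The first lines of the file pasted earlier in this thread.
sample = """percentage,press_uncalib,tracking_calib,dumb_calib
0.0,2.11e-8,8.59e-7,1.3e-5
1.0,3.63e-7,1.96e-5,0.9e.5
"""
print(find_bad_fields(sample))
```

On the sample above this reports the field 0.9e.5 on line 3.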

I hope this unblocks you.

Cheers,
D


Hi @Danilo,

Just to add to Danilo's complete answer, I confirm that we currently do not support exponent syntax for the double type. For the record, these are the current regexes:

"^[-+]?[0-9]+\\.[0-9]*$"
"^[-+]?[0-9]*\\.[0-9]+$"

Enric
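
To see why exponent notation falls through, the two patterns quoted above can be tried on a few sample fields. This is only an illustration in Python, not the ROOT code itself: the C++ string escaping (\\.) is undone for Python raw strings, and the helper name looks_like_double is hypothetical.

```python
import re

# The two quoted patterns, written as Python raw strings.
DOUBLE_PATTERNS = [
    re.compile(r"^[-+]?[0-9]+\.[0-9]*$"),
    re.compile(r"^[-+]?[0-9]*\.[0-9]+$"),
]

def looks_like_double(field):
    """Return True if either quoted pattern matches the field."""
    return any(p.match(field) for p in DOUBLE_PATTERNS)

for field in ["0.0", "10.0", "2.11e-8", "1.3e6", "1e4"]:
    print(field, looks_like_double(field))
```

Neither pattern allows an "e" followed by an exponent, so fields such as 2.11e-8 or 1e4 do not match, while plain decimals such as 0.0 do.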

Thank you very much for your Python script!
And indeed there was a typo in the specific file excerpt I chose as an example for you.

Thanks a lot for improving the ROOT experience!
