Data Structure Error while Converting CSV file to ROOT file

Here’s the code I found to convert a CSV file into a ROOT file:

{
        auto fileName = "output.csv";
        auto rdf = ROOT::RDF::MakeCsvDataFrame(fileName);
        rdf.Snapshot("myTree", "myFile.root");
}

My csv file looks like this.


My root file came out to have this layout
image

The format I wanted was something like this:
Root file → 0 (tree named 0) → chn0, ch1, … (branches or leaves) which have the values shown in the csv file.

Please help me on how to approach this problem. Thank you.

It seems to me that the output file you get looks like Comma Separated Values (CSV). I am not sure I fully understand what you are looking for. The documentation of MakeCsvDataFrame seems quite clear to me. Maybe @eguiraud can tell more.

Your file is not a CSV file. After the initial “signalData” line, the multiline string looks like JSON-format data.

The initial data was in json format and I used this code to change it into csv form.

After I got the CSV file, I tried to convert it into a ROOT file, as I could not figure out how to convert JSON to ROOT directly.

@eguiraud So, maybe one just needs a conversion from a “pandas” object (loaded into RAM with “pandas.read_json”) to an “RDataFrame” object.

Can you please explain further? I do not understand. Thank you.

What if you do:

df['chn0'].to_csv(...) ?

or df['signalData']['chn0'], not sure…
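If the JSON really nests the channels under a top-level "signalData" key (an assumption, since the actual file is not shown in this thread), the channels can be pulled out into flat columns before any conversion. A minimal sketch using only the standard json module; the sample values and the helper name extract_channels are made up for illustration:

```python
import json

# Hypothetical sample mimicking the guessed layout: one top-level
# "signalData" key holding one list of values per channel.
raw = '{"signalData": {"chn0": [1, 2, 3], "chn1": [4, 5, 6]}}'


def extract_channels(text, key="signalData"):
    """Return {channel name: list of values} from the nested JSON text."""
    data = json.loads(text)
    channels = data[key]
    return {name: list(values) for name, values in channels.items()}


columns = extract_channels(raw)
for name, values in columns.items():
    print(name, values)
```

Each key of the resulting dictionary would then correspond to one branch (chn0, chn1, …) once the columns are handed to ROOT.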

If you have a pandas dataframe, create a dictionary where the keys are the names of the branches you’ll want in the root file, and the values are the numpy arrays extracted from the df column-by-column (with appropriate data cleaning and type conversion, if necessary).

Then, take this dictionary and pass it to this constructor: rdf = ROOT.RDF.MakeNumpyDataFrame(DictOfBranches), followed by a call to rdf.Snapshot(treeName, fileName)

https://root.cern/doc/master/df032__MakeNumpyDataFrame_8py.html

Yes that’s it for the pandas → RDF conversion, example code:

import numpy as np
import ROOT
arr_dict = {c: np.array(pandas_df[c]) for c in pandas_df}
root_df = ROOT.RDF.MakeNumpyDataFrame(arr_dict)

Only “flat” pandas dataframes (one value per cell) are supported.
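To make the “flat” requirement concrete, here is a rough sketch that checks the columns before handing them to ROOT. The helper names check_flat and write_tree are hypothetical, and the float64 conversion is an assumption about the data; only MakeNumpyDataFrame and Snapshot come from this thread. ROOT and numpy are imported inside write_tree, so the check itself runs without them:

```python
def check_flat(columns):
    """Return the common column length, or raise ValueError if not flat.

    "Flat" here means: every cell is a scalar (no nested list, tuple or
    dict) and all columns have the same number of entries.
    """
    lengths = set()
    for name, values in columns.items():
        for value in values:
            if isinstance(value, (list, tuple, dict)):
                raise ValueError(f"column {name!r} has a nested cell")
        lengths.add(len(values))
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    return lengths.pop() if lengths else 0


def write_tree(columns, tree_name, file_name):
    """Write the columns as branches of one tree (requires numpy and PyROOT)."""
    import numpy as np
    import ROOT

    check_flat(columns)
    # Assumption: all values are numeric, so float64 is a safe common type.
    arrays = {name: np.asarray(values, dtype="float64")
              for name, values in columns.items()}
    rdf = ROOT.RDF.MakeNumpyDataFrame(arrays)
    rdf.Snapshot(tree_name, file_name)


n_rows = check_flat({"chn0": [1, 2, 3], "chn1": [4, 5, 6]})
print(n_rows)
```

With ROOT installed, a call such as write_tree({"chn0": [1, 2, 3], "chn1": [4, 5, 6]}, "0", "myFile.root") should give a tree named 0 with one branch per channel.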

Cheers,
Enrico