Data Structure Error while Converting CSV file to ROOT file

TheScientist · July 28, 2022, 12:21pm

Here’s the code I found to convert a CSV file into a ROOT file:

    {
        auto fileName = "output.csv";
        auto rdf = ROOT::RDF::MakeCsvDataFrame(fileName);
        rdf.Snapshot("myTree", "myFile.root");
    }

My csv file looks like this.

My root file came out to have this layout

The format I wanted was something like this:
ROOT file → 0 (a tree named 0) → chn0, chn1, … (branches or leaves) holding the values shown in the CSV file.

Please help me on how to approach this problem. Thank you.

couet · July 28, 2022, 12:33pm

It seems to me that the output file you get looks like comma-separated values (CSV). I am not sure I fully understand what you are looking for. The documentation of MakeCsvDataFrame seems quite clear to me. Maybe @eguiraud can tell you more.

Wile_E_Coyote · July 28, 2022, 12:37pm

Your file is not a CSV file. After the initial “signalData” line, the multiline string looks like JSON-format data.

TheScientist · July 28, 2022, 1:39pm

The initial data was in json format and I used this code to change it into csv form.

After I got the CSV file, I tried to convert it into a ROOT file, as I could not figure out how to convert JSON to ROOT directly.
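For reference, a minimal sketch of what a flat CSV for this purpose could look like: one header row with the channel names, then one row per sample. It assumes the JSON has the layout {"signalData": {"chn0": [...], "chn1": [...]}}, which is a guess based on the names mentioned in this thread; the function name json_to_flat_csv and the sample values are made up for illustration.

```python
import csv
import io
import json


def json_to_flat_csv(json_text, out, top_key="signalData"):
    """Write a header row of channel names, then one row per sample.

    Assumes (hypothetically) that json_text looks like
    {"signalData": {"chn0": [...], "chn1": [...]}}.
    """
    channels = json.loads(json_text)[top_key]
    names = list(channels)

    # All channels must have the same number of samples to form rows.
    lengths = {len(channels[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("channels have different lengths")

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    # zip(*) turns the per-channel lists into per-sample rows.
    writer.writerows(zip(*(channels[name] for name in names)))


# Illustrative data only, not taken from the original file.
sample = '{"signalData": {"chn0": [1, 2, 3], "chn1": [10, 20, 30]}}'
buffer = io.StringIO()
json_to_flat_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

A file written this way has one column per channel, which is the kind of layout a CSV reader such as MakeCsvDataFrame can map to one branch per column.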

Wile_E_Coyote · July 28, 2022, 2:05pm

@eguiraud So, maybe one just needs a conversion from a “pandas” object (loaded into RAM with “pandas.read_json”) to an “RDataFrame” object.

TheScientist · July 28, 2022, 2:10pm

Can you please explain further? I do not understand. Thank you.

ferhue · July 29, 2022, 8:53am

What if you do:

df['chn0'].to_csv(...) ?

or df['signalData']['chn0'], not sure…

nmangane · July 29, 2022, 10:53am

If you have a pandas dataframe, create a dictionary where the keys are the names of the branches you’ll want in the root file, and the values are the numpy arrays extracted from the df column-by-column (with appropriate data cleaning and type conversion, if necessary).

Then pass this dictionary to the constructor, rdf = ROOT.RDF.MakeNumpyDataFrame(DictOfBranches), followed by a call to rdf.Snapshot(treeName, fileName).

https://root.cern/doc/master/df032__MakeNumpyDataFrame_8py.html
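The steps above could be sketched roughly as follows. This is only an illustration under assumptions: the JSON layout {"signalData": {"chn0": [...], ...}} is guessed from the names used in this thread, and columns_from_json and write_tree are hypothetical helper names. The ROOT and NumPy imports sit inside write_tree so that the JSON part can be tried without a ROOT installation. Later ROOT releases are understood to offer the same functionality as ROOT.RDF.FromNumpy, so it is worth checking which name the installed version provides.

```python
import json


def columns_from_json(json_text, top_key="signalData"):
    """Return {branch_name: list_of_values} from a JSON string.

    Assumes (hypothetically) the layout
    {"signalData": {"chn0": [...], "chn1": [...]}}.
    """
    channels = json.loads(json_text)[top_key]
    # Every branch of a tree needs the same number of entries.
    lengths = {len(values) for values in channels.values()}
    if len(lengths) > 1:
        raise ValueError("channels have different lengths")
    return {name: list(values) for name, values in channels.items()}


def write_tree(columns, tree_name, file_name):
    """Write the columns to a ROOT file, one branch per dictionary key."""
    # Imported here so the rest of the sketch runs without ROOT installed.
    import numpy as np
    import ROOT

    # Type conversion step: one float64 NumPy array per branch.
    arrays = {name: np.asarray(values, dtype="float64")
              for name, values in columns.items()}
    # MakeNumpyDataFrame is the name used in this thread.
    rdf = ROOT.RDF.MakeNumpyDataFrame(arrays)
    rdf.Snapshot(tree_name, file_name)


# Illustrative data only, not taken from the original file.
sample = '{"signalData": {"chn0": [1.0, 2.5], "chn1": [3.0, 4.5]}}'
columns = columns_from_json(sample)
print(columns)
# With ROOT available: write_tree(columns, "myTree", "myFile.root")
```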

eguiraud · August 3, 2022, 4:44pm

Yes that’s it for the pandas → RDF conversion, example code:

import numpy as np
import ROOT
arr_dict = {c: np.array(pandas_df[c]) for c in pandas_df}
root_df = ROOT.RDF.MakeNumpyDataFrame(arr_dict)

Only “flat” pandas dataframes (one value per cell) are supported.
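To illustrate what “flat” means here, a small sketch that checks whether every cell of every column holds a single number. It works on a plain dictionary of lists rather than on a pandas dataframe, purely to keep the example dependency-free; non_flat_columns is a hypothetical helper name and the data is made up.

```python
from numbers import Number


def non_flat_columns(columns):
    """Return the names of columns where some cell is not a single number."""
    bad = []
    for name, values in columns.items():
        # A nested list or dict in a cell means the column is not flat.
        if not all(isinstance(value, Number) for value in values):
            bad.append(name)
    return bad


# Illustrative data only.
flat = {"chn0": [1.0, 2.0], "chn1": [3.0, 4.0]}
nested = {"chn0": [1.0, 2.0], "waveform": [[1, 2], [3, 4]]}

print(non_flat_columns(flat))
print(non_flat_columns(nested))
```

Columns reported by such a check would need to be flattened, for example by splitting them into several scalar columns, before the conversion.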

Cheers,
Enrico