Dear experts,
I have a large amount of data in a TTree (more than 1M channels, with a number of floats and ints per entry), and I want to handle the data in my code as a pandas data frame.
Hence, I open the TFile, read the TTree into an RDataFrame, and export that via numpy to a pandas data frame. After the heavy work is done, a new pandas data frame (with a similar number of rows but a different number of columns) should be written back to a TTree, not necessarily inside a TFile. In fact, in some use cases I want to store the TTree in a TMemFile in order to serialize it (cf. Serialize and deserialize TTree into coral::Blob). I mention this because RDataFrame::Snapshot apparently writes the TTree into a TFile. Is there a way to do this without the TFile?
I thought the most straightforward approach would be the following:
out = ROOT.RDF.MakeNumpyDataFrame(df.to_dict("list"))
out.Snapshot("tree", "myTestFile.root")
Here df obviously is the pandas data frame, which via to_dict is first exported to a dictionary of the form {"col1": [....lots of values for this column for each row....], "col2": .... etc.}. This apparently is the required input for MakeNumpyDataFrame.
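To illustrate with a tiny toy data frame (hypothetical columns, no ROOT calls): to_dict("list") hands back plain Python lists per column, whereas converting each column with to_numpy() yields typed numpy arrays, which is a rather different kind of input:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data frame (hypothetical column names).
df = pd.DataFrame({"sector": [0, 5, 15], "energy": [1.5, 2.5, 3.5]})

# to_dict("list") yields plain Python lists per column ...
as_lists = df.to_dict("list")
print(type(as_lists["sector"]))  # <class 'list'>

# ... whereas per-column to_numpy() yields typed numpy arrays.
as_arrays = {c: df[c].to_numpy() for c in df.columns}
print(type(as_arrays["sector"]))  # <class 'numpy.ndarray'>
```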
However, ROOT complains about it. The error reads as follows:
Traceback (most recent call last):
File "test3.py", line 75, in <module>
out = ROOT.RDF.MakeNumpyDataFrame(foo)
RuntimeError: Object not convertible: Dictionary entry sector is not convertible with AsRVec.
Here sector is one of the column names; its column contains integers between 0 and 15 inclusive. The data frame has 1’048’586 rows. I’m using Python 3.8.6 with ROOT 6.24/00 and GCC 8.3.0 on linux/lxplus7.
Am I doing something wrong, or is there simply too much data for MakeNumpyDataFrame to handle?
Is there a better or more efficient way of building a TTree out of a pandas data frame (preferably without automatically writing it to a TFile since, as I mentioned already, I want to serialize the TTree)?
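In case it helps to narrow things down, here is a small pre-flight check I can run on my inputs. It is only a sketch: find_unconvertible is a hypothetical helper of my own, and it assumes (based on the AsRVec wording in the error) that the converter wants a dict of 1-D numpy arrays with fundamental numeric dtypes:

```python
import numpy as np
import pandas as pd

def find_unconvertible(columns):
    """Return the names of dict entries that are not 1-D numeric numpy
    arrays (a guess at what AsRVec can digest; hypothetical helper)."""
    bad = []
    for name, values in columns.items():
        if not (isinstance(values, np.ndarray)
                and values.ndim == 1
                and values.dtype.kind in "iuf"):
            bad.append(name)
    return bad

# Toy data frame standing in for the real one.
df = pd.DataFrame({"sector": [0, 15], "energy": [1.5, 2.5]})

print(find_unconvertible(df.to_dict("list")))  # both columns flagged
print(find_unconvertible({c: df[c].to_numpy() for c in df.columns}))  # []
```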
Thank you for any helpful advice!
heico