Converting RDataFrames to awkward array

Hi,

I’m trying to convert an RDataFrame to an awkward array and I get an error from ak.from_rdataframe. I realize this is not directly related to ROOT, but I wanted to see if anyone has seen this issue before.
Basically, I have ROOT and Awkward installed via conda, and when I run the following:

df3 = ROOT.RDataFrame("CollectionTree", "/global/u2/a/agarabag/pscratch/ditdau_samples/graviton.root")
npy3 = ak.from_rdataframe(df3, columns=("DiTauJetsAuxDyn.ditau_pt", "EventInfoAuxDyn.mcEventWeights"), keep_order=True)

I get the error below:

  File "/global/u2/a/agarabag/plotter_v5.py", line 534, in plot_branches
    npy3 = ak.from_rdataframe(df3, columns=("DiTauJetsAuxDyn.ditau_pt", "EventInfoAuxDyn.mcEventWeights"), keep_order=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/a/agarabag/.conda/envs/ditau/lib/python3.11/site-packages/awkward/_dispatch.py", line 39, in dispatch
    gen_or_result = func(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/a/agarabag/.conda/envs/ditau/lib/python3.11/site-packages/awkward/operations/ak_from_rdataframe.py", line 56, in from_rdataframe
    return _impl(rdf, columns, highlevel, behavior, with_name, offsets_type, keep_order)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/a/agarabag/.conda/envs/ditau/lib/python3.11/site-packages/awkward/operations/ak_from_rdataframe.py", line 62, in _impl
    import awkward._connect.rdataframe.from_rdataframe  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/a/agarabag/.conda/envs/ditau/lib/python3.11/site-packages/ROOT/_facade.py", line 154, in _importhook
    return _orig_ihook(name, *args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/homes/a/agarabag/.conda/envs/ditau/lib/python3.11/site-packages/awkward/_connect/rdataframe/from_rdataframe.py", line 51, in <module>
    cppyy.add_include_path(
  File "/global/homes/a/agarabag/.conda/envs/ditau/lib/python3.11/site-packages/cppyy/__init__.py", line 221, in add_include_path
    raise OSError("no such directory: %s" % path)
OSError: no such directory: /global/u2/a/agarabag/.conda/envs/ditau/lib/python3.11/site-packages/awkward/_connect/header-only

This error occurred while calling

    ak.from_rdataframe(
        RDataFrame-instance
        columns = ('DiTauJetsAuxDyn.ditau_pt', 'EventInfoAuxDyn.mcEventWeights')
        keep

Does anyone know what could be causing this error? I have tried removing and reinstalling awkward, but that hasn’t helped. The “header-only” directory indeed does not exist, but I’m not sure why. Also, awkward otherwise functions normally; I only get this error when calling ak.from_rdataframe.

ROOT Version: 6.28.4
awkward version: 2.5.0
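
For anyone hitting the same OSError, a quick way to confirm the diagnosis is to check whether the directory named in the traceback exists in the installed package. The following is only a minimal sketch: the helper names header_only_dir and check_awkward_headers are made up for illustration, and the relative path "_connect/header-only" is taken from the error message above rather than from any documented awkward layout.

```python
import importlib.util
from pathlib import Path


def header_only_dir(package_dir):
    """Return the directory that the traceback says cppyy could not find."""
    # Path components copied from the OSError message in this thread.
    return Path(package_dir) / "_connect" / "header-only"


def check_awkward_headers():
    """Report whether the installed awkward package ships the header-only dir."""
    # Locate the installed awkward package without importing it.
    spec = importlib.util.find_spec("awkward")
    if spec is None or not spec.submodule_search_locations:
        print("awkward is not installed in this environment")
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    target = header_only_dir(package_dir)
    found = target.is_dir()
    print(f"{target}: {'found' if found else 'MISSING'}")
    return found


if __name__ == "__main__":
    check_awkward_headers()
```

If this reports MISSING right after a fresh install, the problem is most likely in how that awkward build was packaged rather than in the local environment, which would be consistent with reinstalling not helping.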

I guess @vpadulan can help.

Dear @agarabag ,

Thanks for reaching out on the forum! It seems the issue lies more on the awkward array side, but let’s try to understand it better. Can I ask you to downgrade awkward to a previous version, e.g. 2.4 or earlier? Let’s see if that is enough to fix the problem.

Cheers,
Vincenzo

Hi Vincenzo,

I tried version 2.4 and got the same error; then I tried 2.1.1 and it seems to work. I will let the awkward array developers know about this; maybe they can figure out where the issue comes from.
Thank you for your help.

Best,
Ali

I also wanted to ask: this method with ak.from_rdataframe works fine for me with ROOT files < 1 GB, but for larger files I get a “Segmentation fault”. I assume this is a memory issue. Do you know if there are better ways to convert large ROOT files, or should I just get a machine with more memory? (By the way, I need to convert to awkward since I will use this data in a Python package for ML training.)
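
One thing that could be tried for large files is converting a limited range of entries at a time instead of the whole tree in one call. The sketch below is an untested assumption, not a confirmed fix for the segfault: chunk_ranges and iter_awkward_chunks are hypothetical helper names, chunk_size is an arbitrary value, and it assumes that ak.from_rdataframe accepts the node returned by RDataFrame's Range(begin, end). Note that Range is only available when implicit multi-threading is not enabled.

```python
def chunk_ranges(n_entries, chunk_size):
    """Yield (begin, end) entry ranges that together cover [0, n_entries)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for begin in range(0, n_entries, chunk_size):
        yield begin, min(begin + chunk_size, n_entries)


def iter_awkward_chunks(tree_name, file_name, columns, chunk_size=100_000):
    """Yield one awkward array per entry range (hypothetical helper)."""
    # Imported lazily so chunk_ranges can be used without ROOT installed.
    import ROOT
    import awkward as ak

    df = ROOT.RDataFrame(tree_name, file_name)
    n_entries = df.Count().GetValue()  # runs one event loop to count entries
    for begin, end in chunk_ranges(n_entries, chunk_size):
        # Assumption: from_rdataframe works on the node returned by Range.
        yield ak.from_rdataframe(df.Range(begin, end), columns=columns)
```

Each yielded array could then be passed to the training code, or written out, before the next range is converted, so that only one chunk needs to be held in memory at a time. For example, list(chunk_ranges(10, 4)) gives [(0, 4), (4, 8), (8, 10)].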

ak.from_rdataframe works fine for me with ROOT files < 1 GB, but for larger files I get a “Segmentation fault”

1 GB is extremely small and the segfault should never happen anyway. Can you send a full reproducer?

(By the way, I need to convert to awkward since I will use this data in a Python package for ML training.)

You could also try out the new, experimental feature that allows ingesting ROOT datasets directly into common ML training tools (e.g. PyTorch, TensorFlow). Look for the RBatchGenerator* tutorials in the ROOT TMVA tutorials. This requires the latest ROOT version, 6.30.

Cheers,
Vincenzo

I have put the ROOT files, Python script, and conda env list here: (since I’m a new user, the forum is not allowing me to post the link here, so I have emailed it to you).

If you have time, you can build the conda environment and run the Python script on the ROOT files; you should be able to reproduce the segmentation fault (this is the only “error” I see).

Also, I contacted the awkward developers: there was a bug, and they will fix it.

Dear @agarabag ,

After experimenting a bit, I noticed that while awkward==2.1.1 doesn’t present the original error you reported, it still seems to be connected to the segfault somehow. I installed awkward==2.2.0 and the segfault did not appear anymore on my machine. Could you try and confirm?

Cheers,
Vincenzo

Hi Vincenzo,

Thanks for investigating this issue. awkward==2.2.0 also works for me; I don’t get any of the errors. However, I tried running on all my ROOT files and now I get the segfault for files larger than 4 GB. I’m not really sure what’s going on. I have added the 4 GB file to the CernBox in case you want to test.
