Distributed RDataFrame FromSpec()

qidong · February 10, 2024, 5:52pm

Dear ROOT experts,

I have a question regarding creating distributed RDFs with the Spark backend.

I’ve been exploring ROOT’s RDataFrame (RDF) and I’m interested in using the Spark backend for distributed computing. I’ve noticed that a local RDF can be created from samples and metadata with the ROOT::RDF::Experimental::FromSpec() function. However, I’m wondering whether similar functionality is available when working with the Spark backend.

Specifically, I’d like to know if there’s a way to create a distributed RDF with the Spark backend from a JSON file that lists the samples’ file names and metadata. This would streamline my workflow when analyzing large datasets.
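
For concreteness, here is a minimal sketch of the kind of JSON file meant here. It assumes the layout described in the FromSpec() documentation (a top-level "samples" object whose entries carry "trees", "files" and "metadata"). The sample names, tree name, file names and metadata values are hypothetical placeholders, and the helper functions are illustrative only, not part of ROOT.

```python
import json
import tempfile
from pathlib import Path


def build_spec(samples):
    """Arrange per-sample trees, files and metadata under a top-level "samples" key."""
    spec = {"samples": {}}
    for name, info in samples.items():
        spec["samples"][name] = {
            "trees": list(info["trees"]),
            "files": list(info["files"]),
            "metadata": dict(info.get("metadata", {})),
        }
    return spec


def write_spec(spec, path):
    """Write the specification to disk as indented JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(spec, indent=2))
    return path


# Hypothetical samples: every name and value below is a placeholder.
samples = {
    "ttbar": {
        "trees": ["Events"],
        "files": ["ttbar_1.root", "ttbar_2.root"],
        "metadata": {"xsec": 831.76, "category": "mc"},
    },
    "data2018": {
        "trees": ["Events"],
        "files": ["data2018_A.root"],
        "metadata": {"xsec": 1.0, "category": "data"},
    },
}

spec_path = write_spec(build_spec(samples), Path(tempfile.gettempdir()) / "spec.json")


def local_rdf_from_spec(path):
    """Local (non-distributed) usage. Needs a ROOT installation and real input files,
    so it is only defined here and not called."""
    import ROOT

    df = ROOT.RDF.Experimental.FromSpec(str(path))
    # Expose the per-sample "xsec" metadata value as a regular column.
    return df.DefinePerSample("xsec", 'rdfsampleinfo_.GetD("xsec")')
```

With a local RDF this file can be passed directly to FromSpec(); the question is whether the same file can drive a Spark-backed distributed RDF.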

Any insights, guidance, or examples on how to achieve this would be greatly appreciated. Apologies if this question has been addressed previously; I’ve tried searching the forums but couldn’t find a definitive answer.

Thank you in advance for your assistance!

Best regards,
Dong

ROOT Version: 6.30.2
Platform: Almalinux 9
Compiler: GCC12

couet · February 11, 2024, 9:03am

I guess @vpadulan can help

vpadulan · March 1, 2024, 11:30am

Dear @qidong,

Sorry for the late reply. This feature was recently introduced by pull request #14802 in root-project/root on GitHub ("Add support for distributed RDataFrame constructed from RDatasetSpec", by gwmyers). More tests will be added soon, including tests of metadata handling. I expect you will be able to use it in the upcoming ROOT release 6.32.

I would be very interested in your opinion on the kind of specification file (e.g. JSON) you have in mind, how you would like to specify metadata, and so on. There’s an open issue with some discussion: "A standard schema for semi-structured dataset specification formats" (issue #11624 in root-project/root on GitHub). More examples from real users are very much appreciated, so that we can make sure to build support for them. Feel free to message me privately to continue the discussion in more detail.

Cheers,
Vincenzo