Dear experts,
Thanks to some suggestions and discussions with @siliataider, I managed to get RNTuple from ROOT 6.36 working in a PyROOT code I am using.
Often in Python I was obliged to fill a dictionary of lists, create a pandas DataFrame from it and import that into RDataFrame, or else dump the output for each processed event and then reload and concatenate everything. This works fine for 10-20 events, but for 1000 or more the run time becomes prohibitive. I have not benchmarked whether RNTuple is faster at flushing and writing while processing, but I expect so. I therefore tried to write the same output with RNTuple, and it seems to work fine.
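For context, the old accumulate-then-convert pattern looked roughly like the sketch below. `process_event` is a hypothetical stand-in for the real per-event code, and the sketch uses `ROOT.RDF.FromNumpy` instead of going through pandas, only to keep it short.

```python
from collections import defaultdict


def process_event(i):
    """Hypothetical stand-in for the real per-event computation."""
    return {"int_dummy": i, "double_dummy": 0.5 * i}


def accumulate(n_events):
    """Fill a dictionary of lists, one list per column, one value per event."""
    columns = defaultdict(list)
    for i in range(n_events):
        for name, value in process_event(i).items():
            columns[name].append(value)
    return dict(columns)


def to_rdataframe(columns):
    """Convert the accumulated columns into an RDataFrame (needs numpy and ROOT)."""
    # Imports are local so that accumulate() can be used without ROOT installed.
    import numpy as np
    import ROOT

    arrays = {name: np.asarray(values) for name, values in columns.items()}
    return ROOT.RDF.FromNumpy(arrays)
```

Everything stays in memory until the very end here, which is the part that stops scaling once the number of events grows.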
Below is the example code I ended up with. It closely emulates what I do, although my real filling setup is more complex and of course has more branches to fill. I thought it would be useful to have something like this in the PyROOT tutorials; if so, let me know where it should be added. My next step would be to get the writer working with Python multiprocessing, and I was wondering if you have any suggestions for making that work.
Thanks in advance,
Renato
import pandas as pd
import numpy as np
import ROOT
import math

RNTupleModel = ROOT.RNTupleModel
RNTupleReader = ROOT.RNTupleReader
RNTupleWriter = ROOT.RNTupleWriter


class DummyStudy:
    def __init__(self, ofile='test.root'):
        self.model = RNTupleModel.Create()
        # Map each C++ type to the names of the fields to create with it
        columns_add = {
            'int': ['int_dummy', 'int_dummy2'],
            'double': ['double_dummy', 'double_dummy2'],
            'bool': ['bool_dummy', 'bool_dummy2'],
            'std::vector<double>': ['vector_double_dummy', 'vector_double_dummy2'],
        }
        for vartype, list_names in columns_add.items():
            for name in list_names:
                print(f"adding type = {vartype} with name = {name}")
                self.model.MakeField[vartype](name)
        self.writer = RNTupleWriter.Recreate(self.model, "test_tree", ofile)
        self.entry = self.writer.CreateEntry()

    def fill(self, entry):
        # Fields that are not set here are still written, with default zeros
        # or an empty vector (no warning!)
        self.entry["int_dummy"] = 42
        self.entry["double_dummy"] = 42.
        self.entry["bool_dummy"] = False
        self.entry["vector_double_dummy"] = [9., 10., 20.]
        self.writer.Fill(self.entry)


dummyStudy = DummyStudy(ofile='test_fill_uniquecreate.root')
for i in range(100000):
    dummyStudy.fill(i)
# Drop the object (and with it the writer) so the RNTuple is flushed to the file
del dummyStudy

# Reload the filled RNTuple as an RDataFrame
df = ROOT.RDataFrame("test_tree", "test_fill_uniquecreate.root")
hist = df.Histo1D("vector_double_dummy")
c = ROOT.TCanvas()
hist.Draw()
c.Draw()
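
Regarding the multiprocessing question, the naive approach I would try first is one output file per worker process, so that no writer is ever shared between processes. The sketch below is untested with ROOT and only an assumption about how this could look: `split_range`, `write_chunk`, `run` and the `test_part_{k}.root` naming are hypothetical helpers, not anything provided by ROOT.

```python
import multiprocessing as mp


def split_range(n_events, n_workers):
    """Split range(n_events) into n_workers contiguous (start, stop) chunks."""
    base, extra = divmod(n_events, n_workers)
    chunks, start = [], 0
    for k in range(n_workers):
        # The first `extra` workers take one additional event each.
        stop = start + base + (1 if k < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def write_chunk(args):
    """Worker: write events [start, stop) into its own RNTuple file."""
    k, (start, stop) = args
    import ROOT  # imported inside the worker so each process sets up ROOT itself

    model = ROOT.RNTupleModel.Create()
    model.MakeField["int"]("int_dummy")
    path = f"test_part_{k}.root"  # hypothetical naming scheme
    writer = ROOT.RNTupleWriter.Recreate(model, "test_tree", path)
    entry = writer.CreateEntry()
    for i in range(start, stop):
        entry["int_dummy"] = i
        writer.Fill(entry)
    del entry, writer  # dropping the writer flushes the data to the file
    return path


def run(n_events, n_workers):
    """Launch one process per chunk and return the list of files written."""
    chunks = list(enumerate(split_range(n_events, n_workers)))
    with mp.Pool(n_workers) as pool:
        return pool.map(write_chunk, chunks)
```

Calling `run(100000, 4)` under an `if __name__ == "__main__":` guard would then give four files. Whether RDataFrame can read them all back as a single dataset in one go is something I have not checked.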