How can we mock data?

Hello,

Sometimes we need to patch our data. For example, we have a simulated sample for 2016 but not for 2017. If we want to do a quick study and the 2016 simulation is a good enough stand-in for 2017, we would like to do something like:

df = df.mock(branch='year', original = '2016', new = '2017')

_process_data(df=df)

which would use all the entries where the data is for 2016 as if they were for 2017. This would “fake” extending/patching the dataset. Is this possible? As far as I can see, the only way we can do this is:

df_17 = df.Filter('year == 2016')
df_17 = df_17.Redefine('year', '2017')
df_17.Snapshot('2017.root', 'tree')

df = RDataFrame('tree', ['2017.root', 'original_file.root'])

Or in other words, duplicating the 2016 dataset by actually writing a ROOT file. But this is very inefficient, especially if the dataframe has thousands of columns.

ROOT Version: 6.32
Platform: linux
Compiler: gnu


I’m not sure I understand what you’re asking for, but maybe @vpadulan can take a look

For instance in a dataframe with 1000 entries, we have a column called year, and there are 100 entries for which year == 2016. I need those entries to be duplicated and the year to be made 2017. Then I need the extra entries to be appended such that I get a dataframe with 1100 entries.
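To make the expected result concrete, here is a minimal pure-Python sketch of the behaviour described above. It does not use ROOT at all: the `mock` function, the toy rows, the `x` column and the value 2018 for the remaining entries are all made up for illustration, and this is not an existing RDataFrame method.

```python
from itertools import chain


def mock(rows, branch, original, new):
    """Return an iterator over all rows, followed by relabelled copies.

    Hypothetical helper: every row whose `branch` equals `original` is
    yielded a second time with `branch` set to `new`. The input rows are
    not modified and nothing is written to disk.
    """
    relabelled = ({**row, branch: new} for row in rows if row[branch] == original)
    return chain(rows, relabelled)


# Toy dataset: 1000 entries, 100 of which have year == 2016.
rows = [{"year": 2016 if i < 100 else 2018, "x": float(i)} for i in range(1000)]

extended = list(mock(rows, branch="year", original=2016, new=2017))
print(len(extended))  # 1100
```

The 100 entries with year 2016 appear twice in the output, once unchanged and once labelled 2017, which is the 1100-entry dataset I would like to obtain without the intermediate file.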

But actually duplicating them is not really needed, because the fake 2017 data is already in the dataframe, so you probably need some sort of mocking of these 2016 entries. The way I can do this now is to filter the 2016 entries, redefine the year column and save to disk, then load the result again and append it to the rest of the dataset, but that is very inefficient.

OK, then let’s see if @vpadulan has a better solution for this