Howdy y’all,
Me again. Same issue as last time (trying to read in lots of analysis files), but now with a fun new quirk.
1. I have about 100 files and about 200 GB of data.
2. This can be split into different categories: files->[data, MC].
3. These can be split into different components: MC->[signal, background1, background2], etc.
4. These can be split into different contributions: signal->[[electron events, muon events], [2016, 2017]].
5. These contributions are then split across multiple datasets: muon16->[id10001, id10002, id10003], etc.
6. These datasets each have several hundred (systematic) trees containing identically named branches.
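A minimal sketch of how the hierarchy above could be flattened into one record per dataset, so that every file keeps its provenance (category, component, contribution, dataset ID) next to its name. The nested dictionary, the `Sample` type, the extra `electron16`/`id10004` entries and the `<id>.root` naming convention are all hypothetical placeholders, not taken from the real analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One dataset plus its position in the hierarchy (hypothetical layout)."""
    category: str      # e.g. "data" or "MC"
    component: str     # e.g. "signal", "background1"
    contribution: str  # e.g. "muon16"
    dataset_id: str    # e.g. "id10001"

    @property
    def filename(self) -> str:
        # Assumed naming convention: one file per dataset ID.
        return f"{self.dataset_id}.root"


def flatten(tree: dict) -> list:
    """Turn {category: {component: {contribution: [ids]}}} into a flat list of Samples."""
    samples = []
    for category, components in tree.items():
        for component, contributions in components.items():
            for contribution, ids in contributions.items():
                for dataset_id in ids:
                    samples.append(Sample(category, component, contribution, dataset_id))
    return samples


# Hypothetical example hierarchy.
hierarchy = {
    "MC": {
        "signal": {
            "muon16": ["id10001", "id10002", "id10003"],
            "electron16": ["id10004"],
        },
    },
}

samples = flatten(hierarchy)
signal_files = [s.filename for s in samples if s.component == "signal"]
print(signal_files)  # ['id10001.root', 'id10002.root', 'id10003.root', 'id10004.root']
```

`signal_files` is the kind of list that would be pushed into the `std::vector` handed to `ROOT.RDataFrame`, while `samples` keeps the metadata needed to choose selections and scale factors per dataset.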
So what I did was make a class for each component (from 3.). Each class has a dictionary with one entry per tree, and each entry holds an RDataFrame that loads that tree name from a list of dataset files:
```python
import ROOT

c = my_component(name="signal component")  # my own class; c.syst is a dict
cfiles = ROOT.std.vector("string")()
for f in ["one.root", "two.root", "three.root"]:
    cfiles.push_back(f)
for systematic in ["nominal", "up1", "down1", "up2", "down2"]:
    c.syst[systematic] = ROOT.RDataFrame(systematic, cfiles)
```
Two issues:
a. Different contributions (from 4.) have different branches, e.g. electrons, muons, etc. Normally this is handled with an `if n_electron > 0: electron[0].pt() > 30` / `else if n_muon > 0: muon[0].pt() > 15` type affair.
b. Different datasets need scale factors corresponding to their ID, e.g. 1.0 for every event in id10001 but 0.8 for every event in id10002.
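For issue a., one option is to fold the if/else into a single string expression for `RDataFrame.Filter`, which ROOT JIT-compiles as C++. In C++ the ternary operator evaluates only the branch it selects and `&&` short-circuits, so `electron[0]` or `muon[0]` is only touched when the corresponding count is non-zero. The sketch below only builds the string. The branch names and the `pt()` accessor come from the example above; the helper name `lepton_selection` is hypothetical. It also assumes every tree has both the electron and the muon branches (possibly empty); if a contribution lacks a branch altogether, the expression would have to be chosen per dataset instead.

```python
def lepton_selection(el_pt_min: float = 30.0, mu_pt_min: float = 15.0) -> str:
    """Build a C++ expression equivalent to:
        if n_electron > 0:  pass if electron[0].pt() > el_pt_min
        elif n_muon > 0:    pass if muon[0].pt() > mu_pt_min
        else:               fail
    Assumes branches n_electron, electron, n_muon, muon exist in the tree.
    """
    # The ternary picks exactly one branch, and && stops before muon[0]
    # when n_muon == 0, so no empty collection is ever indexed.
    return (
        f"n_electron > 0 ? electron[0].pt() > {el_pt_min} "
        f": (n_muon > 0 && muon[0].pt() > {mu_pt_min})"
    )


expr = lepton_selection()
print(expr)
# n_electron > 0 ? electron[0].pt() > 30.0 : (n_muon > 0 && muon[0].pt() > 15.0)
```

In PyROOT this would be applied as something like `df = df.Filter(lepton_selection(), "one lepton")`, using the same selection for electron and muon events.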
So my question is this:
I’d like to create a single dataframe from multiple files whilst still remembering which file each event came from, since events from different files might require different selections.
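For remembering which file an event came from, recent ROOT versions (6.26 onwards, as far as I know) provide `RDataFrame.DefinePerSample`, whose string expressions may use `rdfsampleinfo_`, an `RSampleInfo` describing the file and tree currently being read. The hypothetical helper `per_sample_expr` below turns a Python mapping from dataset ID to scale factor (issue b.) into such an expression. It matches IDs with `RSampleInfo::Contains`, a substring test on the sample name, so it assumes each ID appears in its file name and that no ID is a substring of another. Please check the `DefinePerSample` documentation for your ROOT version before relying on this.

```python
def per_sample_expr(values_by_id: dict, default: float = 1.0) -> str:
    """Build a nested C++ ternary for DefinePerSample, e.g.
    rdfsampleinfo_.Contains("id10001") ? 1.0 : (... : (default)).

    rdfsampleinfo_.Contains(...) is a substring match on the sample
    (file/tree) name, so the IDs must be unambiguous substrings.
    """
    expr = repr(float(default))
    # Build from the inside out so the first ID in the dict is tested first.
    for dataset_id, value in reversed(list(values_by_id.items())):
        expr = f'rdfsampleinfo_.Contains("{dataset_id}") ? {float(value)!r} : ({expr})'
    return expr


sf_expr = per_sample_expr({"id10001": 1.0, "id10002": 0.8})
print(sf_expr)
```

With one `RDataFrame` over all files, this would be used along the lines of `df = df.DefinePerSample("sf", sf_expr)`, then passing `"sf"` as the weight column, e.g. `df.Histo1D(model, "x", "sf")`. As I understand it, the expression is evaluated once per sample rather than once per event, so the string comparisons should cost very little. The same helper with values 1.0/0.0 could tag the channel (electron or muon) of each file.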
Relatedly, if I try to plot e.g. the leading muon pT in an event with no muons, I get a segfault (which is very difficult to pin down). Is there not an ‘ignore null pointer’ option for plotting?
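On the segfault: `muon[0]` on an event with no muons reads past the end of an empty collection, which is undefined behaviour in C++, so a crash or garbage values are to be expected rather than something the plotting step can skip. One workaround is to `Filter("n_muon > 0")` before defining the leading-muon column; another is to define it with a sentinel for empty events. The hypothetical helper below builds the sentinel version as a string for `RDataFrame.Define`; the branch names and `pt()` accessor come from the example above, and `-999.0` is an arbitrary choice.

```python
def leading_expr(collection: str, count: str, attr: str = "pt",
                 default: float = -999.0) -> str:
    """C++ expression giving collection[0].attr() if count > 0, else default.

    The ternary only evaluates collection[0] when the collection is non-empty,
    so events without that object never index an empty container.
    """
    return f"{count} > 0 ? {collection}[0].{attr}() : {default!r}"


lead_mu = leading_expr("muon", "n_muon")
print(lead_mu)  # n_muon > 0 ? muon[0].pt() : -999.0
```

For example, `df.Define("lead_mu_pt", lead_mu).Histo1D(("h", "", 50, 0.0, 500.0), "lead_mu_pt")` would put the sentinel into the underflow bin, since the histogram range starts at 0.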
An idea that has been suggested before is running this step multiple times to create further slimmed-down trees that can then be hadd-ed. These files have already been slimmed down twice, so I’m looking for ways to avoid doing this another two times. In pandas or R I’d just make multiple DataFrames and concatenate them, but I know this isn’t possible in ROOT.
All the best,
~/V