Dear experts,
I have tested the new systematic variations feature, RDataFrame::Vary, in 6.26, and I have a few questions and some feedback:
I used the variations with a Filter before Histo, which I think is good because it lets me skip events where a particular systematic variation has an unphysical value. However, it seems that this introduces some issues with the analysis graph:
The graph creates a separate filter for each variation (and in my case the number of variations is going to be much larger).
Is it possible to name the histogram of each systematic directly, e.g. variable_variation, or to pass a HistoModel to Vary somehow? Currently all systematics get the same name, so I rename them when saving them to the file.
What is the best way to save the histograms for the variations? Currently, I'm doing something like:
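// vary_test is the RResultMap<TH1D> returned by ROOT::RDF::Experimental::VariationsFor
std::vector<std::string> keys = vary_test.GetKeys();
for (const auto& key : keys)
{
   std::cout << "Saving: " << key << std::endl;
   // rename, otherwise all the variations have the same name
   TH1D& h = vary_test[key];
   std::string name = std::string(h.GetName()) + key;
   h.SetName(name.c_str());
   h.Write(); // writes to the current directory, e.g. an open TFile
}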
Something like the following would be nicer:
for (auto& [key, value] : vary_test) {
   value.Write();
}
but it does not compile, because RResultMap has no begin() function: error: invalid range expression of type 'ROOT::RDF::Experimental::RResultMap<TH1D>'; no viable 'begin' function available
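The compile error arises because a C++ range-based for loop needs begin() and end() on the range object, and, per the error message, RResultMap in 6.26 provides neither. The sketch below uses a hypothetical ToyResultMap (not a ROOT class) backed by std::map to show the minimal pieces that make the structured-binding loop work; it illustrates the language requirement only, not how ROOT implements or will implement iteration.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a map of per-variation results.
// This is NOT the ROOT RResultMap class; it only shows what a type needs
// for `for (auto &[key, value] : m)` to compile.
template <typename T>
class ToyResultMap {
public:
    void Add(const std::string &key, T value) { fResults.emplace(key, std::move(value)); }

    // Range-based for looks up begin()/end() as members (or via ADL).
    // Forwarding to std::map yields std::pair<const std::string, T> elements,
    // which structured bindings unpack into [key, value].
    auto begin() { return fResults.begin(); }
    auto end() { return fResults.end(); }
    auto begin() const { return fResults.begin(); }
    auto end() const { return fResults.end(); }

private:
    std::map<std::string, T> fResults;  // sorted by variation key
};
```

With members like these, a loop such as `for (auto &[key, value] : m) { /* use value */ }` compiles, which is exactly what the second snippet in the post is missing.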
Hello @zhubacek,
thank you very much for the feedback!
This is a bug in the graph visualization, thank you for reporting it! It is now tracked at [DF] SaveGraph output is wrong for varied Filters · Issue #10666 · root-project/root · GitHub. (I guess the Filter expression depends on one or more varied quantities, right? In that case we internally have to replicate the filter for each "universe", but that's certainly not how we want to display things.)
Absolutely, that's planned. If you have any suggestions on the naming scheme for histograms that depend on multiple variations, please share them: I am still not sure what the best solution is.
That's how you would do it now, but @StephanH also complained about the lack of support for nice iteration over RResultMap (and I totally agree), and we should be adding it soon.
I hope that, other than these annoyances, things are working fine!
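To illustrate why a Filter on a varied quantity has to be replicated per "universe", here is a toy model under simplifying assumptions: each event stores one value of a varied column per universe, and the cut is evaluated separately in each one, so the selected events can differ between universes. The Event struct and CountPassing function are hypothetical and do not reflect RDataFrame internals.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model (not RDataFrame internals): each event carries one value of the
// varied column per universe, e.g. "nominal", "pt:up", "pt:down".
struct Event {
    std::map<std::string, double> pt;  // universe name -> pt value
};

// Evaluate the cut `pt > cut` separately in every universe and count the
// events passing in each, mirroring one filter per variation.
std::map<std::string, int> CountPassing(const std::vector<Event> &events, double cut) {
    std::map<std::string, int> counts;
    for (const auto &ev : events) {
        for (const auto &[universe, value] : ev.pt) {
            counts[universe];                // ensure every universe appears, even with 0 passes
            if (value > cut) ++counts[universe];
        }
    }
    return counts;
}
```

An event with nominal pt 19 and pt:up 20.9 fails a cut at 20 in the nominal universe but passes it in the pt:up universe, so a single shared filter could not serve both.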
(Warning, very opinionated) Personally, I'd prefer to be able to pass a HistoModel to Vary containing a keyword (the CMS Higgs Combine go-to is "$SYSTEMATIC") that gets replaced by the variation name minus the ":". Perhaps an optional argument to VariationsFor would suffice, leaving it as the user's responsibility to choose a keyword that won't cause problems. Then variations such as ["jes:Up", "jes:Down", "pdf:0", "pdf:1", …] map straightforwardly onto Higgs Combine templates ["jesUp", "pdf0", …], which roughly 80% of CMS analyses use.
Obviously there are more "efficient" ways to do this, but since it's a once-per-dataset kind of thing (another thing to consider: naming histograms based on the dataset they're being produced for!), I think usability should trump hardline performance numbers. After all, a user's main optimization target is their own time, followed by the computer's (going strictly by monetary value, I'm afraid).
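A minimal sketch of the keyword-replacement idea above, assuming a user-supplied template name containing $SYSTEMATIC and variation keys of the form variationName:tag. CombineSuffix and ExpandTemplateName are hypothetical helpers, not part of ROOT; they only show the string manipulation involved.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a ROOT API): turn a variation key such as "jes:Up"
// into a Combine-style suffix ("jesUp") by dropping every ':'.
std::string CombineSuffix(const std::string &variationKey) {
    std::string out;
    for (char c : variationKey)
        if (c != ':') out += c;
    return out;
}

// Replace every occurrence of `keyword` (e.g. "$SYSTEMATIC") in a user-chosen
// template name with the Combine-style suffix of the variation key.
std::string ExpandTemplateName(std::string templ, const std::string &keyword,
                               const std::string &variationKey) {
    if (keyword.empty()) return templ;  // nothing to replace; avoids an endless loop
    const std::string suffix = CombineSuffix(variationKey);
    std::size_t pos = 0;
    while ((pos = templ.find(keyword, pos)) != std::string::npos) {
        templ.replace(pos, keyword.size(), suffix);
        pos += suffix.size();  // continue searching after the inserted text
    }
    return templ;
}
```

For example, ExpandTemplateName("ttbar_$SYSTEMATIC", "$SYSTEMATIC", "jes:Up") yields "ttbar_jesUp".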
For the systematic naming, I would also prefer being able to pass a HistoModel.
Maybe the name could be configured with something like a string format specifier?
It should be able to cover both $NAME"_"$SYSTEMATIC,
which @nmangane suggested, and something like $NAME"_syst"$N,
with simple sequential numbering when no vector of variation names is specified.
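The format-specifier suggestion can be sketched as follows, assuming hypothetical placeholders $NAME, $SYSTEMATIC and $N (the variation index) and a hypothetical FormatNames helper; none of this is an existing RDataFrame option. When no tag names are given, only the numbering is substituted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replace all occurrences of `key` in `s` with `value`.
std::string ReplaceAll(std::string s, const std::string &key, const std::string &value) {
    if (key.empty()) return s;
    for (std::size_t pos = s.find(key); pos != std::string::npos;
         pos = s.find(key, pos + value.size()))
        s.replace(pos, key.size(), value);
    return s;
}

// Hypothetical: build one output name per variation from a pattern.
// If `tags` is empty, nVariations names are generated and only $N is filled;
// otherwise one name per tag is generated and $SYSTEMATIC is filled as well.
std::vector<std::string> FormatNames(const std::string &pattern, const std::string &name,
                                     const std::vector<std::string> &tags,
                                     std::size_t nVariations) {
    std::vector<std::string> out;
    const std::size_t n = tags.empty() ? nVariations : tags.size();
    for (std::size_t i = 0; i < n; ++i) {
        // $NAME must be substituted before $N, because "$N" is a prefix of "$NAME".
        std::string s = ReplaceAll(pattern, "$NAME", name);
        s = ReplaceAll(s, "$N", std::to_string(i));
        if (!tags.empty()) s = ReplaceAll(s, "$SYSTEMATIC", tags[i]);
        out.push_back(s);
    }
    return out;
}
```

For instance, FormatNames("$NAME_syst$N", "pt", {}, 3) produces pt_syst0, pt_syst1 and pt_syst2.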
If varied histograms automatically got names like <name>_pt:up and <name>_syst:0, would there still be a strong need for customization? (I'm wary of adding too many knobs that complicate interfaces and that most users then completely ignore. When possible, I prefer just having sane defaults.) Also, how important is the removal of the ":"? I'd like to stay consistent in how we refer to variations.
Hi Enrico,
So something like: if a vector of strings is provided, the first option is used; if not, variations are numbered sequentially?
But maybe I can live without the numbered sequence and create the vector<string> myself, so something like <name>_<variationName>_<variationTag>.
I personally don't like using ":", but my case could be different from others'. For certain systematics I have 100+ variations, so they are numbered, while others might prefer to group the systematics in multiple Vary actions and use syst1:up, syst1:down, syst2:up, syst2:down... (I'd still prefer syst1_up, syst1_down.)
I think there are a lot of toolchains out there with built-in assumptions. If there's only the default, almost everyone will end up renaming the histograms anyway, because I don't see everyone changing all their downstream tools to conform to one convention. If that's the case, then a name containing the variation name, the tag, and a base name per unvaried histogram would be better than one containing only the former two; at that point, whether ":" is the separator is just a matter of how hard it makes parsing after the fact. Then maybe it makes sense to use it consistently to separate all the components of the histogram name, for easy renaming later.
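If a single separator such as ":" were used consistently between the base name, variation name and tag, renaming after the fact reduces to a split and a join. The sketch below assumes that hypothetical <base>:<variationName>:<tag> convention (it is not the scheme ROOT adopted) and a base name that itself contains no ":"; SplitName and Rejoin are illustrative helpers only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a name such as "mjj:jes:up" into its ':'-separated components.
std::vector<std::string> SplitName(const std::string &fullName, char sep = ':') {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = fullName.find(sep, start);
        // substr clamps the count, so pos == npos takes the rest of the string
        parts.push_back(fullName.substr(start, pos - start));
        if (pos == std::string::npos) break;
        start = pos + 1;
    }
    return parts;
}

// Rebuild the name with whatever separator the downstream tools expect.
std::string Rejoin(const std::vector<std::string> &parts, const std::string &sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += sep;
        out += parts[i];
    }
    return out;
}
```

SplitName("mjj:jes:up") gives {"mjj", "jes", "up"}, which Rejoin(parts, "_") turns into mjj_jes_up.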
Independently of allowing customization through string interpolation (e.g. telling RDF that your histogram name should be histo_$VARIATION), we need better defaults. As a first step, I think we'll give the varied histograms names of the form <name>_<variationName><variationTag>, removing the ":".
I think we'll also remove the ":" from the RResultMap keys, for consistency.