RDataFrame systematics variation - Vary

Dear experts,
I have tested the new systematic variations feature, RDataFrame::Vary, in 6.26 and have a few questions and some feedback:

  1. I used the variations with a Filter before the Histo call, which I think is useful because it can discard events where a particular systematic has an unphysical value. However, this seems to introduce some issues with the analysis graph:


    The graph creates a filter for each variation (in my case the number of variations is going to be much larger)

  2. Is it possible to directly name the histogram of each systematic, e.g. variable_variation, or to pass a HistoModel to Vary somehow? Currently all the varied histograms have the same name, so I rename them while saving them to file.

  3. What is the best way to save the histograms for the variations? Currently, I'm doing something like:

std::vector<std::string> keys = vary_test.GetKeys();

for (auto& key : keys) {
  std::cout << "Saving : " << key << std::endl;
  // rename, otherwise the variations all have the same name
  std::string name = vary_test[key].GetName() + key;
  vary_test[key].SetName(name.c_str());
  vary_test[key].Write();
}

Something like:

for (auto& [key, value] : RResultMap) {
  value.Write();
}

would need RResultMap to provide some kind of begin(); at the moment it fails with:
error: invalid range expression of type 'ROOT::RDF::Experimental::RResultMap<TH1D>'; no viable 'begin' function available

Thanks,
Cheers,
Zdenek

ROOT Version: 6.26/02
Platform: macOS 12.3
Compiler: clang 13.1.6


Hello @zhubacek,
thank you very much for the feedback!

  1. This is a bug in the graph visualization, thank you for reporting! It is now tracked as GitHub issue #10666 in root-project/root, "[DF] SaveGraph output is wrong for varied Filters". (I guess the Filter expression depends on one or more varied quantities, right? In that case we have to replicate the filter internally for each "universe", but that's certainly not how we want to display things :) )

  2. Absolutely, that's planned. I am still not sure what the best naming scheme is for histograms that depend on multiple variations, so any suggestions are welcome :)

  3. That's how you would do it now, but @StephanH also complained about the lack of support for nice iteration in RResultMap (and I totally agree), and we should be adding it soon.

I hope other than these annoyances things are working fine!

Cheers,
Enrico


(Warning, very opinionated) Personally, I'd prefer that we be able to pass a HistoModel to Vary which contains a keyword (the CMS Higgs Combine go-to is "$SYSTEMATIC") that can be tagged for replacement by the variation name, minus the ':'. Perhaps an optional argument in the VariationsFor command would suffice, leaving it as the user's responsibility to choose a keyword that won't cause problems. Then if we have variations such as ['jes:Up', 'jes:Down', 'pdf:0', 'pdf:1', ...] we get a straightforward mapping to templates for Higgs Combine ['jesUp', 'pdf0', ...], which roughly 80% of CMS analyses use.
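A minimal sketch of the keyword replacement described above, using only the standard library. The helper names `StripColon` and `ExpandKeyword` are hypothetical and not part of ROOT; the sketch only assumes that variation keys look like "jes:Up" and that the keyword defaults to "$SYSTEMATIC".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not a ROOT API: turn a variation key such as
// "jes:Up" into a Combine-style suffix "jesUp" by dropping every ':'.
std::string StripColon(std::string variation)
{
   variation.erase(std::remove(variation.begin(), variation.end(), ':'),
                   variation.end());
   return variation;
}

// Hypothetical helper, not a ROOT API: replace every occurrence of
// `keyword` in `model` with the colon-free variation name.
std::string ExpandKeyword(std::string model, const std::string &variation,
                          const std::string &keyword = "$SYSTEMATIC")
{
   if (keyword.empty())
      return model; // nothing to replace; also avoids an endless loop below
   const std::string replacement = StripColon(variation);
   std::size_t pos = 0;
   while ((pos = model.find(keyword, pos)) != std::string::npos) {
      model.replace(pos, keyword.size(), replacement);
      pos += replacement.size(); // continue after the inserted text
   }
   return model;
}
```

With a histogram name model of "mjj_$SYSTEMATIC", calling `ExpandKeyword` once per variation key would give one distinct name per variation, which could then be applied with `SetName` before writing.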

Obviously there are more 'efficient' ways to do this, but since it's a once-per-dataset kind of thing (another thing to consider: naming histograms based on the dataset they're being produced for!), I think usability should trump hardline performance numbers. After all, a user's main optimization target is their own time, followed by the computer's (going strictly by monetary value, I'm afraid).

Thanks a lot for the replies!

For the systematic naming, I would also prefer it if a HistoModel could be passed to Vary.

Maybe the name could be configured with something like a string format specifier?
It should be able to cover both
$NAME"_"$SYSTEMATIC
which @nmangane suggested, and also something like
$NAME"_syst"$N
with some simple numbering when a vector of strings with the names is not specified.
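A rough sketch of how such a format string could be expanded, standard library only. `ReplaceAll` and `FormatVariedName` are hypothetical names, not ROOT functions, and the placeholders `$NAME`, `$SYSTEMATIC` and `$N` are simply the ones proposed in this post.

```cpp
#include <cstddef>
#include <string>

// Replace every occurrence of `key` in `text` with `value`.
void ReplaceAll(std::string &text, const std::string &key,
                const std::string &value)
{
   if (key.empty())
      return;
   std::size_t pos = 0;
   while ((pos = text.find(key, pos)) != std::string::npos) {
      text.replace(pos, key.size(), value);
      pos += value.size(); // continue after the inserted text
   }
}

// Hypothetical formatter, not a ROOT API: expand $NAME, $SYSTEMATIC and $N.
std::string FormatVariedName(std::string pattern, const std::string &name,
                             const std::string &systematic, std::size_t index)
{
   // "$NAME" has to be handled before "$N", because "$N" is a prefix of it.
   ReplaceAll(pattern, "$NAME", name);
   ReplaceAll(pattern, "$SYSTEMATIC", systematic);
   ReplaceAll(pattern, "$N", std::to_string(index));
   return pattern;
}
```

One thing the sketch makes visible is that the placeholder names overlap: `$N` is a prefix of `$NAME`, so either the replacement order has to be fixed as above or the numbering placeholder would need a different spelling.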

Cheers,
Zdenek

Hi Nick, Zdenek,

if varied histograms automatically got names like <name>_pt:up and <name>_syst:0, would there still be a strong need for customization? (I'm wary of adding too many knobs that complicate interfaces but that most users then completely ignore. When possible, I prefer just having sane defaults.) Also, how important is the removal of the ":"? I'd like to stay consistent in how we refer to variations.

Cheers,
Enrico

Hi Enrico,
So something like this: if a vector of strings is provided then the first option is used, and if not, the variations are numbered sequentially?
But maybe I can live without the numbered sequence and create it as a vector<string> myself, so something like <name>_<variationName>_<variationTag>.
I personally don't like using ":", but my case could be different from others. For certain systematics I have 100+ variations, so they are numbered, but others might prefer to have the systematics grouped in multiple Vary actions and prefer syst1:up, syst1:down, syst2:up, syst2:down... (I'd still prefer syst1_up, syst1_down.)
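For illustration, building the numbered tags by hand and composing names of the form <name>_<variationName>_<variationTag> could look roughly like this. Both helpers are hypothetical and use only the standard library; whether the resulting vector is then handed to Vary as the list of variation tags is left to the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not a ROOT API: build the tags "0", "1", ..., "n-1",
// optionally with a common prefix in front of each number.
std::vector<std::string> MakeNumberedTags(std::size_t n,
                                          const std::string &prefix = "")
{
   std::vector<std::string> tags;
   tags.reserve(n);
   for (std::size_t i = 0; i < n; ++i)
      tags.push_back(prefix + std::to_string(i));
   return tags;
}

// Hypothetical helper: compose "<name>_<variationName>_<variationTag>".
std::string ComposeName(const std::string &name,
                        const std::string &variationName,
                        const std::string &variationTag)
{
   return name + "_" + variationName + "_" + variationTag;
}
```

For 100+ variations of one systematic, `MakeNumberedTags(100)` gives the tag list once, and `ComposeName("h_pt", "pdf", tag)` gives underscore-separated names without any ":".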

Cheers,
Zdenek

I think there are a lot of toolchains out there with assumptions built in. If there's only the default here, almost everyone will be renaming the histograms again anyway, because I don't see everyone changing all their downstream tools to conform to one convention. If that's the case, then something that contains the variation name, the tag, and some base name per un-varied histogram would be better than only the former two. At that point, whether ':' is the separator or not is just a matter of how hard it makes things to parse after the fact. Then maybe it makes sense to use it consistently to separate all the things that go into the histogram name, for easy renaming later.

Thank you for the feedback!

Independently of allowing customization through string interpolation (e.g. telling RDF your histogram name should be histo_$VARIATION), we need better defaults, so as a first step I think we'll give the varied histograms names of the form <name>_<variationName><variationTag>, removing the ":".

I think we'll also remove the ":" from RResultMap for consistency.

Cheers,
Enrico

P.S. Point 3, iteration over RResultMap, is implemented in pull request #10700 in root-project/root on GitHub, "[DF] Add iteration support to RResultMap" by eguiraud, and will land in master today or early next week.

