I was curious whether it is possible to extract a list (ultimately as text or a Python set) of all the final and intermediate variables (the Define’d ones, rather than more transient ones) from RDataFrame for the to-be-executed graph. I have a rather large computation graph with numerous variations depending on which inputs (think iterative stages, not just samples), final histograms, categories, and systematic variations are run, which would make extracting them from the code more painful than it would be worth. I don’t recall whether SaveGraph already ignores the pruned columns; if it does, a regex search through the .dot file might work, but if there is a more immediate way to get this, I’d prefer that.
ROOT Version: 6.24+
Not only do we not have an API for this, we don’t even have internal logic that builds that list explicitly: the columns required by some operation are used and the others are ignored, so there is no explicit pruning of unused Defines going on.
SaveGraph also shows the “pre-pruning” graph, in that sense.
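Since the graph is “pre-pruning”, a regex over the .dot file would give every Define’d column, not only the ones that are actually used. A minimal sketch of that search follows. The sample DOT text is hand-written and hypothetical, and the label syntax is an assumption (it may differ between ROOT versions), so compare the pattern against a real file, e.g. one written with ROOT.RDF.SaveGraph(df, "graph.dot"), before relying on it.

```python
import re

# Hypothetical DOT text, hand-written to resemble a SaveGraph dump.
# The exact label syntax is an assumption: inspect a real file first.
SAMPLE_DOT = r'''digraph {
    3 [label="Histo1D", style="filled", shape="box"];
    2 [label="Define\nx_scaled", style="filled", shape="ellipse"];
    1 [label=<Define<BR/>x>, style="filled", shape="ellipse"];
    0 [label="8", style="filled", shape="ellipse"];
    0 -> 1;
    1 -> 2;
    2 -> 3;
}'''

# Accept a literal backslash-n, a real newline, or an HTML-like <BR/>
# between the word "Define" and the column name.
DEFINE_LABEL = re.compile(r'label=["<]Define(?:\\n|\n|<BR/>)([^">]+)[">]')


def defined_columns(dot_text):
    """Return the set of column names found in Define node labels."""
    return set(DEFINE_LABEL.findall(dot_text))


columns = defined_columns(SAMPLE_DOT)
print(sorted(columns))
```

This only recovers names: it says nothing about which branch of the graph a Define belongs to or whether anything downstream reads it.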
Note that, in general, that list is not even well defined if it is a flat list: you can Define column “x” in two different branches of the computation graph and then actually use it in one but not in the other. So “x” as a column name would be both used and unused.
So tough question, I am not sure how to make this well-defined – what do you think? What do you need this list for?
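The ambiguity can be illustrated with a toy model in plain Python. This is not RDataFrame code and does not reflect its internals: the Node class, its fields, and is_used are hypothetical names invented for the illustration. It builds two branches that each define “x” and asks, per Define node rather than per column name, whether anything downstream reads the column.

```python
class Node:
    """Toy computation-graph node (hypothetical, not an RDataFrame class)."""

    def __init__(self, kind, parent=None, defines=None, uses=()):
        self.kind = kind          # "source", "define" or "action"
        self.defines = defines    # column introduced by a "define" node
        self.uses = tuple(uses)   # columns this node reads
        self.children = []
        if parent is not None:
            parent.children.append(self)


def is_used(define_node):
    """True if any node downstream of define_node reads its column.

    Simplification: redefinitions further down the branch are ignored.
    """
    stack = list(define_node.children)
    while stack:
        node = stack.pop()
        if define_node.defines in node.uses:
            return True
        stack.extend(node.children)
    return False


source = Node("source")

# Branch A defines "x" and fills a histogram from it.
x_a = Node("define", parent=source, defines="x", uses=("pt",))
Node("action", parent=x_a, uses=("x",))

# Branch B also defines "x" but the histogram reads "pt" instead.
x_b = Node("define", parent=source, defines="x", uses=("eta",))
Node("action", parent=x_b, uses=("pt",))

usage = {"branch_a": is_used(x_a), "branch_b": is_used(x_b)}
print(usage)
```

A flat set of column names would have to report “x” as both used and unused here; keying the answer on the Define node (or on a branch label plus column name) is one possible way to make the question well defined.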
Definitely don’t file this under feature requests! I was going to use it to do a bit of analysis on what actually gets used across different stages and categorization schemes in my analysis, and potentially also to do some optimization via automatically generated minimal intermediate ROOT files. The combinatorics are annoying enough that I was hoping for a very immediate shortcut!
Thanks for the quick answer!