Problem with RDataFrame.Snapshot

I’m trying to drop some columns from a ROOT file by taking a snapshot that keeps only selected columns, in the following way:

temp = ROOT.ROOT.RDataFrame("EventTree",path)
good_cols = ROOT.std.vector('string')()
for itm in filter(lambda x: x in retain, temp.GetColumnNames()):
    good_cols.push_back(itm)

df =  ROOT.ROOT.RDataFrame("EventTree",path)
df.Snapshot("thinned_tree", "out.root",good_cols)

which results in

TypeError: none of the 3 overloaded methods succeeded. Full details:
  ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(experimental::basic_string_view<char,char_traits<char> > treename, experimental::basic_string_view<char,char_traits<char> > filename, const vector<string>& columnList, const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions()) =>
    Cannot jit Snapshot call. Interpreter error code is 1. (C++ exception of type runtime_error)
  ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(experimental::basic_string_view<char,char_traits<char> > treename, experimental::basic_string_view<char,char_traits<char> > filename, experimental::basic_string_view<char,char_traits<char> > columnNameRegexp = "", const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions()) =>
    could not convert argument 3
  ROOT::RDF::RResultPtr<ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void> > ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager,void>::Snapshot(experimental::basic_string_view<char,char_traits<char> > treename, experimental::basic_string_view<char,char_traits<char> > filename, initializer_list<string> columnList, const ROOT::RDF::RSnapshotOptions& options = ROOT::RDF::RSnapshotOptions()) =>
    could not convert argument 3

which I can’t make much sense of :frowning:

Hi,
what’s the output of type(good_cols) and print(good_cols)?

The error seems to be that argument 3 to Snapshot (good_cols) can’t be converted to any of the types expected by the Snapshot overloads.

EDIT: also, what version of ROOT are you on? It seems you are missing a few overloads, e.g. the ones that take a std::vector<std::string>.
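
In case it helps to inspect things step by step, here is a minimal sketch that builds the column list as plain Python strings first and only then fills the std::vector. The helper names select_columns and to_std_vector are made up for illustration, and converting each name to str is a precaution, not a confirmed fix for the error above.

```python
def select_columns(available, retain):
    """Return the names from `available` that also appear in `retain`.

    Names are converted to plain Python str and the order of `available`
    is preserved, so the result is easy to print and compare.
    """
    wanted = set(retain)
    return [str(name) for name in available if str(name) in wanted]


def to_std_vector(names):
    """Copy Python strings into a std::vector<std::string> (needs PyROOT)."""
    import ROOT  # imported lazily so select_columns works without ROOT

    vec = ROOT.std.vector('string')()
    for name in names:
        vec.push_back(name)
    return vec


# Example with made-up column names; no ROOT needed for this part.
print(select_columns(["Header.event_no", "Header.energy", "Det0.count_water"],
                     ["Det0.count_water", "Header.event_no"]))
```

With a real dataframe, the first argument would be df.GetColumnNames(), and the result of to_std_vector(...) would be passed as the third argument of Snapshot.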

Cheers,
Enrico

Version: JupyROOT 6.14/00

[1] type(good_cols)
ROOT.vector<string>
[2] print(good_cols)
<ROOT.vector<string> object at 0x55695df090c0>

Uhm…can you try the same in the ROOT (C++) prompt – which might provide a better error message – and also post the contents of good_cols?

The following worked in a ROOT terminal and should in principle do the same thing as the Python code above:

root [1] auto df = ROOT::RDataFrame("EventTree","output.root")
root [3] vector<string> goods
root [4] goods.push_back("Header.event_no")
root [5] goods.push_back("Det0.count_water")
root [7] goods.push_back("Det10.count_water")
root [8] df.Snapshot("thinned_tree", "out.root",goods)

The contents of good_cols, defined as above, shown as a Python list:

 In[1]: list(good_cols)
Out[1]: ['Header.event_no',
 'Header.primaryID',
 'Header.energy',
 'Header.zFirstInteract',
 'Header.zenith',
 'Header.azimuth',
 'Header.coreX',
 'Header.coreY',
 'Det0.count_water',
 'Det1.count_water',
 'Det2.count_water',
 'Det3.count_water']

When I tried the C++ version above, I got an error when pushing a string with single quotes around it onto the string vector. Could there be a problem with how Python represents strings?
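
On the quoting question: in Python, single and double quotes produce the same str, and the quotes shown by list(good_cols) are only how Python prints strings. In C++, single quotes denote a character literal, so string literals at the ROOT prompt need double quotes. A small sketch of rendering the push_back lines for the ROOT prompt from a Python list follows; the helper name cpp_push_back_lines is hypothetical, and it assumes the column names contain no quotes or backslashes.

```python
def cpp_push_back_lines(var_name, names):
    """Render C++ push_back statements using double-quoted string literals.

    Assumes the names contain no double quotes or backslashes, so no
    escaping is attempted.
    """
    return ['{}.push_back("{}");'.format(var_name, name) for name in names]


# Single and double quotes give the same Python string.
assert 'Header.event_no' == "Header.event_no"

# Print lines that can be pasted into the ROOT prompt.
for line in cpp_push_back_lines("goods", ['Header.event_no', 'Det0.count_water']):
    print(line)
```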

Everything looks fine here, and the C++ version works correctly.
I’ll need to look into this a bit more closely: could you provide a very small file (even 1 event is fine, if the problem presents itself with 1 event) that I can use to reproduce the issue?

Cheers,
Enrico

Actually, I’ve restarted the notebook and I no longer get the original error; instead, it just seems to hang forever. The C++ example took a few minutes at most, but the notebook has now been stuck on the Snapshot line for at least half an hour, and at least once I have probably let it run for two hours.

I’m not sure what I’ve changed; maybe some old variable was shadowing good_cols before the restart.

Ok, good I guess, unless the hanging you see now is a different manifestation of the same issue :smile:

To narrow it down, you can disable multi-threading by not calling ROOT::EnableImplicitMT, prepend a Range(10) to the Snapshot call so that it runs on very few events, and snapshot only one branch. If that still hangs for half an hour, we have a problem, and one that we can only debug with a reproducer in hand.
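
A minimal sketch of that check in Python, assuming df is an RDataFrame and columns is a std::vector of strings built as earlier in the thread. The function name quick_snapshot_check and the test tree and file names are made up for illustration.

```python
def quick_snapshot_check(df, columns, n_events=10,
                         tree_name="thinned_tree_test",
                         file_name="out_test.root"):
    """Snapshot only the first `n_events` entries of `df` into a test file.

    Range() is only usable when implicit multi-threading is off, so
    ROOT.ROOT.EnableImplicitMT() must not have been called beforehand.
    """
    return df.Range(n_events).Snapshot(tree_name, file_name, columns)


# Usage with PyROOT (not executed here):
#   import ROOT
#   df = ROOT.ROOT.RDataFrame("EventTree", path)
#   one_col = ROOT.std.vector('string')()
#   one_col.push_back("Header.event_no")
#   quick_snapshot_check(df, one_col)
```

If this returns quickly, the number of events or columns can be increased step by step to see where the slowdown starts.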

Cheers,
Enrico
