beojan
October 20, 2018, 1:13pm
I have three filters which should split my data into three categories:
```cpp
auto boosted = df.Filter("event.truth_type==2", "boosted");
auto intermediate = df.Filter("event.truth_type==1", "intermediate");
auto resolved = df.Filter("event.truth_type==0", "resolved");
```
However, the cutflow report reads:
```
boosted : pass=5454 all=89389 -- 6.101 %
intermediate: pass=36263 all=89389 -- 40.568 %
resolved : pass=54524 all=89389 -- 60.996 %
```
That is, the three categories add up to more than 100% (5454 + 36263 + 54524 = 96241 passing events out of 89389). Comparing with the result of TTree::Draw, the problem seems to be with the resolved filter.
Wow, that’s weird. Thanks for reporting.
Could you provide a minimal reproducer that we can run and debug?
Cheers,
Enrico
beojan
October 21, 2018, 5:24pm
The macro is:
```cpp
#include <fstream>
#include <string>
#include <ROOT/RDataFrame.hxx>

void ProcessTruth(std::string filename) {
    using namespace std;
    using namespace ROOT;
    // Append one CSV line per input file.
    ofstream out("TruthSummary.csv", ios::app);
    RDataFrame df("fullmassplane", filename);

    // Split events by truth_type (2 = boosted, 1 = intermediate, 0 = resolved).
    auto boosted = df.Filter("event.truth_type==2", "boosted");
    auto intermediate = df.Filter("event.truth_type==1", "intermediate");
    auto resolved = df.Filter("event.truth_type==0", "resolved");

    // Further selections applied to the resolved category only.
    auto correct_selection = resolved.Filter("event.truth_h1_j1 && event.truth_h1_j2 && event.truth_h2_j1 && event.truth_h2_j2", "correct_sel");
    auto correct_pair = resolved.Filter("event.truth_h1_j1==1 && event.truth_h1_j2==1 && event.truth_h2_j1==2 && event.truth_h2_j2==2", "correct_pair");

    auto wrong_selection = resolved.Count().GetValue() - correct_selection.Count().GetValue();
    auto wrong_pair = correct_selection.Count().GetValue() - correct_pair.Count().GetValue();

    out << filename << "," << boosted.Count().GetValue() << "," << intermediate.Count().GetValue() << ","
        << resolved.Count().GetValue() << "," << correct_pair.Count().GetValue() << ","
        << wrong_pair << "," << wrong_selection << std::endl;

    // Print the cutflow report of all named filters.
    df.Report()->Print();
}
```
The file I’m running on is at /eos/user/b/bstanisl/DebugFile/M1200/M1200.root and is shared with @eguiraud and sft-root.
beojan
October 22, 2018, 9:51am
The correct_selection and correct_pair filters also return 0. Repeating the same selections with root_pandas shows that these results are also incorrect.
EDIT:
Working interactively (`root -l`) in LCG 94 Python 3:
```
root [0] using namespace ROOT;
root [1] RDataFrame df("fullmassplane", "M1200.root")
(ROOT::RDataFrame &) A data frame built on top of the fullmassplane dataset.
root [2] .ls
root [3] df.Filter("event.truth_type==0").Count()
(ROOT::RDF::RResultPtr<ULong64_t>) @0x58edf30
root [4] df.Filter("event.truth_type==0").Count().GetValue*(
root (cont'ed, cancel with .@) [5]
root (cont'ed, cancel with .@) [5].@
root [6] df.Filter("event.truth_type==0").Count().GetValue()
(const unsigned long long) 47672
root [7] df.Filter("event.truth_type==0").Count().GetValue()
(const unsigned long long) 54524
```
The incorrect value is returned only if another filter has already been added. The first value, 47672, is consistent with the other two categories: 5454 + 36263 + 47672 = 89389.
Thanks for the update.
This is now ROOT-9743; let’s continue the discussion there.
I’m not sure I will have time to look into it before next week, but in any case it’s very close to the top of the priority queue.
Cheers,
Enrico
system
Closed
November 5, 2018, 9:35pm
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.