Dear all,
While playing around with `RDataFrame` I noticed a strange behavior in conjunction with "preselected chains":
Let's say I have different `TFile`s with different `TTree`s in them, and I want to apply some coarse cuts that depend on the actual file/tree pair. I thought of attaching a combined `TEntryList` to the `TChain` before wrapping the chain itself in an `RDataFrame` and proceeding with the analysis. However, the dataframe does not seem to honor the preselection encoded in the entry lists.
To demonstrate this, I borrowed from @eguiraud's test that covers a similar situation:
But when I alter the way the entry lists are generated, only `e` values from the first file seem to be selected (note the `offset` parameter):
```cpp
#include <ROOT/RDataFrame.hxx>
#include <TChain.h>
#include <TEntryList.h>
#include <TFile.h>
#include <TROOT.h>
#include <TSystem.h>
#include <TTree.h>

#include <iostream>
#include <string>

// Write a tree "t" with one int column "e" holding (entry number + offset).
void MakeInputFile(const std::string &filename, int nEntries, int offset = 0)
{
   const auto treename = "t";
   auto d = ROOT::RDataFrame(nEntries)
               .Define("e", [offset](ULong64_t e) { return int(e + offset); }, {"rdfentry_"})
               .Snapshot<int>(treename, filename, {"e"});
}

void TestChainWithEntryList()
{
   const auto nEntries = 10;
   const auto treename = "t";
   const auto file1 = "rdfentrylist1.root";
   MakeInputFile(file1, nEntries);
   const auto file2 = "rdfentrylist2.root";
   MakeInputFile(file2, nEntries, 100);

   /* Preselect events by classic TTree::Draw method */
   auto f1 = TFile::Open(file1);
   auto t1 = f1->Get<TTree>(treename);
   gROOT->cd();
   t1->Draw(">>elist1", "e%2==0", "entrylist");
   f1->Close();

   auto f2 = TFile::Open(file2);
   auto t2 = f2->Get<TTree>(treename);
   gROOT->cd();
   t2->Draw(">>elist2", "e%2==0", "entrylist");
   f2->Close();

   // make a TEntryList that contains two TEntryLists in its list of TEntryLists,
   // as required by TChain (see TEntryList's doc)
   TEntryList elists;
   elists.Add(gROOT->Get<TEntryList>("elist1"));
   elists.Add(gROOT->Get<TEntryList>("elist2"));

   TChain c(treename);
   c.Add(file1, nEntries);
   c.Add(file2, nEntries);
   c.SetEntryList(&elists);

   auto entries = ROOT::RDataFrame(c).Take<int>("e");
   /* List all the entries gathered by RDataFrame... */
   for (const auto &e : *entries) {
      std::cout << e << " ";
   }
   std::cout << std::endl;

   /* ...whereas TChain::Scan honors the entry list */
   c.Scan("e");

   gSystem->Unlink(file1);
   gSystem->Unlink(file2);
}

void test()
{
   std::cout << gROOT->GetVersion() << std::endl;
   TestChainWithEntryList();
}
```
My macro produces the following output, with different results for `RDataFrame` and `TChain::Scan`:
```
6.20/04
0 2 4 6 8 0 2 4 6 8
************************
*    Row   *         e *
************************
*        0 *         0 *
*        2 *         2 *
*        4 *         4 *
*        6 *         6 *
*        8 *         8 *
*       10 *       100 *
*       12 *       102 *
*       14 *       104 *
*       16 *       106 *
*       18 *       108 *
************************
```
I am not sure whether my handling of the entry lists is correct, but since the `Scan` method delivers the expected results, I would rather suspect a bug in the `RDataFrame`/`TTreeReader` framework.
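For reference, here is a minimal, ROOT-free sketch of the semantics I would expect. It assumes that each sub-list of the combined `TEntryList` stores entry numbers local to its own tree, and that the chain turns them into global entry numbers by adding the number of entries in all preceding trees. `TreeWithList` and `ScanChain` are hypothetical names for illustration only, not ROOT API.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model (not ROOT code): one tree of the chain plus its own
// sub-entry-list, like elist1/elist2 in the macro above.
struct TreeWithList {
   std::vector<int> values;            // column "e", indexed by local entry number
   std::vector<std::size_t> localList; // selected local entry numbers of this tree
};

// Returns (global chain row, value of "e") pairs, i.e. the two columns that
// TChain::Scan prints. Assumption: global row = tree offset + local entry.
std::vector<std::pair<std::size_t, int>> ScanChain(const std::vector<TreeWithList> &chain)
{
   std::vector<std::pair<std::size_t, int>> rows;
   std::size_t treeOffset = 0; // number of entries in all preceding trees
   for (const auto &tree : chain) {
      for (std::size_t local : tree.localList) {
         // Read the value from the tree this sub-list belongs to,
         // not from the first tree of the chain.
         rows.emplace_back(treeOffset + local, tree.values.at(local));
      }
      treeOffset += tree.values.size();
   }
   return rows;
}
```

With the inputs of the macro (two trees of 10 entries, `e` shifted by 100 in the second file, even local entries selected in both), this model yields the even rows 0 to 18 with values 0 to 8 and 100 to 108, matching what `TChain::Scan` prints, while the `RDataFrame` output in 6.20/04 repeats the values of the first file.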
In general, I am not very satisfied with the way the entry lists are produced in my macro (`RDataFrame` is far better than the ancient `TTree::Draw`), but I did not find any other way to apply `Filter`s based solely on the file (name) storing a particular tree. Or is it somehow possible to chain dataframes (with initial filters already applied) instead of trees? Or did I miss something, and is there a far better approach?
Many thanks in advance,
Philipp