I am trying to open a file which contains one TTree, produce slices of this tree in one particular variable (in this case eta), and then save the sliced trees in a new output file. For this, I wrote the following function:
This works fine, but I noticed that as soon as I have a large number of events this gets terribly slow. In the end I need to produce 100 slices for 1 000 000 events (i.e. each slice will have roughly 10 000 events). Is there a way to make this more efficient?
Hi,
if I understand correctly, you would run over the input tree once per slice.
A better approach would be to only run over the input tree once and produce your 100 slices in one go.
You can easily do it with RDataFrame, but you need to write each slice to a different file (I have not tested the code, but it should give you an idea):
ROOT::RDataFrame df("tree", "input.root");
ROOT::RDF::RSnapshotOptions opts;
opts.fLazy = true;
for (float eta = 0.f; eta < 5.f; eta += 0.05f) {
const std::string filter = Form("(eta>-%.2f && eta<-%.2f) || (eta>%.2f && eta<%.2f)", eta+0.05, eta, eta, eta+0.05);
const std::string out_file = "output_" + std::to_string(eta) + ".root";
df.Filter(filter).Snapshot("slice", out_file, {}, opts);
}
// this is to actually trigger the event loop, since all Snapshots were marked "lazy"
df.Count().GetValue();
If you add ROOT::EnableImplicitMT() at the beginning, RDF actually runs the procedure in parallel on multiple threads.
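To illustrate the "run over the input once" idea independently of RDataFrame, here is a minimal standard-library sketch. The `Event` struct and the helpers `slice_index` and `slice_events` are hypothetical names made up for this illustration, not ROOT API. It assumes slices of equal width in |eta| and uses half-open intervals, whereas the filter string above uses strict inequalities on both ends.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical event record; a real tree would carry many more branches.
struct Event {
    double eta;
    double pt;
};

// Map |eta| to a slice index in [0, n_slices), or -1 if the value lies
// outside [0, n_slices * width) or is NaN.
int slice_index(double eta, double width, int n_slices) {
    assert(width > 0.0 && n_slices > 0);
    const double abs_eta = std::fabs(eta);
    if (!(abs_eta < width * n_slices)) return -1;  // also rejects NaN
    const int idx = static_cast<int>(abs_eta / width);
    // Guard against rounding pushing a value just below the upper edge out of range.
    return idx < n_slices ? idx : n_slices - 1;
}

// Visit every event exactly once and append it to the slice it belongs to.
std::vector<std::vector<Event>> slice_events(const std::vector<Event>& events,
                                             double width, int n_slices) {
    std::vector<std::vector<Event>> slices(static_cast<std::size_t>(n_slices));
    for (const Event& ev : events) {
        const int idx = slice_index(ev.eta, width, n_slices);
        if (idx >= 0) slices[static_cast<std::size_t>(idx)].push_back(ev);
    }
    return slices;
}
```

With 100 slices this does one pass over the 1 000 000 events instead of 100 passes, which is the same saving the lazy Snapshots aim for.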
thanks for your answer. I am currently facing some issues in getting RDataFrame into our CMake setup but this definitely looks promising. So I guess to get a single file the best way would be to hadd them at the end, right?
You could do something like that, yes. Here is a little project of mine that depends on RDF and has CMakeLists that, at least 2 years ago, worked fine. Maybe it can help. If not, feel free to create a little reproducer of your CMake issue and open a new thread on the forum.
**error:** **static_assert failed "filter expression returns a type that is not convertible to bool"**
static_assert(std::is_convertible<FilterRet_t, bool>::value,
How can I translate your code example to work with the vectors? If I do variable[0]> then it doesn’t complain anymore but also doesn’t seem to produce the output file.
Hi @mark1,
ok if you are Filtering on vector branches this explains the error:
in RDataFrame, all arrays and vectors can be read in as RVec<T>. RVecs are vector-like types that also offer a number of useful features. For instance vec > 0, when vec is an RVec, returns an RVec with 1 at the positions where the condition is satisfied, and 0 elsewhere. This is useful e.g. to quickly select certain entries of an RVec with vec[vec > 0].
So indeed vec > 0, as a Filter expression, is not convertible to bool.
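The difference between an element-wise mask and a per-event boolean can be sketched with plain std::vector. This is only a stand-in for the behaviour described above, not ROOT's RVec implementation; `greater_than`, `select_where` and `first_element_passes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Element-wise comparison: one 0/1 flag per element, like vec > threshold.
std::vector<int> greater_than(const std::vector<double>& vec, double threshold) {
    std::vector<int> mask;
    mask.reserve(vec.size());
    for (double x : vec) mask.push_back(x > threshold ? 1 : 0);
    return mask;
}

// Keep only the elements whose flag is nonzero, analogous to vec[vec > 0].
std::vector<double> select_where(const std::vector<double>& vec,
                                 const std::vector<int>& mask) {
    assert(vec.size() == mask.size());
    std::vector<double> out;
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (mask[i] != 0) out.push_back(vec[i]);
    }
    return out;
}

// A per-event filter needs a single yes/no answer, e.g. from the first element.
bool first_element_passes(const std::vector<double>& vec, double threshold) {
    return !vec.empty() && vec[0] > threshold;
}

// A whole mask has no implicit conversion to bool, which mirrors the
// static_assert failure quoted earlier in the thread.
static_assert(!std::is_convertible<std::vector<int>, bool>::value,
              "a mask of flags is not convertible to bool");
```

For branches that hold exactly one element per event, the last helper corresponds to a filter on element 0, which yields a plain bool.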
Hi @eguiraud,
thanks for the answer, but I am more confused now than before. The line you sent me produces one TTree which has the filtered vector (e.g. TruthPx), but this is not what I want. I need the whole original TTree to be filtered. As I said, the vectors only contain one element, so it essentially should just throw out events with TruthPx[0]<10, for instance, keeping all other branches and thus mimicking the CopyTree method from the original post. I tried this using your original example:
Ah sorry, I missed that your arrays only have one element. Then "TruthPx[0] > 0" is absolutely fine.
doesn’t produce any output file
Ah, that’s my fault, sorry! The Snapshot action is unregistered from the RDataFrame when its return value goes out of scope. I’m cooking up a working example.
EDIT: by the way, a Snapshot action that goes out of scope without being triggered should at least print a warning
Yes, if I use the simulated df it works fine but as soon as I use my own file (see link before) I get these errors. Also, it should be eta[0] in my case, right? (produces same error)