Hi!
Oh, I didn’t know that setting affected hadd and not the RDataFrame. It was one of the recommendations I had found previously, but it hadn’t fixed the issue. At least now I know why (:
Also, I didn’t know I could create an RDF that way, seems definitely easier hehe (:
I’ve been working on this for a couple of days and still haven’t found a way to reproduce the same issue with the pseudodata for the MWE, although the graphs look pretty much the same. (Please find them attached in this CERNbox folder).
While trying to run my code again, I now get a different crash from the one originally reported. The stack trace is quite long (3000+ lines), but it reports:
*** Break *** segmentation violation
Fatal in <TBufferMerger>: TBufferMergerFiles must be destroyed before the server
aborting
which I hadn’t seen before.
I’m trying to see if the problem comes from any of the gen_* branches in my NTuple; this is the only difference I can see from what I’m doing in the MWE. Sorry I can’t be of any more help ):
PS: This seems to have solved the issue. I’ll try to check the types of the branches to see what may be causing the error.
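For the branch-type check, here is a minimal sketch of how the gen_* columns and their types could be listed from an RDataFrame. It only relies on GetColumnNames and GetColumnType; the helper name column_types and its prefix default are made up for illustration, and the tree and file names in the usage comment are placeholders, not my real input.

```python
def column_types(df, prefix="gen_"):
    """Return {column name: C++ type name} for columns starting with `prefix`.

    `df` is expected to be a ROOT.RDataFrame (or any node derived from it);
    only GetColumnNames() and GetColumnType() are called on it.
    """
    names = [str(name) for name in df.GetColumnNames()]
    return {
        name: str(df.GetColumnType(name))
        for name in names
        if name.startswith(prefix)
    }


# Hypothetical usage (tree and file names are placeholders):
#   import ROOT
#   df = ROOT.RDataFrame("DDTree", "my_ntuple.root")
#   for name, type_name in sorted(column_types(df).items()):
#       print(f"{name:40s} {type_name}")
```

Comparing this listing with the column types in the MWE file should show quickly whether the gen_* branches use a type that the pseudodata does not cover.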
Minimal Working Example
Here is a description of the current state of the MWE.
Pseudodata generation
The following code is used to generate a pseudo-data file:
rdf = ROOT.RDataFrame(int(args.nentries))
ROOT.RDF.Experimental.AddProgressBar(rdf)
for i in range(nvectors):
    rdf = rdf.Define(f"vec{i}", f"RandomVec({vector_size})") \
             .Define(f"vec{i}_to_mask", f"RandomVec({vector_size})") \
             .Define(f"vec{i}_to_select", f"RandomVec({vector_size})")
rdf.Snapshot('DDTree', str(temp_file))
where args.nentries is the number of entries in the output, nvectors is the number of vector columns that will be created, and vector_size is the number of entries per vector.
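To pick these three parameters so that the pseudodata is comparable in size to a real input, a rough estimate of the raw payload can help. This is only a sketch: it counts the three double-valued vector columns created per value of i and ignores compression and any ROOT bookkeeping overhead. The function name and the example values are illustrative, not the ones used for the MWE.

```python
BYTES_PER_DOUBLE = 8
COLUMNS_PER_VECTOR = 3  # vec{i}, vec{i}_to_mask, vec{i}_to_select


def uncompressed_size_bytes(nentries, nvectors, vector_size):
    """Raw payload of the pseudo-data: doubles only, no compression or ROOT overhead."""
    return nentries * nvectors * COLUMNS_PER_VECTOR * vector_size * BYTES_PER_DOUBLE


# Illustrative values only, not the ones used for the MWE.
size = uncompressed_size_bytes(nentries=1_000_000, nvectors=10, vector_size=20)
print(f"{size / 1e9:.1f} GB uncompressed")  # prints "4.8 GB uncompressed"
```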
The macro RandomVec is defined as follows:
#include <ROOT/RVec.hxx>
#include <TRandom3.h>
#include <numeric>

ROOT::RVecD RandomVec(int length, double min_val = -999.0, double max_val = 999.0) {
    static thread_local TRandom3 rng;
    ROOT::RVecD vec(length);
    for (int i = 0; i < length; i++) {
        vec[i] = rng.Uniform(min_val, max_val);
    }
    return vec;
}
RDataFrame processing
A series of operations, similar in nature to those applied in my original code, are applied to this dataframe:
rdf = ROOT.RDataFrame('DDTree', str(temp_file))
ROOT.RDF.Experimental.AddProgressBar(rdf)
for i in range(nvectors):
    rdf = rdf.Define(f"vec{i}_sum", f"ROOT::VecOps::Sum(vec{i})") \
             .Define(f"vec{i}_mean", f"ROOT::VecOps::Mean(vec{i})")
for i in range(nvectors):
    rdf = rdf.Define(f"sel{i}", f"MaskedVec(vec{i}_to_select, vec{i}_to_mask > 0)") \
             .Define(f"sel{i}_sum", f"ROOT::VecOps::Sum(sel{i})") \
             .Define(f"sel{i}_max_idx", f"ROOT::VecOps::ArgMax(sel{i})") \
             .Define(f"sel{i}_min_idx", f"ROOT::VecOps::ArgMin(sel{i})")
    rdf = rdf.Define(f"pair{i}", "return ROOT::RVecD({"+"vec{i}[sel{i}_max_idx], vec{i}[sel{i}_min_idx]".format(i=i)+"})") \
             .Define(f"pair{i}_sum", f"ROOT::VecOps::Sum(pair{i})")
rdf = rdf.Filter("(" + " + ".join([f"pair{i}_sum" for i in range(nvectors)]) + ") > 0")
ROOT.RDF.SaveGraph(rdf, "./mydot.dot")
snap_config = ROOT.RDF.RSnapshotOptions()
snap_config.fVector2RVec = False  # Disable conversion of vectors to RVecs
snap_config.fAutoFlush = 1000
rdf.Snapshot('DDTree', str(file_path), rdf.GetColumnNames(), snap_config)
where MaskedVec is defined in the following macro:
template <typename T>
ROOT::RVec<T> MaskedVec(const ROOT::RVec<T>& vec, const ROOT::RVec<bool>& mask) {
    ROOT::RVec<T> result;
    for (size_t i = 0; i < vec.size(); ++i) {
        if (mask[i]) {
            result.push_back(vec[i]);
        }
    }
    return result;
}
The corresponding #include statements have been omitted.
This code, however, doesn’t seem to reproduce the crash.
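Since the pair{i} and Filter expressions in the processing step are assembled by string concatenation, here is a small sketch that prints what actually gets passed to Define and Filter. It needs no ROOT. The helper names pair_expr and filter_expr are only for illustration, and the pair expression is written with an f-string instead of the .format call used above, which should yield the same string.

```python
def pair_expr(i):
    """C++ expression for pair{i}, as built in the processing loop."""
    return "return ROOT::RVecD({" + f"vec{i}[sel{i}_max_idx], vec{i}[sel{i}_min_idx]" + "})"


def filter_expr(nvectors):
    """C++ expression for the final Filter: the sum of all pair{i}_sum must be positive."""
    return "(" + " + ".join(f"pair{i}_sum" for i in range(nvectors)) + ") > 0"


print(pair_expr(0))
# prints: return ROOT::RVecD({vec0[sel0_max_idx], vec0[sel0_min_idx]})
print(filter_expr(3))
# prints: (pair0_sum + pair1_sum + pair2_sum) > 0
```

Note that the indices computed on sel{i} are used to index vec{i}, exactly as in the MWE above.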
Logs
Some crash logs are available in this CERNbox folder. A list with a short description follows:
Crash running on my MC files with my code @ HTCondor [log]
This is the original crash as reported in the ROOT Forum. It uses the same data as described in the next section.
Crash running on my MC files with my code and 2 threads [log]
This source file is 35 GB on disk and has 409 columns of varying sizes, some being ints, floats or doubles, or vectors (or RVec) of these types. The sample is a post-processing of Drell-Yan to 2 Muons at MiniAOD level.
This crash is not the same as the one reported, but may be related. The original crash happens when running on HTCondor; this one was run on a local machine.