RDataFrame MT performance running on remote files

eguiraud · November 18, 2020, 9:14am

Hi,

Sounds good.

Great!

No, that should not cause a crash. Do you get a stack trace? Can you share a minimal, self-contained reproducer so that we can debug the crash on our side? Otherwise, I suggest compiling the code with debug symbols (-g) and inspecting the point of the crash with gdb.

Ah, I see this is now tracked in a separate thread: Crash when creating RDataFrame graph

If that becomes a problem, you can use templates to write a generic My2DObservablestruct type plus two fill overloads to pass to Aggregate, one for the scalar case and one for the RVec case:

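#include <ROOT/RDataFrame.hxx>
#include <ROOT/RVec.hxx>
#include <TH2D.h>
using ROOT::VecOps::RVec;

// a struct rather than a class, so the members are public and brace initialization compiles
template <typename T>
struct My2DObservablestruct {
  T values;
  RVec<int> categories;
};

// helper to build the struct from two columns in a Define
template <typename T>
My2DObservablestruct<T> Build2DObservable(const T &values, const RVec<int> &categories) {
  return My2DObservablestruct<T>{values, categories};
}

// scalar fill: one value per entry
template <typename T>
void fill(TH2D &h, const My2DObservablestruct<T> &c) {
  h.Fill(...);
}

// vector fill: more specialized than the scalar one, so it is picked whenever values is an RVec
template <typename T>
void fill(TH2D &h, const My2DObservablestruct<RVec<T>> &c) {
  for (...)
    h.Fill(...);
}

// use as (Aggregate takes the aggregator first, then the merger).
// Note that fill<RVec<float>> does not name the vector overload (there T would be float),
// so wrap the call in a lambda and let overload resolution choose:
df.Define("categories_and_values", Build2DObservable<RVec<float>>, {...})
  .Aggregate([](TH2D &h, const My2DObservablestruct<RVec<float>> &c) { fill(h, c); },
             ...);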
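Since RVec and TH2D need ROOT, here is a standalone sketch of the same overloading trick that compiles with only the standard library. It is a sketch under assumptions: std::vector stands in for RVec, a hypothetical Hist2D type that just records (x, y) pairs stands in for TH2D, and My2DObservable is a hypothetical analogue of My2DObservablestruct. The fill semantics are also assumed for illustration (scalar: the one value is filled once per category; vector: values[i] is paired with categories[i]). The point is that a plain fill(h, c) call, with no explicit template arguments, lets partial ordering pick the more specialized vector overload.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for TH2D: it only records the (x, y) pairs it is filled with.
struct Hist2D {
  std::vector<std::pair<double, double>> entries;
  void Fill(double x, double y) { entries.emplace_back(x, y); }
};

// Hypothetical analogue of My2DObservablestruct, with std::vector in place of RVec.
template <typename T>
struct My2DObservable {
  T values;
  std::vector<int> categories;
};

// Scalar case (assumed semantics): the single value is filled once per category.
template <typename T>
void fill(Hist2D &h, const My2DObservable<T> &c) {
  for (int cat : c.categories)
    h.Fill(c.values, cat);
}

// Vector case (assumed semantics): values[i] is paired with categories[i].
// This template is more specialized, so it is preferred whenever T is a std::vector.
template <typename T>
void fill(Hist2D &h, const My2DObservable<std::vector<T>> &c) {
  for (std::size_t i = 0; i < c.values.size() && i < c.categories.size(); ++i)
    h.Fill(c.values[i], c.categories[i]);
}

// Usage: plain calls, no explicit template arguments.
void demo_overloads() {
  Hist2D hs, hv;
  const My2DObservable<float> s{2.5f, {1, 3}};
  const My2DObservable<std::vector<float>> v{{0.5f, 1.5f}, {7, 8}};
  fill(hs, s);  // only the scalar overload is viable
  fill(hv, v);  // both are viable; partial ordering picks the vector overload
  assert(hs.entries.size() == 2);  // 2.5 filled for categories 1 and 3
  assert(hv.entries.size() == 2);  // (0.5, 7) and (1.5, 8)
}
```

Only the signature of the scalar template is instantiated for the vector call, so its body (which would not compile for a vector) is never an issue.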

Unfortunately not: the type of the column matters, because it tells RDF how to interpret the bytes it reads from disk.
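A minimal illustration of why the declared type matters, independent of ROOT: the same four bytes decode to very different numbers depending on whether they are read as a float or as a 32-bit integer. This assumes IEEE 754 single-precision floats, and the helper names are hypothetical; RDF's real I/O path is much more involved than this.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode the 4 bytes of a float as if they were a 32-bit integer.
// std::memcpy is the well-defined way to reinterpret an object's bytes.
std::int32_t float_bytes_as_int32(float f) {
  static_assert(sizeof(float) == sizeof(std::int32_t), "assumes a 32-bit float");
  std::int32_t i;
  std::memcpy(&i, &f, sizeof i);
  return i;
}

// Hypothetical helper: the reverse direction, int32 bytes decoded as a float.
float int32_bytes_as_float(std::int32_t i) {
  float f;
  std::memcpy(&f, &i, sizeof f);
  return f;
}

// Usage: the value 1.0f, read back with the wrong type, is not 1.
void demo_wrong_type() {
  const float stored = 1.0f;
  const std::int32_t misread = float_bytes_as_int32(stored);
  assert(misread == 1065353216);                    // 0x3F800000 under IEEE 754
  assert(int32_bytes_as_float(misread) == stored);  // the right type recovers the value
}
```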

P.S.
It is better to take RVecs as const& function arguments to avoid extra copies: void fun(const RVec<float> &values) rather than void fun(RVec<float> values).
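A small standalone sketch of the difference, using a hypothetical CountingVec type as a stand-in for RVec<float> so that copies can be counted. It uses only the standard library, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical counter of how many times a CountingVec has been copied.
static std::size_t copy_count = 0;

// Hypothetical stand-in for RVec<float> whose copy operations bump copy_count.
struct CountingVec {
  std::vector<float> data;
  CountingVec() = default;
  CountingVec(const CountingVec &other) : data(other.data) { ++copy_count; }
  CountingVec &operator=(const CountingVec &other) {
    data = other.data;
    ++copy_count;
    return *this;
  }
};

// By value: each call copies the caller's vector into the parameter.
float sum_by_value(CountingVec v) {
  return std::accumulate(v.data.begin(), v.data.end(), 0.0f);
}

// By const reference: read-only access to the caller's vector, no copy.
float sum_by_cref(const CountingVec &v) {
  return std::accumulate(v.data.begin(), v.data.end(), 0.0f);
}

// Usage: the const& version leaves copy_count untouched, the by-value one does not.
void demo_copies() {
  CountingVec v;
  v.data = {1.0f, 2.0f, 3.0f};
  copy_count = 0;
  const float a = sum_by_cref(v);
  assert(a == 6.0f && copy_count == 0);
  const float b = sum_by_value(v);
  assert(b == 6.0f && copy_count == 1);
}
```

Passing an lvalue to a by-value parameter always copies it, which for large vectors and many entries adds up; the const& signature avoids that while still preventing the function from modifying the column.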