TDataFrame: filling a histogram with "hits"

Hello,

First, TDF is awesome! It is a real game changer. Kudos to everyone involved.

My question is this: how can I fill a histogram multiple times per event using a vector of hits?

Perhaps I have overlooked this in the documentation but I would like to do the following:

TDataFrame d(t, {"my_hits"});  // where my_hits is a std::vector<Hit>

double x0 = 5.0;
// I want to use this lambda to fill
auto fill_all_hits = [=](const std::vector<Hit>& hits){
  for(const auto& ahit : hits){
    double diff = ahit.x() - x0;
    // Fill the histogram here, or append to a return value that then fills the histogram
  }
  }; 
auto h1 = d.Histo1D(TH1D("h0", "dx; ", 20, 0,20),"???");

Ideally there would be an extra argument before the branch list argument.

I can see maybe two ways of doing this:

  1. Passing the “hist” object to the lambda
auto fill_all_hits = [=](const std::vector<Hit>& hits, auto& hist){
  for(const auto& ahit : hits){
    double diff = ahit.x() - x0;
    hist.Fill({diff});
  }
  }; 
auto h1 = d.Histo1D(TH1D("h0", "dx; ", 20, 0,20),fill_all_hits,{"my_hits"});

I would imagine there are some multithreading issues here.

  2. Returning a vector of values to fill:
auto fill_all_hits = [=](const std::vector<Hit>& hits){
  std::vector<Hist::CoordArray_t<1>> result;
  for(const auto& ahit : hits){
    double diff = ahit.x() - x0;
    result.push_back({diff});
  }
  return result;
  }; 
auto h1 = d.Histo1D(TH1D("h0", "dx; ", 20, 0,20),fill_all_hits,{"my_hits"});

A problem with both: every entry is filled with the default weight, although this is already the case today. (Maybe return a tuple for non-default weights?)

Cheers,

Hi,
thank you for using TDF, and even more for giving us feedback!
We definitely need users to tell us which common use cases we have missed and could accommodate more nicely, and this is exactly one such case.

Currently the easiest way to fill a histogram with quantities contained in elements of arrays is by doing a Define of those derived quantities:

// schematic only: omitting const, &, std::, the output vector and other stuff for brevity
df.Define("xs", [x0](vector<Hit> hits) {
    return transform(hits.begin(), hits.end(), [x0](Hit h) { return h.x() - x0; }); }, {"my_hits"})
  .Histo1D("xs");

For each event, Histo1D fills the histogram with all elements of the "xs" vector.
You can also pass a vector of weights, which must have the same size as the vector of values "xs"; each value is then filled with the weight at the same position.
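To make the per-element behaviour concrete, here is a minimal stand-alone sketch in plain C++, with no ROOT involved. ToyHist1D and FillEvent are hypothetical names made up for this illustration. They only mimic the semantics described above (one fill per element, each value paired with the weight at the same index) and are not how Histo1D is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a 1D histogram (NOT ROOT's TH1D): equal-width bins on [lo, hi).
struct ToyHist1D {
  double lo, hi;
  std::vector<double> bins;  // sum of weights per bin

  ToyHist1D(std::size_t nbins, double lo_, double hi_)
      : lo(lo_), hi(hi_), bins(nbins, 0.0) {}

  void Fill(double x, double w = 1.0) {
    if (x < lo || x >= hi) return;  // this sketch simply drops under/overflow
    auto i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
    if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the upper edge
    bins[i] += w;
  }
};

// One "event": fill once per element, pairing xs[i] with ws[i].
void FillEvent(ToyHist1D& h, const std::vector<double>& xs,
               const std::vector<double>& ws) {
  assert(xs.size() == ws.size());  // values and weights must have the same size
  for (std::size_t i = 0; i < xs.size(); ++i) h.Fill(xs[i], ws[i]);
}
```

Calling FillEvent once per event with that event's "xs" and weight vectors then plays the role of the event loop in this toy picture.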

In the future I would like to see something very similar to your “solution 1” implemented. Compared to the solution above, it does not require creating a vector<double> per event, which has clear performance benefits. It has been in the pipeline for some time; we simply have not had time to code the feature yet (if you feel brave, PRs are welcome :smiley:).

P.S.
relevant jira issue

Thanks!

So just to be clear: a branch of type std::vector<double> (or int) will be filled for each element automatically.

For future readers of this, the correct usage of std::transform is

#include <algorithm>
#include <iterator>

df.Define("xs", [x0](const std::vector<Hit>& hits) {
                std::vector<double> res;
                std::transform(hits.begin(),
                               hits.end(),
                               std::back_inserter(res),
                               [x0](Hit h){ return h.x() - x0; });
                return res;
                }, {"my_hits"})
  .Histo1D("xs");

Note: I think the captures above might behave differently between C++11 and C++14.

So just to be clear: a branch of type std::vector<double> (or int) will be filled for each element automatically.

Correct!

Thanks for taking the time to spell everything out correctly :smiley: It might be slightly more performant to avoid vector reallocations: you can create res as std::vector<double> res(hits.size()); and change std::back_inserter(res) to res.begin(). This is only worth mentioning because the operation would be executed millions of times in a hot loop.
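For reference, the preallocating variant could look roughly like the sketch below. It is plain C++ with a hypothetical minimal Hit struct standing in for the real class, and HitOffsets is a helper name invented here, so treat it as an illustration of the suggestion rather than tested TDF code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal Hit type, standing in for the real class from this thread.
struct Hit {
  double x_;
  double x() const { return x_; }
};

// Compute x - x0 for every hit. The result vector is sized up front,
// so std::transform writes into existing elements and nothing is reallocated.
std::vector<double> HitOffsets(const std::vector<Hit>& hits, double x0) {
  std::vector<double> res(hits.size());
  std::transform(hits.begin(), hits.end(), res.begin(),
                 [x0](const Hit& h) { return h.x() - x0; });
  return res;
}
```

The same body can go directly into the lambda passed to Define. An alternative with similar effect is to keep std::back_inserter(res) and call res.reserve(hits.size()) first.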
