Automating TBranch setting and filling with std::variant

asterenb · November 20, 2021, 11:45pm

Hello experts,

For a long time now I’ve had the desire to make the setup and initialization of my TTree branches more automated. By this I mean something like a struct that can hold the variable connected to the branch address, the name of the branch, and the default value to be cleared per event. For example:

typedef struct BranchInfo {
    float value;
    TString branch_name;
    float default_value;
} BranchInfo;

The idea is to then just initialize a vector of this struct once at the beginning of the code, and then have methods that just loop through all the elements in the vector and automatically make the TBranch and clear the branches per event.
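For a single type the idea works as described. Below is a minimal sketch of it. It uses std::string in place of TString so that it compiles without ROOT, and it leaves the TTree::Branch call as a comment because that call needs a real TTree:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Single-type version of the struct above (std::string stands in for TString).
struct BranchInfo {
    float value;
    std::string branch_name;
    float default_value;
};

// Restore every branch variable to its default; meant to be called once per event.
void reset_all(std::vector<BranchInfo>& branches) {
    for (auto& b : branches) b.value = b.default_value;
}

// With ROOT the branches would be created once, e.g.
//   for (auto& b : branches) tree->Branch(b.branch_name.c_str(), &b.value);
// After that the vector must not grow: a reallocation would move the elements
// and invalidate the addresses handed to the TTree.
```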

This is okay to do for a single type, of course, but gets trickier if you want to have ints, floats, bools, etc. That’s true even if I already know the types beforehand (so no need for dynamic type inference). I never figured out a way to generalize this construct to multiple types. I thought std::variant, new in C++17, might work for this use case, something like:

typedef struct BranchInfo {
  std::variant<int, float, bool> value;
  TString branch_name;
  std::variant<int, float, bool> default_value;
} BranchInfo;

but I’m not sure and in my tests I wasn’t able to set the branch address to point to a reference of the variant.

Maybe I’m missing something and this is actually possible somehow? Any feedback on this idea would be very welcome!

Thank you,
Andre



couet · November 22, 2021, 7:34am

I guess @pcanal can help.

pcanal · November 22, 2021, 11:22pm

std::get<T>(std::variant...) returns a reference (using std::visit might also work), so you can use:

tree->Branch(branchinfo.branch_name, & std::get<float>(branchinfo.value));  // To be tweaked for each of the variant's types

Cheers,
Philippe.

PS. Technically, it is plausible that most implementations of std::variant store the data first (rather than the type information), and thus the following “might” work:

tree->Branch(branchinfo.branch_name, reinterpret_cast<float*>(&(branchinfo.value)));
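A variation on the std::get approach that does not depend on the memory layout the reinterpret_cast relies on: std::get_if returns a pointer to the stored alternative, or nullptr if a different alternative is active. That pointer is exactly what Branch needs. Here is a minimal sketch; the helper name address_of is hypothetical:

```cpp
#include <cassert>
#include <variant>

using Value = std::variant<int, float, bool>;

// Address of the stored T, or nullptr if the variant currently holds another type.
// With ROOT one would call tree->Branch(name, p) only when p != nullptr.
template <typename T>
T* address_of(Value& v) {
    return std::get_if<T>(&v);
}
```

The pointer refers to the storage inside the variant, so writes through it (which is what the TTree does when reading) are visible through std::get.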

asterenb · December 6, 2021, 6:41pm

Hello Philippe,

Thank you so much for the help. I was able to get a version of it to work using std::get, which is nice.

If I may ask a follow-up question, I have since been trying to make my “branch holder” object handle all kinds of access to the underlying branch variable. My goal is to make it very easy for others to add a branch to my ntuplizer, such that they only need to specify the branch TString and the default value to be set for every event, with the appropriate literal type (e.g. ul, .f, L, etc.). I have something like this:

std::map<TString, BranchInfo> branches({
    {"run", BranchInfo{-1}},
    {"event", BranchInfo{-1}},
    {"ls", BranchInfo{-1}},
    {"pv_x", BranchInfo{-99.}},
    {"pv_y", BranchInfo{-99.}},
    {"pv_z", BranchInfo{-99.}},
    {"nVertices", BranchInfo{0u}}
});

In BranchInfo I then have overloaded the () and = operators to allow access to some branches more directly:

using AllTypes = std::variant<bool, int, float, double, unsigned int, long unsigned int, long long unsigned int, std::vector<float>, std::vector<int>, std::vector<TString>>;

BranchInfo& operator= (AllTypes val) { value = val; return * this; }
double operator() () { 
      using T = std::decay_t<decltype(value)>;
      if (std::holds_alternative<bool>(value))
        return static_cast<double>(std::get<bool>(value));
      else if (std::holds_alternative<int>(value))
        return static_cast<double>(std::get<int>(value));
      else if (std::holds_alternative<float>(value))
        return static_cast<double>(std::get<float>(value));
      else if (std::holds_alternative<double>(value))
        return static_cast<double>(std::get<double>(value));
      else if (std::holds_alternative<unsigned int>(value))
        return static_cast<double>(std::get<unsigned int>(value));
      else if (std::holds_alternative<long unsigned int>(value))
        return static_cast<double>(std::get<long unsigned int>(value));
      else if (std::holds_alternative<long long unsigned int>(value))
        return static_cast<double>(std::get<long long unsigned int>(value));
      else if (std::holds_alternative<std::vector<int>>(value))
        std::cout << "Error this type is vector int " << std::endl;
        // static_assert(always_false_v<T>, "Don't know how to use vector!");
      else if (std::holds_alternative<std::vector<float>>(value))
        std::cout << "Error this type is vector float " << std::endl;
        // static_assert(always_false_v<T>, "Don't know how to use vector!");
      else if (std::holds_alternative<std::vector<TString>>(value))
        std::cout << "Error this type is vector TString " << std::endl;
        // static_assert(always_false_v<T>, "Don't know how to use vector!");
      else
        std::cout << "Error this type is unknown" << std::endl;
        // static_assert(always_false_v<T>, "Don't know this type!");
      return -9999.;
    }
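The same conversion can be written without the if/else chain by visiting the variant. The generic lambda is instantiated once per alternative, and if constexpr selects the numeric path at compile time. Here is a sketch, with std::string standing in for TString and vector alternatives returning the same -9999. sentinel:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

using AllTypes = std::variant<bool, int, float, double, unsigned int, unsigned long,
                              unsigned long long, std::vector<float>, std::vector<int>,
                              std::vector<std::string>>;

// Converts any arithmetic alternative to double; vector alternatives get a sentinel.
double to_double(const AllTypes& value) {
    return std::visit([](const auto& v) -> double {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_arithmetic_v<T>)
            return static_cast<double>(v);  // bool, int, float, unsigned ... -> double
        else
            return -9999.;                  // a vector: no scalar conversion
    }, value);
}
```

This is also the setting in which the commented-out static_assert(always_false_v<T>, ...) idea works. Inside a discarded if constexpr branch of a visitor it only fires for an alternative that is not handled. In the non-template if/else chain, T is not dependent, so the assertion would always fail to compile.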

So I can for example use the assign operator to set a value during the event analyzer:

branches["nVertices"] = 40;

and I can also convert all scalar (non-vector) quantities to double to read them if needed:

std::cout << "Event number: " << branches["event"]() << std::endl;
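One caveat with operator= as written: assigning to the variant selects the alternative from the argument's type. So branches["nVertices"] = 40; stores an int (40 is an int literal) even though the branch was declared with 0u. Likewise, branches["pv_x"] = 1.5f; would switch a double branch to float. The TTree keeps the address and leaf type it was given when the branch was created, so a safer assignment keeps the active alternative and converts the value into it. Here is a sketch with a hypothetical helper assign_keep_type, using std::string in place of TString:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

using AllTypes = std::variant<bool, int, float, double, unsigned int, unsigned long,
                              unsigned long long, std::vector<float>, std::vector<int>,
                              std::vector<std::string>>;

// Hypothetical helper: writes x into whatever alternative is currently active,
// instead of letting the type of x pick a (possibly different) alternative.
template <typename U>
void assign_keep_type(AllTypes& value, const U& x) {
    std::visit([&](auto& held) {
        using H = std::decay_t<decltype(held)>;
        if constexpr (std::is_arithmetic_v<H> && std::is_arithmetic_v<U>)
            held = static_cast<H>(x);  // numeric conversion into the declared type
        else if constexpr (std::is_same_v<H, U>)
            held = x;                  // e.g. assigning a whole vector of the same type
        else
            throw std::invalid_argument("value type does not match the branch type");
    }, value);
}
```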

But what I would really like to do is to be able to access e.g. std::vectors more directly, so for example over the course of the analysis of an event I could just push_back to such a vector, and BranchInfo would automatically understand which variant specialization I’m talking about. However, I haven’t really found a way to do that. I think it should be possible since I am already specifying the type of every branch at compile time, but I can’t for example overload an operator with different return types only. I think I might be able to use some template magic to do that, but I’m not sure how.

Hopefully the question was clear enough, and any pointers will be greatly appreciated.

Cheers.
Andre

pcanal · December 6, 2021, 8:55pm

I am not sure how the vector is related to the TTree nor how it is used in the analysis. With my limited understanding, it sounds like “just” using a std::vector<double> would be the most efficient (but there are too many unknowns to be sure).

asterenb · December 6, 2021, 9:32pm

Let me try to be clearer with a simpler example. Suppose I define two branches in my map:

std::map<TString, BranchInfo> branches({
{"trg_phi", BranchInfo{std::vector<float>{}}},
{"prb_filter", BranchInfo{std::vector<TString>{}}}
});

where BranchInfo is the struct as above (just the main definition):

using AllTypes = std::variant<bool, int, float, double, unsigned int, long unsigned int, long long unsigned int, std::vector<float>, std::vector<int>, std::vector<TString>>;
typedef struct BranchInfo {
    AllTypes default_value;
    AllTypes value;
} BranchInfo;
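Suppose BranchInfo is a plain aggregate as shown. Then BranchInfo{std::vector<float>{}} only initializes the first member, default_value. value is default-constructed and holds the first alternative (bool), so holds_alternative checks on branch.value would never match a vector. Here is a minimal sketch of a constructor that initializes both members from the one argument (std::string stands in for TString):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using AllTypes = std::variant<bool, int, float, double, unsigned int, unsigned long,
                              unsigned long long, std::vector<float>, std::vector<int>,
                              std::vector<std::string>>;

struct BranchInfo {
    AllTypes default_value;
    AllTypes value;

    // Kept so that std::map::operator[] can still default-construct an entry.
    BranchInfo() = default;
    // One argument sets both members, so they hold the same alternative from the start.
    BranchInfo(AllTypes init) : default_value(init), value(std::move(init)) {}
};
```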

And then I connect the “value” variable of each struct to the branch address:

for (auto & [name, branch] : branches) {
    if (std::holds_alternative<std::vector<float>>(branch.value))
      t1->Branch(name, & std::get<std::vector<float>>(branch.value));
    else if (std::holds_alternative<std::vector<TString>>(branch.value))
      t1->Branch(name, & std::get<std::vector<TString>>(branch.value));
}
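Every alternative ends up in the same kind of call, so std::visit with a generic lambda can replace the if/else chain. In ROOT the lambda body would be t1->Branch(name, &v), assuming the templated TTree::Branch(name, T*) overload used earlier in the thread accepts each alternative type (the vector types need dictionaries). To keep the sketch self-contained, a hypothetical FakeTree that only records addresses stands in for TTree, and the map holds AllTypes directly:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

using AllTypes = std::variant<bool, int, float, double, unsigned int, unsigned long,
                              unsigned long long, std::vector<float>, std::vector<int>,
                              std::vector<std::string>>;

// Hypothetical stand-in for TTree: remembers which address each branch was given.
struct FakeTree {
    std::map<std::string, const void*> addresses;
    template <typename T>
    void Branch(const std::string& name, T* address) { addresses[name] = address; }
};

template <typename Tree>
void make_branches(Tree& tree, std::map<std::string, AllTypes>& values) {
    for (auto& entry : values) {
        // Plain reference instead of a structured binding: C++17 lambdas
        // cannot capture structured bindings.
        const std::string& name = entry.first;
        // Instantiated for every alternative; only the active one runs.
        std::visit([&](auto& v) { tree.Branch(name, &v); }, entry.second);
    }
}
```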

Now, what I would like to do is to fill out, say, the branch “trg_phi” by pushing to the vector. But I don’t want to have to do something like this:

std::get<std::vector<float>>(branches["trg_phi"].value).push_back(0.1);

every time I need to access it, because this sort of eliminates the convenience of having this map of branches in the first place. I would like for my BranchInfo struct to allow me to do something like this instead:

branches["trg_phi"].push_back(0.1);

or

branches["trg_phi"]().push_back(0.1);

I shouldn’t have to know that “trg_phi” is a vector of floats all the time, since I already specified at compile time when initializing the map. I was hoping to have some smart function inside BranchInfo that already knows that this is a vector of floats and automatically returns it when I ask for it. But again, I’m not sure if this is possible without some template voodoo.

Thanks again,
Andre

pcanal · December 6, 2021, 10:32pm

Hi Andre,

The following might be close to what you want:

template <typename Inside>
void BranchInfo::push_back(Inside param)
{
      using value_t = std::decay_t<Inside>;
      if (std::holds_alternative<std::vector<value_t>>(value)) {
          std::get<std::vector<value_t>>(value).push_back(param);
      } else {
          std::cerr << "Error: the branch does not contain a vector, it contains a value for the type #" << value.index() << '\n';
      }
}
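One thing to watch with this version: std::holds_alternative<T> only compiles if T appears exactly once in the variant. So branches["trg_phi"].push_back(0.1) (a double literal) would ask for std::vector<double>, which is not in AllTypes, and fail to compile; push_back(0.1f) would be needed. The sketch below instead visits the active alternative and converts the argument to its element type. The is_vector trait is a helper introduced here, and std::string stands in for TString:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

template <typename T> struct is_vector : std::false_type {};
template <typename T, typename A> struct is_vector<std::vector<T, A>> : std::true_type {};

using AllTypes = std::variant<bool, int, float, double, unsigned int, unsigned long,
                              unsigned long long, std::vector<float>, std::vector<int>,
                              std::vector<std::string>>;

struct BranchInfo {
    AllTypes value;

    // Appends x to the vector currently held, converted to its element type
    // (e.g. 0.1 -> float). Throws if a scalar is held or x is not convertible.
    template <typename U>
    void push_back(U&& x) {
        std::visit([&](auto& held) {
            using H = std::decay_t<decltype(held)>;
            if constexpr (is_vector<H>::value) {
                using E = typename H::value_type;
                if constexpr (std::is_convertible_v<U&&, E>) {
                    held.push_back(static_cast<E>(std::forward<U>(x)));
                    return;
                }
            }
            throw std::invalid_argument("branch does not hold a compatible vector");
        }, value);
    }
};
```

The same visit pattern extends to size() and clear(), which need no conversion at all.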

Cheers,
Philippe.

asterenb · December 8, 2021, 5:11pm

Hi Philippe,

Thanks a lot for the snippet! This is indeed close in the sense that I can repeat it for the different vector functions one might need: push_back(), size(), clear(), etc. It’s not the “general case” but it’s close enough for my needs. I do have one situation that I’m having trouble with: when asking for a given element with at(). In this case I need my function to be able to return a double, or a TString, or another type. But I can’t specify a template like so:

template <typename Inside>
Inside BranchInfo::at(unsigned int & param)
{
      using value_t = std::decay_t<Inside>;
      if (std::holds_alternative<std::vector<value_t>>(value)) {
          return std::get<std::vector<value_t>>(value).at(param);
      } else {
          std::cerr << "Error: the branch does not contain a vector, it contains a value for the type #" << value.index() << '\n';
      }
}

I’m not sure how I could work the templates to specialize by return type, I don’t think that’s possible in C++ but I’m not sure. Any ideas for this use case?

Thanks a lot,
Andre

pcanal · December 8, 2021, 8:06pm

That code seems legal. You would then use it with:

branches["trg_phi"].at<float>(index);

Cheers,
Philippe.

PS. Alternatively, at() could be a regular (non-template) function returning a std::any.
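Two details in the at<T>() code above are worth fixing. First, the error branch falls off the end of a non-void function, which is undefined behavior if it is ever reached, so it should throw (or return a value). Second, a non-const unsigned int& parameter rejects a literal index such as at<float>(0). Below is a sketch of both the templated version and the std::any alternative. The name at_any and the is_vector helper are hypothetical, and std::string stands in for TString:

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

template <typename T> struct is_vector : std::false_type {};
template <typename T, typename A> struct is_vector<std::vector<T, A>> : std::true_type {};

using AllTypes = std::variant<bool, int, float, double, unsigned int, unsigned long,
                              unsigned long long, std::vector<float>, std::vector<int>,
                              std::vector<std::string>>;

struct BranchInfo {
    AllTypes value;

    // Caller names the element type; every path either returns or throws.
    template <typename T>
    T at(std::size_t i) const {
        if (auto* vec = std::get_if<std::vector<T>>(&value)) return vec->at(i);
        throw std::invalid_argument("branch does not hold a vector of the requested type");
    }

    // No template argument, but the caller still needs std::any_cast<T> to use the result.
    std::any at_any(std::size_t i) const {
        return std::visit([i](const auto& held) -> std::any {
            using H = std::decay_t<decltype(held)>;
            if constexpr (is_vector<H>::value)
                return held.at(i);  // element copied into the std::any
            else
                throw std::invalid_argument("branch does not hold a vector");
        }, value);
    }
};
```

With std::any the type is only moved from the call site to the any_cast, which is why it does not remove the need to name the type.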

asterenb · December 10, 2021, 3:08am

Hi Philippe,

You’re right, this templated code works with an explicitly specified type. What I meant is that I was hoping to enable implicit type deduction so that my users didn’t have to specify the type for each vector they need to access with e.g. .at(). The ideal interface is one where the user only needs to state the type once, and this information is carried through and used whenever a type needs to be deduced. Note that when declaring the map of BranchInfo like so:

std::map<TString, BranchInfo> branches({
{"trg_phi", BranchInfo{std::vector<float>{}}},
{"prb_filter", BranchInfo{std::vector<TString>{}}}
});

the type of the variant is already known at compile time. My hope was to somehow store that information and re-use it when needed, such as in the vector at() method or for example when wanting to access the underlying variant object with some getter, e.g.:

typedef struct BranchInfo {
    [...]
    template <typename T>
    T get() { return std::get<T>(value); }
    [...]
    AllTypes value;
} BranchInfo;

I believe that it should be possible to have an automatic template type deduction in the getter since I am giving the type earlier in the code, not at runtime, but I can’t seem to find a way to keep the type and carry it around inside the struct. I saw some piece of code like so:

struct BranchInfo {
  public:
    template<typename T>
    BranchInfo(T &&b)
    : storage{std::forward<T>(b)}
    , getter{ [](AllTypes &storage) -> IBranchInfo& { return std::get<T&>(storage); } }
    {}
    IBranchInfo * operator->() { return &getter(storage); }
    [...]

where you define an “Interface” base class (IBranchInfo) and then each of the variants would be a derived class. Then upon construction of BranchInfo, the return type of the getter lambda method is specified to match one of the derived variant classes. I get the logic but I couldn’t really get this to work properly, and it still seems very convoluted somehow – there’s gotta be a cleaner way out there, or maybe C++ is still not quite in the Python dict() era yet :).

Thanks for your thoughts,
Andre

pcanal · December 10, 2021, 4:15pm

Well… the magic in Python’s dict() happens at run time :). You can achieve the same in C++ (at a performance cost, of course).

Note that your current scheme has to do the type match either via user input (an explicit template argument) or at run time.

with

std::map<TString, BranchInfo> branches({
{"trg_phi", BranchInfo{std::vector<float>{}}},
{"prb_filter", BranchInfo{std::vector<TString>{}}}
});

there might be a way (most likely using something other than std::map) to make:

branches["trg_phi"].at<float>(index);

work without the explicit <float> and with a compile-time match.
But what about:

branches[branchname].at(index);

there is no way for the compiler to know the underlying type, since ‘branchname’ will only be known at run time.

If you want to reduce the duplication of type typing, one solution is to switch from ‘const char*’ to variable name.

BranchInfo<std::vector<float>> trg_phi{"trg_phi"};
BranchInfo<std::vector<TString>> prb_filter{"prb_filter"};
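The typed-variable approach can be combined with the loop-over-all-branches use case. Each BranchInfo<T> gets a small non-template base class, and every instance registers itself in a list. Below is a sketch under these assumptions: the base class BranchBase, the registry vector, and the reset()/name() members are hypothetical, and std::string replaces TString. With ROOT, a virtual make_branch(TTree&) that calls tree.Branch(name().c_str(), &value) could sit next to reset():

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Type-independent interface so that all branches can be looped over together.
struct BranchBase {
    virtual ~BranchBase() = default;
    virtual void reset() = 0;  // restore the per-event default
    virtual const std::string& name() const = 0;
};

// The branch type is a template parameter, so every access is checked at compile time.
template <typename T>
class BranchInfo : public BranchBase {
    std::string name_;
    T default_;

public:
    T value;  // direct, typed access, e.g. trg_phi.value.push_back(0.1f)

    BranchInfo(std::vector<BranchBase*>& registry, std::string name, T default_value = T{})
        : name_(std::move(name)), default_(default_value), value(default_value) {
        registry.push_back(this);  // the object must outlive any use of the registry
    }
    BranchInfo(const BranchInfo&) = delete;  // a copy would not be registered
    BranchInfo& operator=(const BranchInfo&) = delete;

    void reset() override { value = default_; }
    const std::string& name() const override { return name_; }
};
```

The trade-off is the one discussed in this thread: lookup by a run-time string key is replaced by named variables, and in exchange the compiler knows each branch's type.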

Another alternative (the one picked by RDataFrame) is to leverage the interpreter to generate and compile code at run time (and/or to nest the declaration and the usage in essentially the same statement, so that the type declaration is propagated).

asterenb · December 10, 2021, 10:41pm

If you want to reduce the duplication of type typing, one solution is to switch from ‘const char*’ to variable name.

But then I lose the benefit of iterating over the elements of the map automatically, no? One of the main goals is to be able to just “define” the branch in one place at the beginning (even if the user has to give the type at compile time) and then just access it whenever via the map key, but also automatically doing things like creating TTree branches or clearing the variable for each event, e.g.:

for (auto & [name, branch] : branches) {
    if (std::holds_alternative<std::vector<TString>>(branch.value))
      t1->Branch(name, & std::get<std::vector<TString>>(branch.value));
    else if (std::holds_alternative<std::vector<float>>(branch.value))
      t1->Branch(name, & std::get<std::vector<float>>(branch.value));
}

and:

for (auto & [name, branch] : branches) {
    branch.value = branch.default_value;
  }

Things like making a TTree branch automatically work because the function does not need to return anything, so doesn’t need to worry about which variant type is actually active. However, when the function needs to return the object (such as the getter above), that’s when I’m running into trouble. If I implement a templated getter, then I need to explicitly specify the type every time I want to access the variable, even though I have already specified it before at compile time, so the information should be there somehow.
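The reset loop keeps the TTree branches valid because of how std::variant's copy assignment works. When both sides hold the same alternative, it assigns the contained object in place (here a std::vector copy-assignment), so the object the TTree points to is neither destroyed nor retyped. If default_value ever held a different alternative, the assignment would instead destroy the current object and construct one of the other type. Here is a small check of that property, with a hypothetical helper and std::string in place of TString:

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

using AllTypes = std::variant<bool, int, float, double, unsigned int, unsigned long,
                              unsigned long long, std::vector<float>, std::vector<int>,
                              std::vector<std::string>>;

// Hypothetical helper: performs the per-event reset and reports whether the same
// T object (same address, same type) is still held afterwards.
template <typename T>
bool reset_keeps_object(AllTypes& value, const AllTypes& default_value) {
    const T* before = std::get_if<T>(&value);
    value = default_value;  // same alternative on both sides: T is assigned in place
    return before != nullptr && before == std::get_if<T>(&value);
}
```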

But anyway, sounds like this might be more hassle than I was expecting. Perhaps I should look into the ROOT interpreter idea you mentioned, but I guess it’s surprising that there seems to be no simple way to do this with modern C++. Or just stick with the templated getter and add a little bit of work on the user.

Thanks a lot for your ideas though, this discussion was very helpful!

Cheers,
Andre
