Branch renaming failure (?)

Hello,

I’m trying to write a script that renames a bunch of branches in a tree. Most of the branches are std::vector<some_type>, but one is just an unsigned int (“RunNumber” in the code below). This code works for all the vector-type branches:

// Open file and rename some branches
TFile * output_file = new TFile(output_filename.c_str(),"UPDATE");
TTree *tree = (TTree*)output_file->Get(tree_name.c_str());
std::vector<std::string> final_names = {"main_track_nominal_nothadron","main_track_nominal_notmuon","stable_track_nominal_nothadron","main_track_validation_nothadron","main_track_validation_notmuon","stable_track_validation_nothadron"}; //,"RunNumber"};
for (const auto &out_name : final_names) {
  std::string in_name = out_name + "_new";
  auto branch = tree->GetBranch(in_name.c_str());
  if (!branch) continue; // skip names that are not present in the tree
  branch->SetName(out_name.c_str());
  branch->SetTitle(out_name.c_str());
}
tree->Write();

But if I include RunNumber in the list, the final tree doesn’t seem to have the branch named correctly (there’s no title):

*............................................................................*
*Br   18 :RunNumber : UInt_t                                                 *
*Entries :   632573 : Total  Size=    2532527 bytes  File Size  =      25207 *
*Baskets :       18 : Basket Size=     339968 bytes  Compression= 100.44     *
*............................................................................*

I excluded RunNumber and instead added this code, which includes the type in SetTitle:

auto branch = tree->GetBranch("RunNumber_new");
branch->SetName("RunNumber");
branch->SetTitle("RunNumber/i");

In both cases, running my RDataFrame-based code on the output root file crashes with the error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Unknown column: RunNumber

I can open the root files interactively and look at the contents of RunNumber: they can be displayed in a TTree and are correctly filled. Any idea what I’m doing wrong that my DataFrame can’t interpret the branch?

Cheers,
Kate



Hi,
the error says that RDataFrame doesn’t see a column called RunNumber. Shouldn’t the column be called RunNumber_new though?

In any case, you can check which columns RDF sees with dataframe.GetColumnNames() and dataframe.GetColumnType("RunNumber_new"). If the column is in the file but RDF does not see it, please share a problematic ROOT file with us so we can debug what’s going on.

Cheers,
Enrico

Hi Enrico,

No, I’m trying to rename the branch from “RunNumber_new” to “RunNumber” - long story short, I just need this file to have the same branch names as those in some other files so that my DataFrame can run on all of them.

I tried GetColumnNames() but it seems that when I’ve just created the frame from an existing tree, that gives me a totally empty list - even on files where everything is read in correctly and the code runs. So I assumed that it only works for columns if I’ve defined them myself, rather than picked them up from the tree. Are you sure I should expect it to work?

Sadly I can’t share the problematic file because it’s experiment internal data. Was just hoping someone could spot the problem in my branch naming method.

Cheers,
Kate

Hi,
GetColumnNames should definitely return the full list of available columns.

Kate wrote: “Sadly I can’t share the problematic file”

Ok, then maybe you can modify the short program below to reproduce the incorrect behavior? I don’t see anything obviously wrong in the code you posted above – but there could be something weird going on with branch names, titles, the way the file is written, etc, so some way to reproduce the issue would be a great help.

Cheers,
Enrico

#include <TFile.h>
#include <TTree.h>
#include <ROOT/RDataFrame.hxx>
#include <string>
#include <iostream>

void MakeTree(const std::string &treename, const std::string &fname)
{
   TFile f(fname.c_str(), "recreate");
   TTree t(treename.c_str(), treename.c_str());
   unsigned int uint = 42u;
   t.Branch("RunNumber", &uint);
   t.Fill();
   t.Write();
   f.Close();
}

int main()
{
   const auto fname = "f.root";
   const auto treename = "t";

   MakeTree(treename, fname);

   TFile f(fname);
   TTree *t = nullptr;
   f.GetObject(treename, t);
   ROOT::RDataFrame df(*t);
   for (const auto &c : df.GetColumnNames())
      std::cout << c << std::endl;
   std::cout << *df.Mean<unsigned int>("RunNumber") << std::endl;

   // also works:
   // ROOT::RDataFrame df2(treename, fname);
   // for (const auto &c : df2.GetColumnNames())
   //    std::cout << c << std::endl;
   // std::cout << *df2.Mean<unsigned int>("RunNumber") << std::endl;

   return 0;
}

Hi Enrico,

Aha, this was very useful! I was using GetColumnNames incorrectly and that’s why I was getting an empty list back. Thanks to your code, I can see that after trying to rename my branch like so:

auto branch = tree->GetBranch("RunNumber_new");
branch->SetName("RunNumber");
tree->Write();

when I read and print the column list I see this:

[... branches ...]
RunNumber.RunNumber_new
[...more branches...]

So that’s definitely informative! But now I don’t know why my renaming produced this weird result. Any ideas?

Thanks again!!
Kate

Time to bring in @pcanal or @Axel :smiley:

Ooooh I’m not sure we can do that. The branches’ payload was already written with a different name… @pcanal what’s your verdict? Do we have a work-around?

The challenge is that the branches are held in a THashList, where the hash is based on the name. When you call SetName, the hash recorded in the list is not updated, so the branch can no longer be found by a name-based search (which is what RDataFrame has to do).
So in your case, it seems that doing:

for (auto out_name : final_names) {
  std::string in_name = out_name + "_new";
  auto branch = tree->GetBranch(in_name.c_str());
  tree->GetListOfBranches()->Remove(branch);
  branch->SetName(out_name.c_str());
  branch->SetTitle(out_name.c_str());
  tree->GetListOfBranches()->Add(branch);
}

would help. However, it would work only in simple cases and, as coded above, is likely to result in the branches being recorded in the list in a different order. That is not necessarily a problem, but compare tree->Print() before and after.
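
For readers who want to see the stale-lookup effect in isolation, here is a minimal sketch that does not use ROOT at all. ToyNameIndex, NamedObject, RenameAndReindex and DemoStaleIndex are hypothetical names invented for this illustration, and the std::unordered_map is only a stand-in for a name-keyed container. It is an assumption that this mirrors the relevant behaviour of THashList; it is not ROOT's actual implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a branch: an object whose name can be changed.
struct NamedObject {
   std::string name;
};

// Toy name-keyed index (NOT ROOT's THashList). Each object is filed under
// the name it had at the moment Add() was called.
class ToyNameIndex {
public:
   void Add(NamedObject *obj) { index_[obj->name] = obj; }

   // Must be called while obj still carries the name it was added under.
   void Remove(NamedObject *obj) { index_.erase(obj->name); }

   // Name-based search: locate the slot for `name`, then confirm the name.
   NamedObject *Find(const std::string &name) const
   {
      auto it = index_.find(name);
      if (it == index_.end() || it->second->name != name)
         return nullptr;
      return it->second;
   }

private:
   std::unordered_map<std::string, NamedObject *> index_;
};

// Remove, rename, re-add: the same pattern as the loop above.
void RenameAndReindex(ToyNameIndex &list, NamedObject &obj, const std::string &newname)
{
   list.Remove(&obj);
   obj.name = newname;
   list.Add(&obj);
}

void DemoStaleIndex()
{
   ToyNameIndex list;

   // Renaming in place leaves the object filed under its old name,
   // so it cannot be found under either name.
   NamedObject run{"RunNumber_new"};
   list.Add(&run);
   run.name = "RunNumber";
   assert(list.Find("RunNumber") == nullptr);
   assert(list.Find("RunNumber_new") == nullptr);

   // Removing before the rename and re-adding afterwards keeps the index valid.
   NamedObject track{"track_qoverp_new"};
   list.Add(&track);
   RenameAndReindex(list, track, "track_qoverp");
   assert(list.Find("track_qoverp") == &track);
}
```

Calling DemoStaleIndex() from a main function runs both cases. The sketch only covers the lookup side of the problem; it says nothing about what is already written to the file under the old name.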

Hi @pcanal,

Thanks for the helpful code!
It’s not quite working yet, as once I’ve written the output file, I get a segfault when I try to access the tree. But this does in principle run, at least :slight_smile:

Cheers,
Kate

Hi everyone,

Thanks again for all the assistance! Problem “solved” using some pretty dubious decision making on my part, but I’ll post it here for anyone else with similar issues. I gave up on renaming the branches altogether. Instead I used another RDataFrame:

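// Recursively define a column called newnames[i] from the existing column "<newname>_new".
// Note: recursion stops at the first empty name, so any later entries are skipped.
auto RenameColumns(RNode df, const std::vector<std::string> &newnames, unsigned int i = 0) {

  if (i == newnames.size())
    return df;

  if (newnames.at(i).empty())
    return df;

  const std::string &newname = newnames.at(i);
  const std::string oldname = newname + "_new";

  return RenameColumns(df.Define(newname, oldname), newnames, i + 1);

}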


    // Make a new RDataFrame passing this back through
    RDataFrame second_frame(tree_name, output_filename);
    std::vector<std::string> to_rename = {"main_track_nominal_nothadron","main_track_nominal_notmuon","stable_track_nominal_nothadron","main_track_validation_nothadron","main_track_validation_notmuon","stable_track_validation_nothadron","track_isMuonSignal","track_qoverp","track_corrected_pixeldEdx","track_nUsedHitsdEdx","track_nIBLOverflowsdEdx","RunNumber"};
    auto out_tree = RenameColumns(second_frame,to_rename);

    // And now snapshot that as a replacement file
    out_tree.Snapshot(tree_name,output_filename,to_rename);

No shame, no regrets!

Cheers,
Kate


Hi Kate,
yep, that works :wink:

If it turns out to be too slow for your purposes, you might want to try using Alias instead of Define in the recursive call at the end of RenameColumns. Define copies the old column into the new column, whereas Alias makes no copy and just creates a new name for the existing column. Everything else should work as is.

Cheers,
Enrico

Thanks - Alias definitely works better (also prevents columns ending up as ROOT::Detail::VecOps::RAdoptAllocator)!

Cheers,
Kate
