TTreeFormula support only 2 level of variables size collections

Hi all.
This is regarding RDataFrame. However, if there is another way to solve it, such as TTreeReader, I would appreciate that too.

I have the following structure in my ROOT file.
[Screenshot 2024-08-29 183313]

I wanted to draw the ‘energy_’ histogram simply by double-clicking it with the mouse. However, I found that ROOT throws the following warning:

Warning in <TTreeFormula::DefinedVariable>: TTreeFormula support only 2 level of variables size collections.  Assuming '@' notation for the collection singles_.

So, now I want to print (and eventually fill a histogram) the values stored inside ‘energy_’.

I have seen the following forum topics, but have not got anything to work for me:
year2012, year2020

I tried to write the following script using RDataFrame, but it is not correct.
What modifications are needed?

void beta_rdataframe(){

        auto fileName = "beta_3120.root";
        auto treeName = "OutputTree";

        ROOT::RDataFrame d(treeName, fileName, {"output_vec_"});

        auto columnNames = d.GetColumnNames();

        std::cout << "Column Names are: " << std::endl;
        for (const auto &col : columnNames){
                std::cout << col << std::endl;
        }

// THE FOLLOWING LINES OF CODE DON'T WORK
        TH1F henergy("energy","energy",3000,0,3000);
        const auto fillenergy = [&henergy](auto output_vec_)
        {
                for (const auto &imp : output_vec_)
                {
                        const auto &gam_vec = imp.input_.output_vec_;
                        for (const auto &gam : gam_vec)
                        {
                                for (const auto &single : gam.singles_)
                                {
                                std::cout << "Energies are: " << std::endl;
                                std::cout << single.energy_ << std::endl;
                                henergy.Fill(single.energy_);
                                }
                        }
                }
        };

        d.Foreach(fillenergy, {"output_vec_"});
        auto c1 = new TCanvas();
        henergy.DrawClone();
}

The ‘partial’ output for the command ‘GetColumnNames’ is the following:

root -l beta_rdataframe.C
Processing beta_rdataframe.C...

Column Names are:
Beta
Beta.input_
Beta.output_vec_
Beta.output_vec_.input_.output_vec_
input_
output_vec_
output_vec_.input_.output_vec_
... (many more)

The ROOT file is large (tens of GB), so I cannot attach it. Sorry for that.
Any help will be appreciated.

ROOT Version: 6.26/08
Platform: Ubuntu 22.04.2 LTS

Thanks.


Hi @Mukul,
thanks for reaching out!
What if you run the following?

import ROOT

fileName = "beta_3120.root"
treeName = "OutputTree"
df = ROOT.RDataFrame(treeName, fileName)

h = df.Histo1D("Beta.output_vec_.input_.output_vec_.singles_.energy_")
h.Draw()
input()  # keep the canvas open until Enter is pressed

Thanks for the reply.

Is this a snippet in C? I can’t run ROOT.RDataFrame from the ROOT command line!
In my previous example I used ROOT::RDataFrame.

Thanks.

Yes, that’s Python code. The C++ equivalent is the following:

   auto fileName = "beta_3120.root";
   auto treeName = "OutputTree";
   ROOT::RDataFrame df(treeName, fileName);
   auto h = df.Histo1D("Beta.output_vec_.input_.output_vec_.singles_.energy_");
   h->DrawClone();

So, I did the following:

void beta_rdataframe(){

        auto fileName = "beta_3120.root";
        auto treeName = "OutputTree";

        ROOT::RDataFrame df(treeName, fileName);

        auto h = df.Histo1D("Beta.output_vec_.input_.output_vec_.singles_.energy_");
        h->DrawClone();
        }

I get the following error:

root -l beta_rdataframe.C

terminate called after throwing an instance of 'std::runtime_error'
  what():  Unknown column: Beta.output_vec_.input_.output_vec_.singles_.energy_

Maybe you could upload a snapshot of a few events, so I can reproduce your error:

 auto d_0_30 = df.Range(30);
 d_0_30.Snapshot("outputTree", "outputFile.root"); 

Thank you for the reply.
I did the following:

void trial_beta_rdataframe(){

        auto fileName = "beta_3120.root";
        auto treeName = "OutputTree";

        ROOT::RDataFrame df(treeName, fileName);

        auto d_0_3000 = df.Range(3000);
        d_0_3000.Snapshot("OutputTree","beta_test.root");
}

The error in terminal:

terminate called after throwing an instance of 'std::runtime_error'
  what():
An error occurred during just-in-time compilation in RLoopManager::Run. The lines above might indicate the cause of the crash
All RDF objects that have not run their event loop yet should be considered in an invalid state.

Apologies for the inconvenience caused.

To give it another try, I also ran the following script, which ended with an error.

void beta_rdataframe(){

        auto fileName = "beta_3120.root";
        auto treeName = "OutputTree";

        ROOT::RDataFrame df(treeName, fileName);

        TH1F henergy("energy","energy",3000,0,3000);
        auto fillenergy = [&henergy](const auto output_vec_)
        {
                const auto &imp_vec = output_vec_;
                for (const auto &imp : imp_vec)
                {
                        const auto &gam_vec = imp.input_.output_vec_;
                        for (const auto &gam : &gam_vec)
                        {
                                for (const auto &single : gam.singles_)
                                {
                                        henergy.Fill(single.energy_);
                                }
                        }
                }
        return;
        };

       df.Foreach(fillenergy, {"output_vec_"});
       auto c1 = new TCanvas();
       henergy.DrawClone();
}

However, this returned the following error:

[-bash ~/data/output/ana]$ root -l beta_rdataframe.C
root [0]
Processing beta_rdataframe.C...
In module 'ROOTDataFrame':
/home/opt/root_v6.26.08/include/ROOT/RDF/RInterface.hxx:1390:72: error: no type named 'arg_types_nodecay' in 'ROOT::Detail::CallableTraitsImpl<(lambda at /output/ana/beta_rdataframe.C:32:20), false>'
      using arg_types = typename TTraits::CallableTraits<decltype(f)>::arg_types_nodecay;
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/output/ana/beta_rdataframe.C:72:12: note: in instantiation of function template specialization 'ROOT::RDF::RInterface<ROOT::Detail::RDF::RLoopManager, void>::Foreach<(lambda at /output/ana/beta_rdataframe.C:32:20)>' requested here
        df.Foreach(fillenergy, {"output_vec_"});
           ^
In module 'ROOTDataFrame':
/home/opt/root_v6.26.08/include/ROOT/RDF/RInterface.hxx:1391:71: error: no type named 'ret_type' in 'ROOT::Detail::CallableTraitsImpl<(lambda at /output/ana/beta_rdataframe.C:32:20), false>'
      using ret_type = typename TTraits::CallableTraits<decltype(f)>::ret_type;
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
/home/opt/root_v6.26.08/include/ROOT/RDF/RInterface.hxx:1392:19: error: no matching function for call to 'AddSlotParameter'
      ForeachSlot(RDFInternal::AddSlotParameter<ret_type>(f, arg_types()), columns);
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/opt/root_v6.26.08/include/ROOT/RDF/InterfaceUtils.hxx:628:41: note: candidate template ignored: could not match 'TypeList<type-parameter-0-2...>' against 'int'
std::function<R(unsigned int, Args...)> AddSlotParameter(F &f, TypeList<Args...>)
                                        ^

This looks like a nasty case to deal with, so I am adding our I/O expert @pcanal.

As a general recommendation, I would suggest using the latest version, ROOT 6.32!

Hi,

As @mdessole proposes, updating to the latest stable is always a good measure to take, especially because 6.26 is a rather old cycle.

In the absence of the actual input file, some guesswork is needed. In your case, I would suggest specifying the type of the parameter of the function passed to the Foreach action. RDataFrame needs a concrete signature to work out the column type, and it cannot get one from a generic lambda that takes auto:

auto fillenergy = [&henergy](const TheTypeOfOutputVec output_vec_)
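To illustrate why a concrete parameter type matters, here is a minimal sketch in plain C++ (no ROOT needed). It imitates how a library might read the argument type off a callable's operator(). The names FirstArg, HasFixedSignature, typedLambda and genericLambda are hypothetical and only stand in for the idea; this is not ROOT's actual CallableTraits implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical, simplified callable trait: reads the single parameter
// type from a non-template, const operator().
template <typename F>
struct FirstArg : FirstArg<decltype(&F::operator())> {};

template <typename C, typename R, typename A>
struct FirstArg<R (C::*)(A) const> {
    using type = A;
};

// Small void_t helper so the sketch also builds as C++14.
template <typename... Ts>
struct MakeVoid { using type = void; };
template <typename... Ts>
using VoidT = typename MakeVoid<Ts...>::type;

// True only if &F::operator() names exactly one function, i.e. the
// callable has a fixed signature that a library could inspect.
template <typename F, typename = void>
struct HasFixedSignature : std::false_type {};

template <typename F>
struct HasFixedSignature<F, VoidT<decltype(&F::operator())>> : std::true_type {};

// A lambda with an explicit parameter type: its signature can be inspected.
static auto typedLambda = [](const std::vector<double> &v) { return v.size(); };
// A generic lambda: operator() is a template, so there is nothing to inspect.
static auto genericLambda = [](const auto &v) { return v.size(); };

using Typed = decltype(typedLambda);
using Generic = decltype(genericLambda);

static_assert(HasFixedSignature<Typed>::value,
              "typed lambda exposes one concrete operator()");
static_assert(!HasFixedSignature<Generic>::value,
              "generic lambda has no single operator() to take the address of");
static_assert(std::is_same<FirstArg<Typed>::type, const std::vector<double> &>::value,
              "the parameter type can be recovered from the typed lambda");
```

With the typed lambda the parameter type is recoverable; with the generic one the trait has nothing to latch onto, which is consistent with the "no type named 'arg_types_nodecay'" errors shown earlier in the thread.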

About the Snapshot, I cannot say why the problem is occurring due to lack of context and ability to reproduce.

About this attempt

auto h = df.Histo1D("Beta.output_vec_.input_.output_vec_.singles_.energy_");

I also cannot comment given the lack of context.

Best,
D

Thanks for the reply.

If I call Print(), I get the following.

root [1] OutputTree->Print()
******************************************************************************
*Tree    :OutputTree: OutputTree                                             *
*Entries :    38186 : Total =     12225561479 bytes  File  Size = 7853177624 *
*        :          : Tree compression factor =   1.56                       *
******************************************************************************
*Branch  :Beta                                                               *
*Entries :    38186 : BranchElement (see below)                              *
*............................................................................*
*Br    0 :input_    :                                                        *
*Entries :    38186 : Total  Size=     117237 bytes  One basket in memory    *
*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br   32 :output_vec_ : Int_t output_vec__                                   *
*Entries :    38186 : Total  Size=     376846 bytes  File Size  =     135915 *
*Baskets :      161 : Basket Size=      32000 bytes  Compression=   2.36     *
*............................................................................*
*Br   38 :output_vec_.input_.output_vec_ : vector<exp:                    *
*         | :ClusterData> output_vec_[output_vec__]                          *
*Entries :    38186 : Total  Size= 1191894790 bytes  File Size  =  673050551 *
*Baskets :      323 : Basket Size=    8095232 bytes  Compression=   1.77     *
*............................................................................*

Do you mean I should specify Int_t or exp::ClusterData?
I tried both, but neither worked.

Also, just to mention: what I am dealing with is three levels of std::vector, i.e.,
Beta.output_vec_,
input_.output_vec_, and
singles_
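For readers following along, the three levels can be pictured with a small stand-alone sketch in plain C++ (no ROOT needed). The structs Single, Cluster, Inner, Implant and Event, and the function flattenEnergies, are hypothetical stand-ins made up for illustration; only the member names output_vec_, input_, singles_ and energy_ come from the thread, and the real classes in the file may look different.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the stored classes (assumed layout).
struct Single  { double energy_; };
struct Cluster { std::vector<Single> singles_; };       // level 3
struct Inner   { std::vector<Cluster> output_vec_; };   // level 2, reached via input_
struct Implant { Inner input_; };
struct Event   { std::vector<Implant> output_vec_; };   // level 1, plays the role of Beta

// Walk all three nested vectors of one event and collect every energy_
// into a flat list, in traversal order.
std::vector<double> flattenEnergies(const Event &event)
{
    std::vector<double> energies;
    for (const auto &imp : event.output_vec_) {
        for (const auto &gam : imp.input_.output_vec_) {
            for (const auto &single : gam.singles_) {
                energies.push_back(single.energy_);
            }
        }
    }
    return energies;
}
```

A flat list like this is the kind of input a histogram can be filled from directly, which is what the expression-based drawing cannot produce once a third level of variable-size collections is involved.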

Actually, the file isn’t created by me. So, I can’t really make a minimal reproducer.
Apologies again!

Hi,

Could you provide the requested information via these RDF methods? See ROOT: ROOT::RDF::RInterfaceBase Class Reference.

Cheers,
D

Thanks for the reply.

So, I did this:

void beta_root_forum(){

        auto fileName = "beta_3120.root";
        auto treeName = "OutputTree";

        ROOT::RDataFrame d(treeName, fileName);

        auto columnNames = d.GetColumnNames();
        std::cout << "Column Names and their types are: " << std::endl;
        for (const auto &colName : columnNames){
                std::cout << colName << " has type " << d.GetColumnType(colName) << std::endl;
        }
}

and got the following:

[] root -l beta_root_forum.C
Processing beta_root_forum.C...
Column Names and their types are:
Beta has type OutputTreeData<OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData>,OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData> >
Beta.input_ has type OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData>
Beta.output_vec_ has type ROOT::VecOps::RVec<OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData>>
Beta.output_vec_.input_.output_vec_ has type ROOT::VecOps::RVec<vector<exp::ClusterData>>
input_ has type OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData>
output_vec_ has type ROOT::VecOps::RVec<vector<exp::ClusterData>>
output_vec_.input_.output_vec_ has type ROOT::VecOps::RVec<vector<exp::ClusterData>>

... (many more)

PS: I didn’t understand how to use GetColumnTypeNamesList() and GetDefinedColumnNames().
I hope you don’t need these?

Hi,

According to the information you reported, you will need to specify the type of the branch Beta, which would be OutputTreeData<OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData>,OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData> >

Therefore,

using myType = OutputTreeData<OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData>,OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData> >;

auto fillenergy = [&henergy](const myType &output_vec_) {/*the body*/};
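To make the long alias easier to read, here is a sketch in plain C++ of how such a nested template could be laid out. The OutputTreeData definition below is an assumption reconstructed from the column names and types printed earlier in the thread (an input_ member of the first template argument, and an output_vec_ vector of the second); the real class may well contain more. The namespace mock stands in for the thread's exp namespace, and Level1, Level2 and Single are hypothetical helper names.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace mock {                       // stand-in for the thread's exp namespace
struct WData {};
struct BData {};
struct Single { double energy_; };
struct ClusterData { std::vector<Single> singles_; };
}  // namespace mock

// Assumed layout: one input object plus a variable-size list of outputs.
template <typename TIn, typename TOut>
struct OutputTreeData {
    TIn input_;
    std::vector<TOut> output_vec_;
};

// Building the alias up level by level instead of in one long line.
using Level1 = OutputTreeData<mock::WData, mock::ClusterData>;
using Level2 = OutputTreeData<Level1, mock::BData>;
using myType = OutputTreeData<Level2, Level2>;   // type of the Beta branch

// The step-by-step alias is the same type as the fully spelled-out one.
static_assert(
    std::is_same<myType,
                 OutputTreeData<OutputTreeData<OutputTreeData<mock::WData, mock::ClusterData>, mock::BData>,
                                OutputTreeData<OutputTreeData<mock::WData, mock::ClusterData>, mock::BData>>>::value,
    "stepwise alias matches the long form");

// Each hop of Beta.output_vec_.input_.output_vec_ has the expected type.
static_assert(std::is_same<decltype(myType::output_vec_), std::vector<Level2>>::value,
              "Beta.output_vec_ holds Level2 objects");
static_assert(std::is_same<decltype(Level2::input_), Level1>::value,
              "each element's input_ is a Level1 object");
static_assert(std::is_same<decltype(Level1::output_vec_), std::vector<mock::ClusterData>>::value,
              "input_.output_vec_ holds ClusterData objects");
```

Under these assumptions, a function taking const myType & can reach energy_ by following output_vec_, then input_.output_vec_, then singles_.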

In case the type is not available to you, you can resort to TFile::MakeProject (see ROOT: TFile Class Reference).

Cheers,
D


Thanks for the reply.
I will summarize the solution that worked for me.

I did the following:

void beta_root_forum(){

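        // Open the input file; stop if it cannot be read.
        TFile *file = TFile::Open("beta_3120.root");
        if (!file || file->IsZombie()) return;

        // Generate headers for all classes ("*") in the directory MyProject
        // and build a shared library from them ("new++").
        file->MakeProject("MyProject","*","new++");
        file->Close();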
}

This generated a directory with many files in it.
I then created a ‘rootlogon.C’ file containing the following line to load the generated library:

{
gSystem->Load("MyProject/MyProject");
}

Next, I wrote a script to plot the branch in question using RDataFrame and your suggested line, as follows:

void beta_rdataframe(){

        auto fileName = "beta_3120.root";
        auto treeName = "OutputTree";

        ROOT::RDataFrame d(treeName, fileName);

        TH1F henergy("energy","energy",3000,0,3000);
        using myType = OutputTreeData<OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData>,OutputTreeData<OutputTreeData<exp::WData,exp::ClusterData>,exp::BData> >;
        // The parameter type is spelled out so that Foreach can deduce the column type.
        const auto fillenergy = [&henergy](const myType &start)
        {
                // Level 1: entries of Beta.output_vec_
                for (const auto &imp : start.output_vec_)
                {
                        // Level 2: clusters in input_.output_vec_
                        for (const auto &gam : imp.input_.output_vec_)
                        {
                                // Level 3: singles_ of each cluster
                                for (const auto &single : gam.singles_)
                                {
                                        henergy.Fill(single.energy_);
                                }
                        }
                }
        };

        d.Foreach(fillenergy, {"Beta"});
        auto c1 = new TCanvas();
        henergy.DrawClone();
}

This indeed gives me the histogram for the ‘energy_’ branch:
[Image: Beta.output_vec_.input_.output_vec_.singles_.energy_]

Thank you so much for your assistance.
I’ve been struggling for days and you have been a savior!

Newbiee!


Thanks for your perseverance! I am sure this thread will be useful to others.

Cheers,
Danilo
