Generic function to plot columns of a TDataFrame after a transformation

Hi,

I have written a function that lets me easily plot columns of a TDataFrame by passing the TDataFrame and the names of the columns to plot (if only one column is given, it is taken as the y values and the abscissa values are generated as an integer sequence).

It reads like this:

using namespace ROOT::Experimental;

void plot_df(TDataFrame& tdf, const std::string& ycol, 
                         const std::string& xcol = "", 
                         const std::string& options = "A*")
{
int argc{1};
char* argv[] = {(char *)""};
TApplication theApp("tapp", &argc, argv);
TCanvas c1("c1");
c1.SetGrid();
TGraph g{};

auto y = *(tdf.Take<double>(ycol));
std::vector<double> x(y.size());
std::iota(x.begin(), x.end(), 0.);
if(xcol != "")
{
x = *(tdf.Take<double>(xcol));
g = TGraph(y.size(), &x[0], &y[0]);
}
else
{
g = TGraph(y.size(), &x[0], &y[0]);
}
g.Draw(options.c_str());
g.SetMarkerStyle(21);
c1.Update();
theApp.Run();
}

I can therefore do this and it works:

int main(int argc, char* argv[])
{
auto fileName = "../data/input_samples.csv";
auto tdf = ROOT::Experimental::TDF::MakeCsvDataFrame(fileName);
plot_df(tdf,"landsat8_b5_20140729","landsat8_b6_20140729") ;
return 0;
}

However, if instead of passing the TDataFrame, I pass the result of a filter, like

plot_df(tdf.Filter("code==211"),"landsat8_b5_20140729","landsat8_b6_20140729") ;

I get this error because of course the type does not fit:

/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx: In function ‘int main(int, char**)’:
/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx:41:21: error: invalid initialization of non-const reference of type ‘ROOT::Experimental::TDataFrame&’ from an rvalue of type ‘ROOT::Experimental::TDF::TInterface<ROOT::Detail::TDF::TFilterBase>’
   plot_df(tdf.Filter("code==211"),"landsat8_b5_20140729","landsat8_b6_20140729") ;
           ~~~~~~~~~~^~~~~~~~~~~~~
/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx:9:6: note: in passing argument 1 of ‘void plot_df(ROOT::Experimental::TDataFrame&, const string&, const string&, const string&)’
 void plot_df(TDataFrame& tdf, const std::string& ycol,
      ^~~~~~~

If I change the function signature to this

void plot_df(auto& tdf, const std::string& ycol, 
                   const std::string& xcol = "", 
                   const std::string& options = "A*")

or this

template <typename T>
void plot_df(T& tdf, const std::string& ycol, 
             const std::string& xcol = "", const std::string& options = "A*")

I get this error:

/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx: In function ‘void plot_df(T&, const string&, const string&, const string&)’:
/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx:20:23: error: expected primary-expression before ‘double’
   auto y = *(tdf.Take<double>(ycol));
                       ^~~~~~
/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx:20:23: error: expected ‘)’ before ‘double’
/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx:25:20: error: expected primary-expression before ‘double’
     x = *(tdf.Take<double>(xcol));
                    ^~~~~~
/home/garjola/Dev/RootLearning/sql/scatterplot-dataframe.cxx:25:20: error: expected ‘)’ before ‘double’

So I don’t know how to build a generic function that can be used with the result of a transformation. I assumed that the “Take” action inside the function would trigger the transformation. Also, I don’t understand why the code works if I copy the body of the function into main() like this:

int main(int argc, char* argv[])
{
  auto fileName = "../data/input_samples.csv";
  auto tdf = ROOT::Experimental::TDF::MakeCsvDataFrame(fileName);

  auto tdff = tdf.Filter("code==211");
  TApplication theApp("tapp", &argc, argv);
  TCanvas c1("c1");
  c1.SetGrid();
  TGraph g{};

  std::string ycol = "landsat8_b5_20140729";
  std::string xcol = "landsat8_b6_20140729";
  std::string options = "A*";

  auto y = *(tdff.Take<double>(ycol));
  std::vector<double> x(y.size());
  std::iota(x.begin(), x.end(), 0.);
  if(xcol != "")
    {
    x = *(tdff.Take<double>(xcol));
    g = TGraph(y.size(), &x[0], &y[0]);
    }
  else
    {
    g = TGraph(y.size(), &x[0], &y[0]);
    }
  g.Draw(options.c_str());
  g.SetMarkerStyle(21);
  c1.Update();
  theApp.Run();
  return 0;
}

Any hint is welcome.

Thanks.

Garjola

Hi Garjola,
Can you try the template version, but taking the node by value rather than by reference?
I don’t understand the error message in that second case, but in the first case you are binding a non-const reference to a temporary (the node returned by Filter).

Cheers,
Enrico

OK, got it: this is one of those cases where switching compilers really helps. clang-4.0 yields the following error:

asd.cpp:16:19: error: use 'template' keyword to treat 'Take' as a dependent template name
   auto x = *(tdf.Take<double>("x"));
                  ^
                  template 
1 error generated.

Minimal example that works correctly:

#include <TApplication.h>
#include <ROOT/TDataFrame.hxx>
#include <TCanvas.h>
#include <TGraph.h>
#include <numeric>
#include <iostream>
using namespace ROOT::Experimental;

template <typename TDFNode>
void plot_df(TDFNode tdf)
{
   TApplication app("tapp", nullptr, nullptr);
   TCanvas c;
   auto y = *(tdf.template Take<double>("y"));
   std::vector<double> x(y.size());
   std::iota(x.begin(), x.end(), 0.);
   TGraph g(x.size(), x.data(), y.data());
   g.Draw();
   c.Update();
   app.Run();
}

int main() {
   TDataFrame d(10);
   auto dd = d.Define("y", "double(tdfentry_)");
   auto ff = dd.Filter("y > 0"); 
   plot_df(ff);
   return 0;
}

As a side note be aware that by writing

auto x = *(tdf.Take<double>("x"));
auto y = *(tdf.Take<double>("y"));

the loop over all entries is run twice, because each result is dereferenced (and therefore has to be available) before the next action is booked.
Something like this, on the other hand, books both actions first and runs a single loop over the data:

auto xproxy = tdf.Take<double>("x");
auto yproxy = tdf.Take<double>("y");
std::vector<double> x = std::move(*xproxy);
std::vector<double> y = std::move(*yproxy);
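
To make the difference concrete without needing ROOT, here is a minimal toy sketch of lazily booked actions. ToyFrame, its Proxy and the Take(scale) signature are all hypothetical names invented for this illustration; this is not how TDataFrame is implemented, only the general idea that dereferencing a result triggers one loop that fills every action booked so far.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy model of lazily booked actions (hypothetical, NOT a ROOT class).
class ToyFrame {
public:
   // Handle to a result that may not have been computed yet.
   struct Proxy {
      ToyFrame *frame;
      std::shared_ptr<std::vector<double>> result;
      std::shared_ptr<bool> ready;
      // Dereferencing runs the loop only if this result is still missing.
      std::vector<double> &operator*()
      {
         if (!*ready) frame->Run();
         return *result;
      }
   };

   explicit ToyFrame(int nEntries) : fEntries(nEntries) {}

   // Book an action; nothing is computed here. `scale` stands in for a column.
   Proxy Take(double scale)
   {
      auto result = std::make_shared<std::vector<double>>();
      auto ready = std::make_shared<bool>(false);
      fPending.push_back({result, ready, scale});
      return Proxy{this, result, ready};
   }

   int Loops() const { return fLoops; }

private:
   struct Action {
      std::shared_ptr<std::vector<double>> result;
      std::shared_ptr<bool> ready;
      double scale;
   };

   // One pass over all entries fills every pending action at once.
   void Run()
   {
      ++fLoops;
      for (int i = 0; i < fEntries; ++i)
         for (auto &a : fPending) a.result->push_back(a.scale * i);
      for (auto &a : fPending) *a.ready = true;
      fPending.clear();
   }

   int fEntries;
   int fLoops = 0;
   std::vector<Action> fPending;
};

// Book and dereference one result at a time: one loop per result.
int loops_when_dereferencing_eagerly()
{
   ToyFrame f(5);
   auto x = *f.Take(1.0); // triggers a loop
   auto y = *f.Take(2.0); // booked after that loop, so it triggers another
   return f.Loops();
}

// Book everything first, then dereference: a single loop fills both results.
int loops_when_booking_first()
{
   ToyFrame f(5);
   auto xproxy = f.Take(1.0);
   auto yproxy = f.Take(2.0);
   std::vector<double> x = std::move(*xproxy); // runs the loop, fills x and y
   std::vector<double> y = std::move(*yproxy); // already filled, no new loop
   return f.Loops();
}
```

In this toy, the first helper reports two loops and the second reports one, which mirrors the two snippets above.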

Cheers,
Enrico

Hi,

Thanks. It works. Now I will have to understand the tdf.template Take thing.

Is there a reason to prefer passing the TDFNode by value? This also works:

template <typename TDFNode>
void plot_df(TDFNode&& tdf, const std::string& ycol, 
             const std::string& xcol = "", const std::string& options = "A*")

and allows me to pass the filtered data in the function call:

plot_df(tdf.Filter("code==211"),"landsat8_b5_20140729","landsat8_b6_20140729") ;

In the case where I pass it by reference like:

template <typename TDFNode>
void plot_df(TDFNode& tdf, const std::string& ycol, 
             const std::string& xcol = "", const std::string& options = "A*")

I have to create a named variable before the call:

auto tdff = tdf.Filter("code==211"); 
plot_df(tdff,"landsat8_b5_20140729","landsat8_b6_20140729") ;

Thanks again.

Garjola.

Yes, passing by forwarding reference (TDFNode&& with a deduced TDFNode, which binds to both lvalues and temporaries) is better than by value in this case (it avoids copying some internals), but for this simple case I don’t expect any visible difference.
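
As a rough illustration of the three signatures discussed in this thread, the sketch below uses a hypothetical CountingNode (not a ROOT class) that counts how often it is copied. The function names by_value and by_forwarding_ref are likewise made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical node type that counts copies; it only mimics the shape of
// "a node whose Filter() returns a new node by value".
struct CountingNode {
   static int copies;
   CountingNode() = default;
   CountingNode(const CountingNode &) { ++copies; }
   CountingNode(CountingNode &&) noexcept = default;
   CountingNode Filter() const { return CountingNode{}; }
};
int CountingNode::copies = 0;

// By value: accepts lvalues and temporaries, but copies lvalues.
template <typename Node>
void by_value(Node) {}

// Forwarding reference: accepts lvalues and temporaries without copying.
// Returns true when it was bound to an lvalue (Node deduced as a reference).
template <typename Node>
bool by_forwarding_ref(Node &&)
{
   return std::is_lvalue_reference<Node>::value;
}

// A plain `Node &` parameter would reject `n.Filter()` at compile time,
// because a non-const lvalue reference cannot bind to a temporary.
```

With a named node n, by_value(n) increments the copy counter, while by_forwarding_ref(n) and by_forwarding_ref(n.Filter()) both leave it unchanged.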

The tdf.template Take<double> syntax tells the compiler that Take is a member function template of a type that depends on the function’s template parameter. Without the template keyword, the compiler parses the < after Take as the less-than operator and the later > as greater-than, which is why gcc complains about an expected primary-expression before ‘double’. (See “The template disambiguator for dependent names” at this page.) It is one of the dark corners of C++ parsing.
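
A minimal sketch of the same rule that compiles without ROOT is shown here. ToyNode and count_entries are hypothetical names used only for this illustration; ToyNode merely has a member function template called Take, like the real nodes do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a data-frame node; NOT a ROOT class.
struct ToyNode {
   std::vector<double> values{1.0, 2.0, 3.0};

   // Member function template, playing the role of Take<T>(column).
   template <typename T>
   std::vector<T> Take(const std::string & /*column*/) const
   {
      return std::vector<T>(values.begin(), values.end());
   }
};

// Here the type of `node` depends on the template parameter Node, so the
// compiler cannot know that Take is a template when it first parses the body.
// The `template` keyword says so explicitly; without it, `node.Take < double`
// is parsed as a comparison and compilation fails.
template <typename Node>
std::size_t count_entries(const Node &node)
{
   auto v = node.template Take<double>("y");
   return v.size();
}
```

Calling count_entries(ToyNode{}) works as expected. In a non-template function, where the type of the node is known (as in the main() version earlier in this thread), the plain node.Take<double>("y") spelling is enough, which is why the inlined code compiled.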
