Coloured scatterplot from TDataFrame

Hi,

I would like to generate a coloured scatterplot from 3 columns of a TDataFrame: two of them are continuous variables that define the x and y axes, and the third is a categorical variable that I would use to set the color of each point. The result should look something like this.

I have found this example code here that I would like to adapt to my case, but I guess I can’t get a TTree from a TDataFrame?

Or maybe there is a completely different and straightforward way to do that using histograms?

Thanks.

Garjola

Hi garjola,
Scatterplots in ROOT are provided by TGraph. TTree::Draw is a handy way to quickly produce histograms and TGraphs from trees. TDataFrame does not provide a dedicated action to produce TGraphs, but you can always use Foreach:

TGraph g;
auto fillGraph = [&g](double x, double y) { g.SetPoint(g.GetN(), x, y); };
tdf.Foreach(fillGraph, {"x", "y"});

I hope this helps you do what you want.

As a side note, in the example code that you linked, each TTree::Draw call makes a separate loop over all the data. If you fill multiple TGraphs in the same Foreach lambda, TDataFrame loops over the data once and fills everything, saving you some runtime.

Cheers,
Enrico
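
A minimal sketch of the single-pass idea from the reply above, written in plain C++ so that it runs without ROOT. The `PointSet` struct and the `splitByClass` function are hypothetical names introduced here, and the class label is assumed to be an `int`. In a real TDataFrame job, each `PointSet` would be a TGraph and the loop body would be the Foreach lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for one TGraph: the coordinates of all points
// that share the same class label.
struct PointSet {
    std::vector<double> x;
    std::vector<double> y;
};

// A single loop over the rows fills every per-class set. This mirrors one
// Foreach call whose lambda picks the graph to fill from the class column,
// instead of one full pass over the data per class.
std::map<int, PointSet> splitByClass(const std::vector<double>& xs,
                                     const std::vector<double>& ys,
                                     const std::vector<int>& classes)
{
    assert(xs.size() == ys.size() && ys.size() == classes.size());
    std::map<int, PointSet> sets;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        PointSet& s = sets[classes[i]];  // created on first use of a label
        s.x.push_back(xs[i]);
        s.y.push_back(ys[i]);
    }
    return sets;
}
```

With ROOT, the same pattern would presumably be a lambda that captures a `std::map<int, TGraph>` by reference and calls `SetPoint` on the graph selected by the class column.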

Hi,

Thanks. The Foreach approach solves the problem. However, I don’t see how to change the marker color for each point. If I set a different marker color inside the lambda, it does not work.

TGraph g;
auto fillGraph = [&g](double x, double y, Color_t c) {
         g.SetMarkerColor(c);
         g.SetPoint(g.GetN(), x, y); };
tdf.Foreach(fillGraph, {"x", "y", "class"});
g.Draw("A*");

I have the feeling that when g.Draw() is called after the Foreach, the color used for all markers is the one set for the last point added.

So what I do instead is draw an individual marker for each point:

  TGraph g;
  //2 invisible points to define the limits of the axis
  g.SetMarkerSize(0.1);
  g.SetPoint(g.GetN(), xmin, ymin);
  g.SetPoint(g.GetN(), xmax, ymax);
  std::vector<TMarker> tmv{};
  auto fillGraph = [&tmv](double x, double y, Color_t c) 
    { 
      TMarker m(x, y, 7);
      m.SetMarkerSize(2);
      m.SetMarkerStyle(20);
      m.SetMarkerColor(c);
      tmv.push_back(m);
    };
  tdfMaxNDVI2cols.Foreach(fillGraph, {"x", "y", "class"});
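  g.Draw("A*");
  // keep tmv alive while the canvas is shown: Draw() adds each marker itself (not a copy) to the pad
  for(auto&& m : tmv) m.Draw();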

I get a result that fits my needs, but I don’t know whether creating this dummy graph just to set the axis limits is a good idea, or whether there is a simpler way.

Thank you.

Garjola

Depending on the dimensionality of your problem, an alternative approach would be to fill N TGraphs, one for each color, and then pass “SAME” to the drawing options to draw them all superimposed on the same canvas, selecting a different marker color for each TGraph. This approach is analogous to what is done in the example code you linked in your first post.

I do not know whether TDF implements another mechanism, but the way to superimpose TGraphs in ROOT is not via the SAME option. By default graphs are superimposed (no option needed), and if you want to start a new plot with a TGraph you need option “A”. So draw the first graph with option “A” and the others on top without it. TGraphs can also be grouped in a TMultiGraph in order to draw them in one go and automatically compute the total range containing all of them.

…as Olivier said :)
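
A rough sketch of the TMultiGraph suggestion, assuming the points have already been split by class. `ClassPoints`, `totalRange` and `drawByClass` are hypothetical names, and the colour indices are an arbitrary choice. The plain C++ part computes the range covering all groups, which is what makes the dummy two-point graph unnecessary. The ROOT part is only seen when the file is run as a macro by the ROOT interpreter (which defines `__CLING__`), so the file also compiles without ROOT.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

#ifdef __CLING__  // defined by the ROOT interpreter only
#include "TGraph.h"
#include "TMultiGraph.h"
#endif

// Hypothetical container: all points of one class plus the marker colour
// to use for them (ROOT colour index, e.g. 2 = red, 4 = blue).
struct ClassPoints {
    int colour;
    std::vector<double> x, y;
};

struct Range { double xmin, xmax, ymin, ymax; };

// Smallest rectangle containing every point of every group. TMultiGraph
// works out a range like this by itself when it is drawn with option "A".
Range totalRange(const std::vector<ClassPoints>& groups)
{
    const double inf = std::numeric_limits<double>::infinity();
    Range r{inf, -inf, inf, -inf};
    for (const ClassPoints& g : groups) {
        assert(g.x.size() == g.y.size());
        for (double v : g.x) { r.xmin = std::min(r.xmin, v); r.xmax = std::max(r.xmax, v); }
        for (double v : g.y) { r.ymin = std::min(r.ymin, v); r.ymax = std::max(r.ymax, v); }
    }
    return r;
}

#ifdef __CLING__
// One TGraph per class, each with its own marker colour, drawn in one go.
void drawByClass(const std::vector<ClassPoints>& groups)
{
    auto* mg = new TMultiGraph();  // heap object, so it outlives the macro
    for (const ClassPoints& p : groups) {
        if (p.x.empty()) continue;  // skip classes without points
        auto* g = new TGraph(static_cast<int>(p.x.size()), p.x.data(), p.y.data());
        g->SetMarkerStyle(20);
        g->SetMarkerColor(p.colour);
        mg->Add(g, "P");  // "P": markers only; the multigraph owns g
    }
    mg->Draw("A");  // "A" draws the axes; no dummy graph needed
}
#endif
```

Compared with the vector of TMarker objects, this keeps one drawable object per class rather than one per point, and the axis limits follow from the data.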
