ROOT Development Release v6.09/02

Danilo · March 8, 2017, 2:53pm

The development release v6.09/02 is out, with several new features. Some highlights:

Automatic colouring of plots. How? Check this out here.
The TDataFrame framework has landed in ROOT: it is now possible to analyse data contained in ROOT trees in a functional manner, transparently taking advantage of all the cores of your machine.
More building blocks for expressing parallelism: for example, check out our TThreadExecutor class!
More parallelism behind the scenes, for example parallel compression when writing to a TTree. Just let ROOT run your code in parallel: invoke ROOT::EnableImplicitMT().
A faster ROOT: for example, many symbols are now hidden and TMethodCall is now twice as fast as before.
ClassDefInline(MyClass, 3) adds ClassDef functionality without the need to generate a dictionary, which is especially useful for scripts and other non-framework code.

See https://root.cern.ch/content/release-60902 for sources, binaries and cvmfs installs.

Enjoy!

We’d like to thank everyone contributing to this version, for instance by sending in pull requests, reporting or fixing bugs.

Best regards,
Danilo on behalf of the ROOT team

svml · March 8, 2017, 3:13pm

Wow, all of the new features sound so useful! Thank you so much!!!

svml · April 2, 2017, 12:45am

Could you please let me know if there are plans to expand the functionality of TDataFrame to make it similar to Python pandas, or to provide a C++ interface to Python pandas as a different class (for example, TPandas)?

A search on Google seems to indicate that there is no good open source C++ library implementing a data frame type (on the level of pandas), which could be used for data manipulation and machine learning. It would be great if ROOT could provide it.

Danilo · April 2, 2017, 9:49am

Hi,

being able to natively read formats other than the ROOT one is part of the development plan of TDataFrame. Pandas is of course at the top of the list, also considering that several plugins are already available to read Excel, csv and other formats into Pandas.
Let me ask you though about your particular use case: what is the problem you are confronted with?

Cheers,
Danilo

ksmith · April 3, 2017, 8:43pm

Automatic coloring will be a blessing when trying to do a quick and dirty analysis. Thanks for this development.

Danilo · April 4, 2017, 6:19am

Hi!
Yes, it helps a lot. There is tab completion too, which can help you in the exploratory phase as well!

Cheers,
D

couet · April 4, 2017, 6:51am

Automatic coloring will be a blessing when trying to do a quick and dirty analysis.

Indeed, it can be useful for final plots too. Colors are picked from the current palette, and the result often looks nicer than what a user can produce by picking the standard basic colors.

Thanks for this development.

You’re welcome.

malfonsi79 · April 27, 2017, 9:46am

Dear ROOTers,

I also agree that TDataFrame is very interesting and I look forward to its evolution. I wanted to try it out on my ROOT files, and I currently have a problem and a question.

The problem:

Maybe I did something wrong with the ROOT version selection/installation (“git pull” to update my old source dir, “git checkout tags/v6-09-02 -b v6-09-02”, setup with -Dall=ON which includes also root7, no errors/warnings during compilation/installation, root starts smoothly and every other functionality works as expected so far), but the method Define() seems to not exist:

root [4] ROOT::Experimental::TDataFrame a("treeAgilent","/largefilesdisk/ComptonPairProdTelescope/LabMeasurementsData/UntreatedKetekACBias100ohm/Temp200K0/T200K9_26V5_led.root")
(ROOT::Experimental::TDataFrame &) A data frame built on top of the treeAgilent dataset.
root [5] a.Define("stime", [](double t){return t*1e-9;} , "time")
ROOT_prompt_5:1:3: error: no member named 'Define' in 'ROOT::Experimental::TDataFrame'
a.Define("stime", [](double t){return t*1e-9;} , "time")
~ ^
root [6] auto normtime = [](double t){return t*1e-9;}
((lambda) &) @0x7f768efc2050
root [7] a.Define("stime", normtime , "time")
ROOT_prompt_7:1:3: error: no member named 'Define' in 'ROOT::Experimental::TDataFrame'
a.Define("stime", normtime , "time")
~ ^

Indeed, when I “tab” to complete the member functions of TDataFrame, the method does not show up (digging a bit through the source code, it should be inherited from a base class).

The question:

When I use a function/functor/lambda that uses a branch holding an object, for example

a.Define("mytmpval", [](Waveform wf){ ...do something... },"wfCh1") //I have a class "Waveform" implemented in my trees

how is a (possibly large) object passed to the function/functor/lambda? Does it imply an extra copy from the “temporary basket” where the object is loaded from disk? Would a function/functor/lambda with a reference argument be allowed?

Thanks,
Matteo

Danilo · April 27, 2017, 10:21am

Hi Matteo,

I think the issue here is that the method which is called “Define” in the master is called “AddBranch” in the 6.09/02 release.

Cheers,
D

malfonsi79 · April 27, 2017, 10:53am

Do you suggest that I build the master branch instead?

For some reason the formatting did not separate the paragraph containing my question, so I write it again:

When I use a function/functor/lambda that uses a branch holding an object, for example

a.Define("mytmpval", [](Waveform wf){ ...do something... },"wfCh1") //I have a class "Waveform" implemented in my trees

how is a (possibly large) object passed to the function/functor/lambda? Does it imply an extra copy from the “temporary basket” where the object is loaded from disk? Would a function/functor/lambda with a reference argument be allowed?

Danilo · April 27, 2017, 1:19pm

Hi Matteo,

you can move to the master and have the bleeding edge in your hands: it’s a choice. Alternatively, you can stay on 6.09/02 and use AddBranch instead of Define.
A reference can be passed without problems: copies need to be avoided! TDataFrame strips the qualifiers and works with the “decayed” types rather than the ones in the signature of the function.
A final note: the list of the branches to be used must be passed as a collection. In your case

a.Define("mytmpval", [](Waveform wf){ ...do something... },{"wfCh1"}) // note the curly braces

In the master we have introduced the possibility to define new columns through strings, for example

a.Define("mytmpval", "wfCh1 + ....");

The performance is that of compiled code, as the string is jitted.
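
To make the point about references and decayed types concrete, here is a minimal sketch in plain C++. It does not use TDataFrame at all: the Waveform struct below is a hypothetical stand-in for the user class, and the copy counter is only there for illustration. It shows that std::decay maps `const Waveform &` to `Waveform`, and that a callable taking its argument by const reference avoids the copy a by-value argument costs.

```cpp
#include <type_traits>

// Hypothetical stand-in for a large user class stored in a branch.
// The static counter records how often the copy constructor runs.
struct Waveform {
   static int nCopies;
   Waveform() = default;
   Waveform(const Waveform &) { ++nCopies; }
};
int Waveform::nCopies = 0;

// std::decay removes the reference and the const qualifier, so a callable
// declared with "const Waveform &" still decays to the plain type Waveform.
static_assert(std::is_same<std::decay<const Waveform &>::type, Waveform>::value,
              "const Waveform & decays to Waveform");

// Calls a lambda taking its argument by value; returns the number of copies made.
int CopiesWhenPassedByValue()
{
   Waveform::nCopies = 0;
   Waveform wf;
   auto byValue = [](Waveform w) { (void)w; };
   byValue(wf);
   return Waveform::nCopies;
}

// Same call, but the lambda takes a const reference; returns the number of copies made.
int CopiesWhenPassedByReference()
{
   Waveform::nCopies = 0;
   Waveform wf;
   auto byReference = [](const Waveform &w) { (void)w; };
   byReference(wf);
   return Waveform::nCopies;
}
```

Under these assumptions the by-value version performs one copy per call and the by-reference version performs none, which is why a `const Waveform &` argument is the safer choice for large objects.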

Cheers,
D

malfonsi79 · May 25, 2017, 9:11am

Hi all,

I noticed that the nice tutorial on TDataFrame on the doxygen page https://root.cern/doc/master/classROOT_1_1Experimental_1_1TDataFrame.html has disappeared.

Maybe you are close to release v6.10 (I noticed new tags on git) and you plan to have this information in some other form, but for the time being can I have the old link to this tutorial?

I also have another question on TDataFrame: how do you “rewind” it for a second loop of calculations? To explain better: I have to run my analysis in two steps, because some algorithms require parameters that come from the results of the first loop. How do I remove already declared actions from the TDataFrame and add a completely new set of actions? Do I need a second TDataFrame? Could the second one conflict with the first?

Thanks again,
Matteo

Danilo · May 25, 2017, 6:52pm

Hi,

that was a glitch in the documentation generation: it should be back tomorrow.

So, a second TDF will not conflict with the first one: it would just be an additional handle on a certain dataset.
On the other hand, there is no need to “reset” the first TDF. You can restart from any desired node in your chain of calculations and attach new transformations (e.g. filters) or actions (e.g. histograms) to it. After the event loop has run, all the booked actions are cleared.
For example:

ROOT::Experimental::TDataFrame d(treeName, fileName);
auto d_f1 = d.Filter("x > 0");
auto d_f2 = d_f1.Filter("y < 42");
auto c = d_f2.Count();
// Something done with c, actions are cleared
// ...
// Now we can restart fresh with a new chain of actions
auto d_f3 = d.Filter ....
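
If it helps to picture the "book, run, clear, book again" cycle for a two-step analysis, here is a toy model in plain C++. It is not ROOT code and not how TDataFrame is implemented: ToyFrame, BookCount, Run and NBooked are all hypothetical names invented for this sketch. The only thing it mirrors is the behaviour described above, i.e. that booked actions are executed in one loop, are cleared afterwards, and that new ones can then be attached to the same object.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model, not ROOT code: every name here is invented for illustration.
class ToyFrame {
public:
   explicit ToyFrame(std::vector<double> data) : fData(std::move(data)) {}

   // Book a counting action behind a filter. Nothing is computed yet.
   // The caller must keep "result" alive until Run() has been called.
   void BookCount(std::function<bool(double)> filter, std::size_t &result)
   {
      fBooked.push_back(Action{std::move(filter), &result});
   }

   // A single loop over the data serves every booked action. Afterwards the
   // list of booked actions is cleared, while the data stay available.
   void Run()
   {
      for (auto &action : fBooked)
         *action.result = 0;
      for (double x : fData)
         for (auto &action : fBooked)
            if (action.filter(x))
               ++*action.result;
      fBooked.clear();
   }

   // Number of actions currently waiting for the next Run().
   std::size_t NBooked() const { return fBooked.size(); }

private:
   struct Action {
      std::function<bool(double)> filter;
      std::size_t *result;
   };
   std::vector<double> fData;
   std::vector<Action> fBooked;
};

// Two-step analysis: the second pass uses a parameter obtained in the first.
std::pair<std::size_t, std::size_t> TwoPassExample()
{
   ToyFrame d(std::vector<double>{-1., 2., 5., 8., 50.});

   // First pass: count the positive entries.
   std::size_t nPositive = 0;
   d.BookCount([](double x) { return x > 0.; }, nPositive);
   d.Run();

   // Second pass on the same object: the threshold comes from the first pass.
   const double threshold = static_cast<double>(nPositive);
   std::size_t nAbove = 0;
   d.BookCount([threshold](double x) { return x > threshold; }, nAbove);
   d.Run();

   return {nPositive, nAbove};
}
```

With the five sample values above, the first pass counts 4 positive entries and the second pass counts 3 entries above that threshold; no second frame object and no explicit reset are needed in between.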

Cheers,
D