[TChain] Draw data stream

Hello,

I have a large number of ROOT files containing trees to read (a few hundred GB).
I load them using TChain and create a histogram using TChain::Draw(“myvar1 >> myhist”).

Reading just a few files took me 15 min to get the final histogram.
Is there a way to see the result step by step instead of waiting for TChain::Draw to finish?

Cheers,
Marco

Maybe @pcanal or @Axel could give a hint…

I was thinking of looping over the TChainElement objects and calling TTree::Draw on each tree, opening the TFile and getting the tree manually, but maybe that’s not the clever way!

Hi Marco!

Open a TBrowser before starting TChain::Draw(). Then start TChain::Draw(). In the Browser, in root/ROOT Memory (“root” is the top-most node in the “directory” panel on the right) you’ll find an object called myhist - that’s your histogram being filled as TChain::Draw() progresses!

Does that work for you?

Cheers, Axel.

@Axel At least for me, the browser completely freezes while running the Draw (trying with only one file, instead of a giant TChain, but it still takes ~30s to do the Draw).

Argh right - we have a dedicated GUI thread only on Windows. So indeed, looping over the files “by hand” sounds like the easiest solution. Otherwise you can use PROOF, which sends regular snapshot updates to the session manager GUI. Oh, and it accelerates the processing. But it doesn’t support TTree::Draw() - you’d have to write a TSelector.

Summary: that’s something we will have to address, thanks for pointing that out!

Actually I was thinking of creating a “new TThread” and calling TBrowser inside it.
I tried but didn’t manage; maybe I did something wrong.
Or is it more complicated than that to have a dedicated thread?

You’d have to move TTree::Draw() into a separate thread, because the TBrowser is a GUI thing, and those cling to the main thread. But yes, that might work…
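To make that split concrete, here is a minimal, ROOT-free sketch of the idea: a worker thread does the filling while the main thread stays free and only takes locked snapshots. `SharedHist` and `FillInBackground` are hypothetical names invented for this illustration; they are not ROOT classes, and this is not how TTree::Draw() works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a histogram that a worker fills and a GUI reads.
class SharedHist {
public:
    SharedHist(std::size_t nbins, double lo, double hi)
        : bins_(nbins, 0), lo_(lo), hi_(hi) {
        assert(nbins > 0 && hi > lo);
    }

    // Called by the worker thread for every value it reads.
    void Fill(double x) {
        if (x < lo_ || x >= hi_) return;  // ignore out-of-range values
        auto i = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * bins_.size());
        if (i >= bins_.size()) i = bins_.size() - 1;  // guard against rounding
        std::lock_guard<std::mutex> lock(mtx_);
        ++bins_[i];
    }

    // Called by the main (GUI) thread: returns a copy that is safe to draw.
    std::vector<long> Snapshot() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return bins_;
    }

private:
    std::vector<long> bins_;
    double lo_, hi_;
    mutable std::mutex mtx_;
};

// Starts filling `hist` from `values` on a worker thread and returns at once.
// The caller must join the returned thread before `hist` or `values` go away.
std::thread FillInBackground(SharedHist &hist, const std::vector<double> &values) {
    return std::thread([&hist, &values] {
        for (double v : values) hist.Fill(v);
    });
}
```

The main thread would call Snapshot() from time to time and redraw from the copy, so the GUI never touches the bins while they are being modified. In a real ROOT session the same locking discipline would presumably be needed around the histogram that the worker fills.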

Sorry for digging up this thread one day before its automatic closure, but I just pushed a feature to master that was basically inspired by this use-case.

In master (and soon in ROOT v6.12) you can now use TDataFrame to loop over your files, fill a histogram and draw an incremental, partial result.

The following snippet is all you need, and runs the event-loop in parallel over all your cores (have not tested this particular source file, but it should give you a good idea):

#include "ROOT/TDataFrame.hxx"
#include "TApplication.h"
#include "TCanvas.h"
#include "TH1D.h"
#include "TROOT.h"
#include <string>
#include <vector>
using namespace ROOT::Experimental;

int main() {
   TApplication app("app", nullptr, nullptr);
   ROOT::EnableImplicitMT(); // enable multi-threading
   const std::vector<std::string> files = {"file1", "file2", "file3"};
   TDataFrame d("tree", files);
   auto h = d.Histo1D("myvar"); // "book" the filling of a histogram with `myvar`
   // update a canvas with a histogram of `myvar` every 100 entries
   TCanvas c("c", "myvar");
   h.OnPartialResult(100, [&c](TH1D &h_) { c.cd(); h_.Draw(); c.Update(); });
   h->Draw(); // first access to the result: the event-loop is run here
   app.Run(); // let ROOT keep running after the event-loop is over
   return 0;
}

Hope this is useful, I would love to hear your thoughts.
Cheers,
Enrico


Hey, I get some errors due to the way you created the TDataFrame:

$ root_docker test.C 
snapshot: Pulling from rootproject/root-ubuntu16
Digest: sha256:4b9b9f1e8b797e25e818387515d4ee04d415a22ed759762d900ba90ef7b046c5
Status: Image is up to date for rootproject/root-ubuntu16:snapshot
   -----------------------------------------------------------------
  | Welcome to ROOT 6.11/01                     http://root.cern.ch |
  |                                    (c) 1995-2017, The ROOT Team |
  | Built for linuxx8664gcc                                         |
  | From heads/master@v6-09-02-2888-ge50a42c, Oct 03 2017, 19:18:00 |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'      |
   -----------------------------------------------------------------

root [0] 
Processing test.C...
In file included from input_line_9:1:
/cur/test.C:10:15: error: no matching constructor for initialization of 'ROOT::Experimental::TDataFrame'
   TDataFrame d("tree", files);
              ^ ~~~~~~~~~~~~~
/root-build/build/include/ROOT/TDataFrame.hxx:43:4: note: candidate constructor not viable: no known conversion from 'const std::initializer_list<const char *>' to
      'std::string_view' (aka 'std::experimental::__ROOT::basic_string_view<char, std::char_traits<char> >') for 2nd
      argument
   TDataFrame(std::string_view treeName, std::string_view filenameglob, const ColumnNames_t &defaultBranches = {});
   ^
/root-build/build/include/ROOT/TDataFrame.hxx:60:4: note: candidate constructor not viable: no known conversion from 'const std::initializer_list<const char *>' to
      '::TDirectory *' for 2nd argument
   TDataFrame(std::string_view treeName, ::TDirectory *dirPtr, const ColumnNames_t &defaultBranches = {});
   ^
/root-build/build/include/ROOT/TDataFrame.hxx:61:4: note: candidate constructor not viable: no known conversion from 'const char [5]' to 'TTree &' for 1st argument
   TDataFrame(TTree &tree, const ColumnNames_t &defaultBranches = {});
   ^
/root-build/build/include/ROOT/TDataFrame.hxx:63:4: note: candidate constructor not viable: no known conversion from 'const char [5]' to 'std::unique_ptr<TDataSource>' (aka
      'unique_ptr<ROOT::Experimental::TDF::TDataSource>') for 1st argument
   TDataFrame(std::unique_ptr<TDataSource>, const ColumnNames_t &defaultBranches = {});
   ^
/root-build/build/include/ROOT/TDataFrame.hxx:68:14: note: candidate template ignored: disabled by 'enable_if' [with FILENAMESCOLL = std::initializer_list<const char *>]
             TTraits::IsContainer<FILENAMESCOLL>::value && !std::is_same<FILENAMESCOLL, std::string>::value, i...
             ^
/root-build/build/include/ROOT/TDataFrame.hxx:62:4: note: candidate constructor not viable: requires single argument 'numEntries', but 2 arguments were provided
   TDataFrame(ULong64_t numEntries);
   ^
/root-build/build/include/ROOT/TDataFrame.hxx:39:7: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 2 were provided
class TDataFrame : public TDF::TInterface<TDFDetail::TLoopManager> {
      ^
/root-build/build/include/ROOT/TDataFrame.hxx:39:7: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 2 were provided

Yeah that should not happen, I’ll fix it thanks!
You can get around it by declaring “files” as a vector<string> instead of auto.

A fully working example that I did test can be found here: it shows how to register TDataFrame results in a TBrowser and use the browser to check the results while the event-loop is running.

edit: also changed the snippet above to use vector<string>


That tutorial script runs fine for me when executing it with root tdf012_InspectAnalysis.C, but it does not quit afterwards (which is fine at first, but there should be a way to close it manually?). Closing the TBrowser or pressing CTRL-C does not do anything.

How about now?
What do you think about the feature though?

Well, now it just quits right away (as expected?). It also does not complain about duplicate App any more.

I really like the feature, even though I currently don’t have a use case for it myself.

One thing that is not so nice, though, is that the incremental draw only shows the result from one thread (even though I do understand that this makes it a lot easier to do).

With the previous version I also had to click the canvas afterwards for it to update to the plot from all threads.

Alright alright, you got yourself a mention in the latest commit :smile:
Here is the latest iteration, with these two things also fixed.

It would definitely be possible to keep track of the progress of all threads, but I don’t want to set up anything too complicated in the tutorial: with OnPartialResultSlot one can get the partial results of each worker thread and then it’s just a matter of putting them together in a sensible way.

In my defence the original question referred to a single-thread execution :stuck_out_tongue:
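As a rough illustration of the “putting them together” step for per-thread partial results, here is a minimal sketch that sums per-slot bin contents. It deliberately avoids ROOT types: `Bins` and `MergeSlots` are hypothetical names for this example, and it assumes each worker's partial histogram has already been copied into a plain vector (for instance from inside an OnPartialResultSlot callback, with suitable locking).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation of one worker's partial histogram: bin counts only.
using Bins = std::vector<long>;

// Sums the per-slot partial histograms bin by bin into one combined histogram.
// All slots are assumed to share the same binning.
Bins MergeSlots(const std::vector<Bins> &slots) {
    assert(!slots.empty());
    Bins total(slots.front().size(), 0);
    for (const Bins &s : slots) {
        assert(s.size() == total.size());
        for (std::size_t i = 0; i < s.size(); ++i) total[i] += s[i];
    }
    return total;
}
```

The combined vector is what one would then draw. With real ROOT histograms, adding the partial TH1D objects together would play the same role as this bin-wise sum.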


Wohoo, a commit message with my name in it! :smiley:

The newest version seems to work exactly as advertised, and I can even quit ROOT afterwards without killing it :slight_smile:

Now, if only I had an actual use for this ^^

Cheers,
Andreas


Hello,
Unfortunately I do not use ROOT 6.
But thank you for the update for Unix users. I think this will be extremely useful for me in the near future :slight_smile:

On the other hand, I used TThread to process TChain. I opened a TBrowser and I see this in root memory:

h0 is my histogram, which is being filled. Do you have an idea how I can read it?
(NB: Once it is ready, I can try to post a ROOT 5 use case with threads and a mutex showing how to do this…)
