Strange behaviour of TTree output

Hi,

I’m observing strange behaviour in my analysis code that I can’t quite pin down, but I’ll try to describe it.

My analysis code creates as one of its outputs a TTree that has a few branches. One of the branches is of type std::vector. I’m observing strange things with this branch. Basically, there is some sort of randomness to what the output looks like. In the attached two plots (TSelector_proof1.png, TSelector_proof2.png) I show the contents of this branch in two consecutive runs. (I ran the same code without any modification on the same input file.) As you can see, they don’t hold the same number of entries.

If I run the same TSelector-based analysis locally, I get yet another plot (TSelector_local.png). But even this is not what I would expect. The expected distribution should look like the last attached plot (Reference.png, which I get using analysis code that is not based on TSelector).

But somehow this only affects this one branch. I also have a simple Int_t branch in the tree, and that always looks the same (with both versions of the analysis code, run either on PROOF or locally). So the TTree merging seems to be working on a basic level.

I’m at a loss. In my “reference analysis” I create the output TTree directly in the output file, while in the TSelector-based analysis I create it in memory and only write it to the output file in the last step. This is one part of the code that I’m not too comfortable with; I could easily have made a mistake there, but I can’t figure out what.

So I just wanted to ask: has anyone seen anything like this before? Does anyone have an idea what I could be doing wrong?

Cheers,
Attila

Which version of ROOT are you using? Can you provide your ROOT file and your selector file?

Philippe.

Dear Philippe,

I’m using ROOT 5.20. I’ll try to put together the code, but it’s not as simple as providing a single class. I’ll post the sources as soon as I have the time to package them.

Cheers,
Attila

Dear Philippe,

I attached the source code that I used. I should warn you, it’s a pretty large amount of code and it’s not too well documented right now.

Edit: For some reason I can’t bloody attach the source to the posting. You can find it under /afs/cern.ch/user/k/krasznaa/public/sframe.tar.gz.

You should compile it by going into the SFrame directory, sourcing setup.[c]sh and executing make. (You have to set up your environment with one of the ROOT releases first of course.) This will give you 3 libraries and an executable (under SFrame/lib and SFrame/bin respectively). To run the code, go to the SFrame/user/config directory and execute “sframe_main FirstCycle_config.xml”.

Now some description: In this analysis framework the user has to write his/her analysis algorithm by creating a class inheriting from SCycleBase. Such an example “cycle” is the FirstCycle class. The idea is that SCycleBase provides a bunch of convenience functions on top of what you get from a vanilla TSelector. (At the same time it restricts a bit what you can do, but the aim here is to provide the framework for a physics analysis.)

SCycleBase is a pretty complicated class that inherits from TSelector. (There was a separate thread about its inheritance tree…) You can find the re-implemented functions of TSelector in the SCycleBaseExec class. The important functions for the creation of the TTree output of the FirstCycle class are:

[ul]SCycleBaseNTuple::CreateOutputTrees(…)

SCycleBaseNTuple::DeclareVariable(…)

SCycleBaseExec::Process(…)[/ul]

The first one creates the output trees defined in the configuration XML, the second one adds a new branch to one of the output trees (it’s called from FirstCycle::BeginInputData(…)) and the last one fills the output tree(s) after processing each event.
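For readers trying to follow the flow described above without building SFrame, here is a simplified, ROOT-free model of it. This is a sketch under stated assumptions, not SFrame's actual code: the class ToyTree and its members are hypothetical. It illustrates one property the real TTree branches share: a declared variable is bound by address, so the object must stay alive at that address for as long as the tree is being filled, and its current contents are copied into a new entry at every Fill.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the three steps:
//   CreateOutputTrees -> construct a ToyTree,
//   DeclareVariable   -> bind a branch name to the address of a user variable,
//   Process           -> after each event, call Fill() to snapshot the current values.
class ToyTree {
public:
    using Entry = std::map<std::string, std::vector<double>>;

    // Bind a scalar branch; only the address is stored, not the value.
    void DeclareVariable(const std::string& name, const int* addr) {
        readers_[name] = [addr] { return std::vector<double>{double(*addr)}; };
    }

    // Bind a vector branch; the vector object must outlive the tree.
    void DeclareVariable(const std::string& name, const std::vector<double>* addr) {
        readers_[name] = [addr] { return *addr; };
    }

    // Copy the current contents of every bound variable into a new entry.
    void Fill() {
        Entry e;
        for (const auto& r : readers_) e[r.first] = r.second();
        entries_.push_back(e);
    }

    const std::vector<Entry>& Entries() const { return entries_; }

private:
    std::map<std::string, std::function<std::vector<double>()>> readers_;
    std::vector<Entry> entries_;
};
```

Because only addresses are stored, the user is free to clear and refill the vector between events; what ends up in the entry is whatever the vector holds at the moment Fill() is called.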

I realise that the code is a bit too complicated to ask for help with it. If you’re willing to try to look at it, you should create the Doxygen documentation for the code. (Using the SFrame/Doxyfile configuration file.) Even though some of the documentation in the code is not up to date, it could still help.

So, this was how to run the code locally. If you want to try to run it on a PROOF cluster, you have to edit the FirstCycle_config.xml configuration file and change the following:

[ul]In the node change RunMode=“LOCAL” to RunMode=“PROOF”.

You’ll also have to change the path names in the nodes to the exact paths to the PAR packages created automatically at compilation. (They should be in your SFrame/lib directory, but the code can’t find them automatically at the moment.)

Lastly, you should change the value of ProofServer to the appropriate server name.[/ul]

Every time you run the application, it will produce two output files. (FirstCycle.Data1.root and FirstCycle.Data2.root) You should only look at the contents of FirstCycle.Data2.root. It will hold two simple histograms, a TTree called FirstCycleTree and a sub-directory with a single TGraph. (You can see how all of these are created in the FirstCycle.cxx source file.) If you look at the FirstCycle::ExecuteEvent(…) function, you’ll see that the contents of the El_p_T branch in the output TTree should be the same as the content of the El_p_T_hist histogram. But they’re not the same. The histogram looks just as it should, but when you plot the contents of the El_p_T branch, you’ll get the result that I posted as TSelector_local.png.
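To quantify the mismatch rather than compare plots by eye, one could re-bin the values read back from the El_p_T branch with the same binning as El_p_T_hist (100 bins from 0 to 150000) and compare the result bin by bin with the histogram contents. The helper below is a hypothetical, ROOT-free sketch of that binning step only; reading the branch is left out, and under/overflow values are simply skipped.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bin the per-event vectors read back from a branch
// with fixed-width bins, as a 1D histogram with nbins bins in [low, high)
// would. Values outside the range (under/overflow) are ignored.
std::vector<long> BinValues(const std::vector<std::vector<double>>& events,
                            int nbins, double low, double high) {
    std::vector<long> counts(static_cast<std::size_t>(nbins), 0L);
    const double width = (high - low) / nbins;
    for (const auto& event : events) {
        for (double x : event) {
            if (x < low || x >= high) continue;        // skip under/overflow
            int bin = static_cast<int>((x - low) / width);
            if (bin >= nbins) bin = nbins - 1;          // guard rounding at the upper edge
            ++counts[static_cast<std::size_t>(bin)];
        }
    }
    return counts;
}
```

If the branch were written correctly, the returned counts would match the in-range bin contents of El_p_T_hist exactly.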

So, if anyone actually tries the code and has problems (which I expect you would, since the code is highly experimental), please contact me for help with it.

Cheers,
Attila

Hi,

A first comment on your code: Book( TH1F( "El_p_T_hist", "Electron p_{T}", 100, 0.0, 150000.0 ) )->Fill( (*m_El_p_T) [i] ); is extremely inefficient (especially since you run it in the innermost loop). It leads to a histogram being created and deleted for every single call to Fill.

At the very least I recommend that you do the booking outside of the innermost loop, or better yet, do it during the initialization.

Alternatively you can use a different interface (for example passing the type and the arguments of the constructor to your function instead of passing a temporary object).
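The alternative interface suggested above (passing the type and the constructor arguments instead of a temporary object) could look roughly like the following. This is a hypothetical sketch in standard C++11, not the SCycleBaseHist API: HistRegistry and FakeHist are invented for illustration, and FakeHist merely stands in for TH1F. The object is constructed only the first time a name is booked; later calls, including ones from the innermost loop, just look up and return the stored pointer.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TH1F: counts its constructions and stores filled values.
struct FakeHist {
    static int constructed;
    std::string name, title;
    int nbins;
    double low, high;
    std::vector<double> values;
    FakeHist(std::string n, std::string t, int nb, double lo, double hi)
        : name(std::move(n)), title(std::move(t)), nbins(nb), low(lo), high(hi) {
        ++constructed;
    }
    void Fill(double x) { values.push_back(x); }
};
int FakeHist::constructed = 0;

// Hypothetical registry: Book(name, args...) forwards the constructor arguments
// and builds the object only if the name has not been booked yet.
template <typename T>
class HistRegistry {
public:
    template <typename... Args>
    T* Book(const std::string& name, Args&&... args) {
        auto it = booked_.find(name);
        if (it == booked_.end()) {
            std::unique_ptr<T> obj(new T(name, std::forward<Args>(args)...));
            it = booked_.emplace(name, std::move(obj)).first;
        }
        return it->second.get();
    }

    // Look up an already booked object by name; nullptr if it does not exist.
    T* Hist(const std::string& name) const {
        auto it = booked_.find(name);
        return it == booked_.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<T>> booked_;
};
```

Calling reg.Book("El_p_T_hist", "Electron p_{T}", 100, 0.0, 150000.0)->Fill(pt) inside the loop then costs one map lookup per call instead of a histogram construction and destruction. Storing the returned pointer once during initialization avoids even the lookup.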

Cheers,
Philippe.

Hi Philippe,

You’re pretty efficient. That’s indeed a very slow part of the code, but I was aware of that. Usually the user is advised to either store the pointer returned by the SCycleBaseHist::Book(…) call, or to use the SCycleBaseHist::Hist(…) function for accessing already created histograms. (The latter function comes in handy when the user doesn’t want to store the pointers to >100 histograms, which can happen in a large analysis. Sometimes I miss how you could reference a histogram with a single number using HBOOK. :))

Thanks a lot for actually looking at the code!

Cheers,
Attila

Hi,

I also noted that you elected to keep the TTree memory resident until the end of the process. This is also inefficient, since you are keeping some (or a lot, depending on the TTree size) of memory to hold the baskets that are already ready to be flushed to disk (and that you do not actually use, since you never rewind the output tree).

Cheers,
Philippe.

Hi,

This problem (randomness in the creation of the output file when the TTree is memory resident and contains a top level stl collection) has been solved in the SVN trunk.

Thanks for reporting this issue.
Philippe.

I’m doing it like this for the moment because, by default, you don’t write output objects to a file when running on a PROOF worker node. A few days ago I received instructions on producing large output files with PROOF in a way in which each worker node writes to a temporary file. I guess the problem wouldn’t have shown up in that case. But since the implementation of this latter method is a bit more complicated, it seemed logical to try writing a TTree output with PROOF like this first.

Thanks for looking into this, it’s good that a possible problem has been eliminated.

Cheers,
Attila