RooFit, fitting ranges and kernel sensitivity

mattbellis · March 9, 2010, 6:23pm

Hi all (Wouter?),

I’ve come across an odd issue in my RooFit scripts, which I run using PyRoot. However, after simplifying them and going back to the C++ implementation, I find that the behavior is the same in both. I’ll probably post to the PyRoot forum about this as well, just to see if there’s any insight over there.

I’ve backed up to the rf203_XXX.C example which I’ve now modified and attached here as rf203_ranges_modified_for_multiple_fits.C. My issue is that I am trying to run many toy MC studies by hand (I know there are ways to do this automatically, but I want to have some finer control over the fits). I modified the script to basically generate and run 100 different datasets/fits.

If I do not use a defined fit range, the fits run OK.

If I do use a defined fit range, I get a seg fault after the 85th iteration.

      // Define "signal" range in x as [-3,3]
      x.setRange("signal",-3,3) ;

      // Fit p.d.f only to data in "signal" range
      RooFitResult* r_sig = model.fitTo(*modelData,Save(kTRUE),Range("signal")) ;
      //RooFitResult* r_sig = model.fitTo(*modelData,Save(kTRUE)) ;

Here’s the part which destroyed a night’s sleep…this problem only happens if I run on our (SLAC’s) batch farm but not on my laptop.

Both have the latest version of ROOT (I built them by hand).

My laptop is 32 bit and the batch farm is 64. Other things are running fine on the farm.

However, the farm is using a 5.18 kernel and I’m using a 5.26 kernel on my laptop. This seems to be the only discernible difference.

I’ve tried explicitly naming RooFit objects and deleting them, thinking it was some sort of namespace/mangling issue? I know around kernel 5.2X-something there was a change that allowed for far more characters to be passed in on the command line, so I thought it was some other similar memory space issue? I don’t know…this is a bit over my head at this point.

I don’t think I can convince our computer center to build the latest kernel for the whole farm.

Is there any other fix you can suggest to get this swinging? Thanks in advance.

Matt
rf203_ranges_modified_for_multiple_fits.C (1.85 KB)

mattbellis · March 9, 2010, 11:49pm

Typo in the above:

Laptop (where things work): kernel 2.6.32

Batch farm (where things break): kernel 2.6.18.

Oops. I was mentally confusing ROOT releases and kernels. Everything else is correct in the above post.

Matt

mattbellis · March 12, 2010, 10:50pm

So I’ve gathered together more odd information on this problem.

In the earlier examples, I declared the variables and PDF outside the loop. Inside the loop, I generated datasets and fit them over some specified ranges. This setup failed on some machines…worked on others.

For the heck of it, I moved the loop to include everything, including the initial declaration of variables and PDFs. This seems to work! Even though I would’ve thought this would have the most potential for memory leaks.

I’m attaching two new files that have the loop in different places. I also tried to strip these examples down to the least amount of text. I added an option to turn off the fit range, so you can see that the fits work OK without it.

rf203_ranges(kTRUE) // uses the fit range...and fails. :(
rf203_ranges(kFALSE) // doesn't use the fit range and works!

To summarize the machine-dependent results for rf203_ranges_test.C with a fit range (kTRUE):

root 5.18, kernel 2.4.31, 32-bit: fails after some iterations.

root 5.27, kernel 2.6.18, 64-bit: fails after some iterations

root 5.27, kernel 2.6.32, 32-bit: works like a charm!

I’m soooooooo confused.

I would appreciate it if anyone else could verify this on any other machines. That might help figure this out. Thanks in advance to anyone who can offer suggestions…other than upgrading the kernels on the 2.6.18 machines.

Matt
rf203_ranges_test_moved_the_loop.C (1.92 KB)
rf203_ranges_test.C (1.86 KB)

Wouter_Verkerke · March 16, 2010, 8:39pm

Hi,

What you describe sounds confusing indeed. I will try out your example code and see what I can find, but it may take a little bit of time. I’ll get back to you in a couple of days.

Wouter

mattbellis · March 16, 2010, 10:47pm

Hey Wouter,

Sounds good. Here’s a bit more info for you.

One of the warnings I was getting was from RooAbsOptTestStatistic and told me

WARNING: Must clone input data when a range specification is given, ignoring request to use original input dataset

I found this code in roofitcore/src/RooAbsOptTestStatistic.cxx

  // Copy data and strip entries lost by adjusted fit range, _dataClone ranges will be copied from realDepSet ranges
  if (rangeName && strlen(rangeName)) {
    if (!cloneInputData) {
      coutW(InputArguments) << "RooAbsOptTestStatistic::ctor(" << GetName()
                << ") WARNING: Must clone input data when a range specification is given, ignoring request to use original input dataset" << endl ;
    }
    _dataClone = ((RooAbsData&)indata).reduce(RooFit::SelectVars(*realDepSet),RooFit::CutRange(rangeName)) ;
    _ownData = kTRUE ;
  } else {
    if (cloneInputData) {
      _dataClone = (RooAbsData*) indata.Clone() ;
      //reduce(RooFit::SelectVars(*indata.get())) ; //  ((RooAbsData&)data).reduce(RooFit::SelectVars(*realDepSet)) ;
      _ownData = kTRUE ;
    } else {
      _dataClone = &indata ;
      _ownData = kFALSE ;
    }
  }

When my code passes in the Range option, the underlying RooFit code actually makes reduced datasets of the original dataset, and uses that in the fit.

So I tried doing this by hand in my original script. I now make a reduced dataset (or multiple datasets for multiple ranges) and pass that in (or the appended reduced datasets). That works! I can loop over this to my heart’s content! Woo-hoo!

This works for both .C and .py scripts.

However, in my travels through this mystery I came across an older version of the above source code in which the following line was absent.

_ownData = kTRUE ;

I don’t know that this is the issue, and to be honest, I was unable to check that version of ROOT/RooFit with these test scripts so I can’t be sure. I merely pass it on as another clue.

I don’t know if _ownData is the key. I’m not even sure if “own” is a possessive (this is this object’s “own” data) or if it is a verb (this object “owns” the data) or if it is “owned” by something else. Usually when ROOT breaks on me it’s because I’m unaware that ROOT is taking care of some sort of ownership behind the scenes. Probably this means I should read the manual more deeply?
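
One common C++ convention that would fit here is the verb reading: a flag like _ownData records whether the object is responsible for deleting the data it points to. The sketch below is a toy, ROOT-free model of the three branches in the constructor quoted above, under that assumption. ToyData, ToyTestStatistic and reduceToRange are hypothetical names and not part of RooFit, and the destructor behaviour is assumed rather than taken from the RooFit source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dataset: a plain list of observed x values.
struct ToyData {
    std::vector<double> x;
};

// Return a newly allocated copy holding only entries with lo <= x <= hi.
// The caller becomes responsible for deleting the result.
ToyData* reduceToRange(const ToyData& in, double lo, double hi) {
    assert(lo <= hi);
    ToyData* out = new ToyData;
    for (double v : in.x) {
        if (v >= lo && v <= hi) out->x.push_back(v);
    }
    return out;
}

// Hypothetical holder mirroring the three branches of the quoted constructor.
class ToyTestStatistic {
public:
    ToyTestStatistic(ToyData& indata, bool useRange, double lo, double hi,
                     bool cloneInputData) {
        if (useRange) {
            // A range always forces a reduced copy, which this object owns.
            data_ = reduceToRange(indata, lo, hi);
            ownData_ = true;
        } else if (cloneInputData) {
            // Full copy requested: this object owns the copy.
            data_ = new ToyData(indata);
            ownData_ = true;
        } else {
            // Borrow the caller's dataset: this object must not delete it.
            data_ = &indata;
            ownData_ = false;
        }
    }

    // Assumed behaviour: only delete what this object owns.
    ~ToyTestStatistic() {
        if (ownData_) delete data_;
    }

    // Copying would duplicate the raw pointer and risk a double delete.
    ToyTestStatistic(const ToyTestStatistic&) = delete;
    ToyTestStatistic& operator=(const ToyTestStatistic&) = delete;

    bool ownsData() const { return ownData_; }
    std::size_t numEntries() const { return data_->x.size(); }
    const ToyData* data() const { return data_; }

private:
    ToyData* data_;
    bool ownData_;
};
```

With data {-5, -1, 0.5, 2.9, 7} and a range of [-3, 3], the ranged holder ends up owning a 3-entry copy, while the no-range, no-clone holder simply borrows the caller's dataset and leaves its deletion to the caller. If the flag were left unset in the first branch, the reduced copy would never be freed, which is the kind of ownership bug that only shows up after many iterations.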

To paraphrase Suzanne Vega in “The Queen and the Soldier”: ROOT, your ways are very strange…at least to a non-expert like myself.

Sigh…I’m rambling now. I appreciate any light you can shed on this, Wouter. I can’t thank you enough for the development and maintenance of RooFit.

Matt

Wouter_Verkerke · March 17, 2010, 8:47pm

Hi Matt,

Thanks. That extra bit of info is in fact very useful. I’ve recently been debugging this code (RooAbsOptTestStatistic ctor) and already found and fixed at least one problem related to _ownData. I hope to be able to commit this package of fixes next week. At that point I should retry your macro to see if I need to dig deeper.

Wouter

mattbellis · March 17, 2010, 8:49pm

Coolio. Lemme’ know if/when you want me to svn update and give it another shot.

Matt