Negative event weights and mean values in histograms

Hello.

I have a question concerning the calculation of mean values in histograms when using negative event weights.

When I fill a TH1D using weights that can be negative (e.g. in MC@NLO data),
I find that the mean value of the histogram is calculated strangely.

I have identified the corresponding line in the source code of TH1D, which is copied below and marked at the respective lines with comments:

Int_t myTH1D::Fill(Double_t x, Double_t w)
{

   ...

   Double_t z= (w > 0 ? w : -w);
      // Why is the weight always made positive here? Shouldn't it simply be z = w?
   fTsumw   += z;
   fTsumw2  += z*z;
      // Same point as above; shouldn't it be something like
      //   if (w < 0.) {
      //      fTsumw2 += (-1.)*(z*z);
      //   } else {
      //      fTsumw2 += z*z;
      //   }
   fTsumwx  += z*x;
   fTsumwx2 += z*x*x;
   return bin;
};

The same point has already been posted here:
root.cern.ch/phpBB2/viewtopic.ph … ght=weight
where you will also find a running example, which I have attached here as well.

The problem becomes obvious when I try to scale the histogram by
histo->Scale(scalefactor);

With negative event weights, the mean of the histogram changes after scaling, which should not happen.
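
To make the discrepancy concrete, here is a minimal standalone sketch (plain C++, no ROOT). It only mirrors the running sums from the Fill excerpt above, once with z = |w| and once with the signed weight z = w. The names WeightedSums, Entry and accumulate are made up for illustration and are not part of ROOT.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper, not a ROOT class: the running sums used for the mean.
struct WeightedSums {
    double sumw = 0.0;   // sum of z
    double sumwx = 0.0;  // sum of z * x
    // Undefined if sumw is zero, which can happen with signed weights.
    double mean() const { return sumwx / sumw; }
};

// One filled event: value x and event weight w (hypothetical type).
struct Entry {
    double x;
    double w;
};

// useAbs = true mimics z = |w| from the excerpt; false uses z = w.
WeightedSums accumulate(const std::vector<Entry>& entries, bool useAbs) {
    WeightedSums s;
    for (const Entry& e : entries) {
        const double z = useAbs ? std::fabs(e.w) : e.w;
        s.sumw += z;
        s.sumwx += z * e.x;
    }
    return s;
}
```

For the entries (x=1, w=+1), (x=2, w=+1), (x=3, w=-1), the |w| version gives a mean of 6/3 = 2, while the signed version gives 0/1 = 0. Any operation that recomputes the mean under the other convention would therefore shift it.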

Maybe I didn’t understand quite well what negative weights really mean, but
according to the MCatNLO documentation, they should be treated as usual weights.

Best Regards,
Joerg Walbersloh
EventWeights.C (397 Bytes)

I think you have touched a delicate point.
Negative weights are not really defined in a statistical sense, and they are ambiguous. What is the meaning of having, for example, two events in a bin with a weight of +1 and one event with a weight of -1?
Is it equivalent to filling once with a weight of 1, or is the error the square root of the sum of the squared weights, sqrt(3)?
Also, calculating a weighted mean using negative weights does not make any sense at all.
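
The two readings of the example above can be put side by side in a small standalone sketch (plain C++, no ROOT; the function names binContent and errorFromSumw2 are invented for illustration). It assumes the usual convention that the bin content is the sum of the weights and the bin error is the square root of the sum of squared weights.

```cpp
#include <cmath>
#include <vector>

// Bin content: plain sum of the (signed) weights.
double binContent(const std::vector<double>& weights) {
    double sum = 0.0;
    for (double w : weights) sum += w;
    return sum;
}

// Bin error: sqrt of the sum of squared weights; the sign of w drops out.
double errorFromSumw2(const std::vector<double>& weights) {
    double sum2 = 0.0;
    for (double w : weights) sum2 += w * w;
    return std::sqrt(sum2);
}
```

With weights {+1, +1, -1} the content is 1 and the error is sqrt(3); with a single fill of weight 1 the content is also 1 but the error is 1. The content agrees and the error does not, which is exactly the ambiguity.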

Do you know what the negative weights mean in MC@NLO?

Best Regards

Lorenzo

Hi Joerg,

a solution might be to create pairs of histograms: one for events with positive weights and a second one for those with negative weights (where you fill the events with |weight|). Then you do your histogram manipulations separately, and only combine them at the end. This should give you well-defined behavior for all intermediate operations. I have no idea whether this helps you in any way for the weighted mean, though - probably not. But maybe you can delay the combination of the pos and neg samples here, too. At least scaling them will be well defined.
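
A rough sketch of what I mean, in plain C++ rather than with real histograms (SignedPair and its members are hypothetical names, not a ROOT class; bins are simple array slots and the caller must pass a valid bin index):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pair of histograms with the same binning.
struct SignedPair {
    std::vector<double> pos;  // contents from events with w >= 0
    std::vector<double> neg;  // contents from events with w < 0, filled with |w|

    explicit SignedPair(std::size_t nbins) : pos(nbins, 0.0), neg(nbins, 0.0) {}

    // Route each event to one of the two parts; both only see weights >= 0.
    void fill(std::size_t bin, double w) {
        if (w >= 0.0) {
            pos[bin] += w;
        } else {
            neg[bin] += std::fabs(w);
        }
    }

    // Scaling acts on each part separately, so it stays well defined.
    void scale(double c) {
        for (double& v : pos) v *= c;
        for (double& v : neg) v *= c;
    }

    // Combine only at the very end: positive part minus negative part.
    std::vector<double> combined() const {
        std::vector<double> out(pos.size());
        for (std::size_t i = 0; i < pos.size(); ++i) out[i] = pos[i] - neg[i];
        return out;
    }
};
```

With two real histograms the last step would presumably be something like hpos->Add(hneg, -1.0), done once after all other manipulations.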

Cheers, Axel.

Thanks everybody for the quick replies.

I have also been asking myself about the meaning of negative weights in statistics in general.

However, let me give you some quotes from papers that tell users what to do with MC@NLO weights:

This is a talk by Stefano (author of MC@NLO) in which he explains the use of negative weights:
wlap.org/file-archive//atlas … ixione.pdf

The MC@NLO manual states that …

Find this manual here:
hep.phy.cam.ac.uk/theory/web … _man33.pdf

Another nice overview can be found here:
hep.ucl.ac.uk/~dwaters/cdf/mcatnlo.html

A more theoretical explanation of these negative weights is given in this paper:
iop.org/EJ/abstract/1126-6708/2002/06/029

So my answer to moneta's question is: use the +/-1 weights as you would use normal weights, i.e. fill the histogram with:

histo->Fill( value, weight );

This brings me back to my initial post.

However, Axel's idea seems quite nice; I’ll try it and post my experience here.

Back again.

Unfortunately, Axel's idea didn’t work out.

Instead, I have built my own solution by writing a class ‘myTH1’ which inherits from the original TH1. I identified all the places in the original source code that were related to negative weights and redefined those pieces in the new class so that they work with ±1 weights.

I can use my own histogram via LoadClass now, but of course, this is only a temporary solution (and not the best, since I do not know what will happen if I use other methods with my modified statistics).
At least, I can do what I need right now.

Nevertheless, the main point, understanding the correct statistical meaning of negative weights, remains an open issue and should be discussed further.

Best,
Joerg