Adding weights to multiple backgrounds and significance value

Hi,

I’m trying to set up a training with one signal and multiple backgrounds, but I’m really confused about how to weight the backgrounds so that they are all normalised to the same luminosity, and about how the PrepareTrainingAndTestTree method should be called. There are 4 background trees and 1 signal tree. I am scaling them to the same luminosity as follows:


  // Number of entries in each training and testing tree
  int nSigTrain  = sigTrain->GetEntries();
  int nBkg0Train = bkg0Train->GetEntries();
  int nBkg1Train = bkg1Train->GetEntries();
  int nBkg2Train = bkg2Train->GetEntries();
  int nBkg3Train = bkg3Train->GetEntries();

  int nSigTest  = sigTest->GetEntries();
  int nBkg0Test = bkg0Test->GetEntries();
  int nBkg1Test = bkg1Test->GetEntries();
  int nBkg2Test = bkg2Test->GetEntries();
  int nBkg3Test = bkg3Test->GetEntries();

  // Per-event weight = expected yield at 3000 fb^-1 / total entries (training + testing)
  double sigWeight  = 0.3*0.3*1.212*3000/(nSigTrain+nSigTest);
  double bkg0Weight = 41.05*1.2*1000*3000/(nBkg0Train+nBkg0Test);
  double bkg1Weight = 108.6*1.46*3000/(nBkg1Train+nBkg1Test);
  double bkg2Weight = 5.34*1.54*3000/(nBkg2Train+nBkg2Test);
  double bkg3Weight = 0.00176*3000/(nBkg3Train+nBkg3Test);

  // Register each tree with its weight; the last argument marks it as a training or testing tree
  dataloader->AddSignalTree(sigTrain, sigWeight, TMVA::Types::kTraining);
  dataloader->AddBackgroundTree(bkg0Train, bkg0Weight, TMVA::Types::kTraining);
  dataloader->AddBackgroundTree(bkg1Train, bkg1Weight, TMVA::Types::kTraining);
  dataloader->AddBackgroundTree(bkg2Train, bkg2Weight, TMVA::Types::kTraining);
  dataloader->AddBackgroundTree(bkg3Train, bkg3Weight, TMVA::Types::kTraining);

  dataloader->AddSignalTree(sigTest, sigWeight, TMVA::Types::kTesting);
  dataloader->AddBackgroundTree(bkg0Test, bkg0Weight, TMVA::Types::kTesting);
  dataloader->AddBackgroundTree(bkg1Test, bkg1Weight, TMVA::Types::kTesting);
  dataloader->AddBackgroundTree(bkg2Test, bkg2Weight, TMVA::Types::kTesting);
  dataloader->AddBackgroundTree(bkg3Test, bkg3Weight, TMVA::Types::kTesting);

and calling PrepareTrainingAndTestTree for the training and testing samples as follows:

   dataloader->PrepareTrainingAndTestTree( mycuts, mycutb,
                                      "nTrain_Signal=0:nTrain_Background=0:SplitMode=Random:NormMode=NumEvents:!V" ); 

Could you please tell me whether this is the correct way?

In the BDT cut efficiencies plot, the lower corner says that there are 1000 signal and 1000 background events. It seems to me that I am not doing the scaling correctly, since the numbers expected at 3000 fb^{-1} are not reflected there. Please advise how to proceed.
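A minimal C++ sketch of the weighting used in the code above, assuming the goal is that the weights of each sample sum to its expected yield at 3000 fb^-1. The helpers `eventWeight` and `sumOfWeights` are hypothetical names, not part of TMVA; the entry counts in the check are the signal numbers from the DataSetFactory log in this thread, used only for illustration.

```cpp
#include <cassert>

// Hypothetical helper: per-event weight for one MC sample, i.e. the expected
// yield at the target luminosity divided by the total number of MC entries
// given to the DataLoader (training + testing trees).
double eventWeight(double expectedYield, long long nTrain, long long nTest) {
  const long long nTotal = nTrain + nTest;
  assert(nTotal > 0 && "sample must contain at least one entry");
  return expectedYield / static_cast<double>(nTotal);
}

// Hypothetical helper: sum of the (uniform) weights over all entries of the
// sample. It should reproduce the expected yield, which makes a cheap
// sanity check of the normalisation.
double sumOfWeights(double weight, long long nTrain, long long nTest) {
  return weight * static_cast<double>(nTrain + nTest);
}
```

If the sum of weights over all training and testing entries gives back the expected yield for every sample (327.24 for the signal), the relative normalisation of the signal and the four backgrounds is consistent.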

Thanks and regards,
Antara

tmva_train_sigbkg.cpp (13.7 KB)

I attach my macro here. Any help in this matter will be greatly appreciated.

Best regards,
Antara

Hi,
Looking at the code, what you are doing seems fine. I think the efficiency plot shows just a number of events, not the sum of their weights, and this number is set in the corresponding GUI.
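One detail of the option string may be worth checking. As I understand the TMVA Users Guide, `NormMode=NumEvents` rescales the event weights used in training so that the average weight is 1 per event, independently for signal and background (`EqualNumEvents` and `None` are the other documented choices). The relative weights between the four background samples survive this rescaling, but the absolute 3000 fb^-1 normalisation does not. The sketch below only mimics that rescaling on a plain vector of weights to show the effect; it is an illustration under that assumption, not TMVA's implementation, and `renormToNumEvents` is a hypothetical name.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Illustration only (not TMVA code): rescale one class's event weights so
// that their sum equals the number of events, i.e. the average weight is 1.
// Ratios between individual weights are unchanged.
std::vector<double> renormToNumEvents(const std::vector<double>& weights) {
  assert(!weights.empty());
  const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
  assert(sum > 0.0);
  const double scale = static_cast<double>(weights.size()) / sum;
  std::vector<double> out;
  out.reserve(weights.size());
  for (double w : weights) out.push_back(w * scale);
  return out;
}
```

For example, hypothetical weights {10, 10, 0.5, 0.5} become four weights summing to 4, with the first still 20 times the third.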

Lorenzo

Hi,

Could you please tell me how to make the numbers at a specific luminosity visible on the BDT cut efficiencies plot? I just want to look at the individual signal and background numbers and compare them with the cut-based analysis. The numbers are also needed to estimate the luminosity required for 3 or 5 sigma significance. Here, with the BDT analysis, the significance seems rather too high, so I am unsure whether I am missing something.

Any suggestions on how to get the numbers used would help me understand this better.
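Regarding the luminosity needed for 3 or 5 sigma: if S and B both scale linearly with the integrated luminosity (same selection, same efficiencies), then Z = S/sqrt(S+B) scales as sqrt(L), so the required luminosity is L0 * (Z_target/Z0)^2. A minimal sketch under that assumption; the function names and the numbers in the example are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Significance estimator used in this thread: Z = S / sqrt(S + B).
double significance(double s, double b) {
  assert(s + b > 0.0);
  return s / std::sqrt(s + b);
}

// Assuming S and B both scale linearly with the integrated luminosity,
// Z scales as sqrt(L), so the luminosity needed to reach zTarget is
// lumi0 * (zTarget / Z(lumi0))^2.
double requiredLumi(double s0, double b0, double lumi0, double zTarget) {
  const double z0 = significance(s0, b0);
  assert(z0 > 0.0);
  const double ratio = zTarget / z0;
  return lumi0 * ratio * ratio;
}
```

For example, hypothetical yields S = 100 and B = 300 at 3000 fb^-1 give Z = 5, so about 1080 fb^-1 would be enough for 3 sigma. This ignores systematic uncertainties, which break the simple sqrt(L) scaling.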

canvas2.pdf (15.8 KB)

Thanks and regards,
Antara

Hi,

I would like to ask how to get the actual numbers of signal and background events at a specific luminosity in the BDT output. Do we have to modify mvaeffs.C to use the sum of weights (SumOfWeights), or is there another way to see these numbers in the GUI?

Your help will be greatly appreciated.

Regards,
Antara

Hi,
I think you should compute the total numbers of signal and background events that you expect for a given luminosity and then use them in the TMVA GUI to estimate the significance and the optimal cut value.

Lorenzo

Hi,

Yes, I have normalised the samples to a given luminosity. This is shown in my first post in this thread, and you said the code seems fine. But I cannot see these numbers on the efficiency plot. Could you tell me what code to write so that they are printed on the efficiency plot? I would like to know how many events remain after the BDT analysis.

canvas2.pdf (15.8 KB)

This may be a naive question; I am new to TMVA.
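Since the efficiency plot gives the signal and background efficiencies as functions of the BDT cut, the numbers that remain after a given cut are those efficiencies times the total expected yields at the chosen luminosity. A minimal sketch, assuming eps_S and eps_B are read off the plot at the chosen cut; `YieldsAfterCut` and `yieldsAfterCut` are hypothetical names, not TMVA classes.

```cpp
#include <cassert>
#include <cmath>

// Expected yields surviving a cut on the BDT output.
struct YieldsAfterCut {
  double signal;
  double background;
  double significance;  // S / sqrt(S + B)
};

// Hypothetical helper: scale the total expected yields at the target
// luminosity by the signal and background efficiencies at the chosen cut.
YieldsAfterCut yieldsAfterCut(double effSig, double effBkg,
                              double nSigExpected, double nBkgExpected) {
  assert(effSig >= 0.0 && effSig <= 1.0);
  assert(effBkg >= 0.0 && effBkg <= 1.0);
  const double s = effSig * nSigExpected;
  const double b = effBkg * nBkgExpected;
  const double z = (s + b > 0.0) ? s / std::sqrt(s + b) : 0.0;
  return {s, b, z};
}
```

For example, hypothetical efficiencies of 0.5 and 0.25 with 100 signal and 1400 background events expected give S = 50, B = 350 and Z = 2.5.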

Thanks and regards,
Antara

Hi,

What I meant is that I want to see numbers like these on my distribution. Could you tell me how to achieve this?

BP1_CSUSY_significance_14tev_new.pdf (389.9 KB)

DataSetFactory : [dataset] : Number of events in input trees
               :
               :
               : Number of training and testing events
               : ---------------------------------------------------------------------------
               : Signal     -- training events            : 7563
               : Signal     -- testing events             : 7508
               : Signal     -- training and testing events: 15071
               : Background -- training events            : 58941
               : Background -- testing events             : 58798
               : Background -- training and testing events: 117739
               :
These are unweighted numbers, and the significance S/sqrt(S+B) = 29.59 is obtained with S = 7508 and B = 58941, i.e. the unweighted numbers. That is why I am confused.

Thanks and regards,
Antara

Hi,
Yes, the numbers above are the numbers of events used. The total numbers of signal and background events you expect are the ones you wrote before:

nsig = 0.3*0.3*1.212*3000 = 327.24
nbkg = 41.05*1.2*1000*3000 + 108.6*1.46*3000 + 5.34*1.54*3000 + 0.00176*3000 = 148280344.08

You should then enter these numbers in the TMVA GUI when plotting the efficiencies.
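The two totals can be reproduced with a short C++ sketch; the factors are copied from the weights in the first post, and the function names are hypothetical. Before any cut on the BDT output these totals give S/sqrt(S+B) of about 0.027, far below the 29.59 obtained from the unweighted event counts.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Target integrated luminosity used in this thread, in fb^-1.
constexpr double kLumi = 3000.0;

// Expected signal yield: the factors from sigWeight in the first post times L.
double expectedSignal() { return 0.3 * 0.3 * 1.212 * kLumi; }

// Expected background yield: the per-luminosity factors from
// bkg0Weight..bkg3Weight, summed and multiplied by L.
double expectedBackground() {
  const std::vector<double> perUnitLumi = {
      41.05 * 1.2 * 1000,  // bkg0
      108.6 * 1.46,        // bkg1
      5.34 * 1.54,         // bkg2
      0.00176              // bkg3
  };
  return kLumi * std::accumulate(perUnitLumi.begin(), perUnitLumi.end(), 0.0);
}

// S / sqrt(S + B) with no cut on the BDT output.
double noCutSignificance() {
  const double s = expectedSignal();
  const double b = expectedBackground();
  return s / std::sqrt(s + b);
}
```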

Lorenzo