Bayesian inference of P(M,A) from measured P(M,X) given non-analytic (numerical) model P(X|A,M)?

Dear RooStats experts,

I have a problem I want to treat using Bayesian methods, and I’m pretty sure that RooStats has all the tools needed to do it - but unfortunately I am getting nowhere with the tutorials, where in most cases I cannot even understand what is the problem that is being “solved”…

Here in a nutshell is the idea: I can measure P(M,X) joint probability distributions, and I have a Monte-Carlo simulation which can calculate P(M,X) for any desired value of the parameter ‘A’, basically by filling a TTree with triplets of (A,M,X) values. What I want to do is, for a given measured P(M,X) distribution, deduce/infer the corresponding P(M,A) distribution using the model results, with all the correct associated errors/uncertainties (also when projecting the P(M,A) distribution to get P(A) for a given sample).

Can anybody give me any help to find what I need to achieve this with RooStats?

Many thanks in advance!

Dear @j.d.frankland,

interesting question! RooStats doesn’t have utilities for this procedure. But the base RooFit package could be useful here.

It depends a bit though on what you want to do mathematically. In particular, how do you intend to build your empirical probability distributions from the simulated (A,M,X) data? In HEP, the most common way to do that is to fill histograms, and use these histograms as pdfs. In RooFit, this can be done with the RooHistPdf. You can then plot it for arbitrary projections, and also do projections on observations in a dataset with the ProjWData command argument, as explained in this tutorial:

The associated uncertainties would be the statistical uncertainties of the bins in the RooHistPdf. Is this what you want to do? I need more info on what you want to do mathematically to be able how to do it with ROOT (if possible).

Dear @jonas

Thanks for your help! I realise I can actually simplify my “problem” further, as I can treat each value of the M parameter (which is in fact an integer taking discrete values) separately. So then I will have a P(X,A) distribution (2-D histogram) generated using my model, my data is a P(X) distribution (1-D histogram), and I want to deduce the P(A) distribution from the data and the model.

Actually after reading your reply I started thinking again about the problem and it might be simpler than I first thought - or I might be making a silly mistake in my reasoning. Let me explain.

The model gives me an empirical P(X,A)=P(X|A)P(A)=P(A|X)P(X) distribution. The model events are generated with a uniform distribution of A, and in fact I can renormalise a posteriori the histogram so that P(A)=1; then my histogram contents are directly P(X|A)=P(A|X)P(X) (or is this where I go wrong?).

Now assuming that my model is ‘correct’, my data histogram P_d(X) should correspond to

P_d(X) = \int P(X|A)P_d(A) dA

and P_d(A) is the distribution I want to deduce. But I can also write (can I?)

P_d(A) = \int P(A|X)P_d(X) dX

and this can be calculated using the model as

P_d(A) = \int \frac{P(X|A)}{P(X)}P_d(X)dX

where P(X|A)/P(X) is just my model histogram further renormalised so that P(X)=1? Then if I can import my renormalised 2D histo as a RooHistPDF and the data histogram as another RooHistPDF, can I use RooFit to define and draw the P_d(A) distribution as the integral of the 2? Strangely, it seems no actual “fitting” is required (which makes me think I must be wrong)…

OK here is as far as I’ve got so far. I think I’m stuck, I don’t know how to get further.

I generate a TH2F with my pseudo-model P(A,X) which I then import as a RooDataHistand then a RooHistPdf(as far as I can tell, impossible to import it directly):

   int bins_obs = 50;
   int bins_par = 50;
   auto h_mod = new TH2F("h_mod","Model",bins_par,0.,1.,bins_obs,-5.,5.);
   auto mod_f2 = new TF2("mod_f2", [](double* x, double*p)
   {
      auto X=x[0];
      auto Y=x[1];
      return TMath::Gaus(Y,8.*X-4,(X+1)*.5);
   }, 0., 1., -5., 5.);
   auto N=500000;
   h_mod->FillRandom("mod_f2",N);
   h_mod->Draw("col");
   new TCanvas;
   // import model histogram
   RooRealVar A("A", "parameter", 0., 1.);
   RooRealVar X("X", "observable", -5., 5.);
   RooDataHist rdh_model("rdh_model","rdh_model", RooArgList(A, X), Import(*h_mod));
   RooHistPdf model("model","2D histo as PDF",RooArgList(A,X),rdh_model);

Then I use a gaussian distribution of the A parameter in order to generate some pseudo-data from the model, by doing \int P(X|A)P(A) dA:

   // generate pseudo-data from gaussian distribution of A
   RooRealVar mean("mean","mean",0.25);
   RooRealVar sigma("sigma","sigma",.05);
   RooGaussian gaussA("gaussA",
                      "Gaussian parameter distribution",
                      A, mean, sigma
                      );
   RooProdPdf data_model("data_model", "Product PDF to make pseudo-data", gaussA, Conditional(model, X));

   // Generate 1000 events in A and X from model
   std::unique_ptr<RooDataSet> dati{data_model.generate({A, X}, 1000)};

   RooPlot *xframe = X.frame();
   dati->plotOn(xframe);
   data_model.plotOn(xframe);
   xframe->Draw();

Finally I try to retrieve the initial parameter distribution by doing the inverse projection using the pseudo-data histogram transformed into a RooHistPdf as for the model:

   RooDataHist dh("dh", "dh", X, *dati);
   RooHistPdf data_pdf("data_pdf","data P(X) as PDF",X,dh);
   RooProdPdf solution("solution", "Invert model using data", data_pdf, Conditional(model, A));

   RooPlot *yframe = A.frame();
   dati->plotOn(yframe);
   data_model.plotOn(yframe);
   solution.plotOn(yframe,LineColor(kRed));
   new TCanvas;
   yframe->Draw();

Amazingly, this sort of works… nearly:


The ‘solution’ has nearly the right mean value, but it’s width is far too large. Can anybody tell me what is wrong with what I’ve done (blindly fumbling around in the dark)?

Cheers

PS. Note that at the very beginning of my code I have to add

   RooAbsPdf::defaultIntegratorConfig()->setEpsRel(1e-2);
   RooAbsPdf::defaultIntegratorConfig()->setEpsAbs(1e-2);

otherwise the 2nd integration never converges.