Generate toy data from simulated data using data points

iw273 · April 28, 2020, 4:19pm

Dear Experts,

I was just wondering whether it is possible to use RooFit to generate toy data from simulated data, using the data points of the simulated data themselves as the basis for the toys. I know you can use a RooHistPdf to do something roughly similar, but is what I am describing possible?

Thanks in advance.

Ifan

StephanH · April 29, 2020, 7:15am

Hi @iw273,

I’m not sure I understand correctly.

When you have a dataset, do you want a toy dataset that’s identical? That’s just copying the dataset.
Do you want to do bootstrapping, i.e. draw single events from the dataset? That doesn’t exist yet, but shouldn’t be too difficult to implement.

iw273 · April 29, 2020, 7:34am

Hi @StephanH, thanks for your reply. Sorry if I wasn’t too clear in my original post. I guess I want to generate ‘new’ toy data points based on the old data points, but these new data points would statistically have the same empirical form as the original data. I think maybe bootstrapping is what I am looking for?

Thanks.

Ifan

StephanH · April 29, 2020, 7:50am

Yes, if you want to “wiggle” the data a bit, but in the end have a statistically compatible distribution, bootstrapping is a good technique.
Now it depends if you have weighted or unweighted data:

Unweighted: Just throw random numbers, and select single events from the dataset. Something like

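// get() returns a pointer to a const RooArgSet holding the values of that event
const RooArgSet* variablesInOld = oldData->get(randomEventNumber);
// assumes variablesInNewDataset is the RooArgSet that newData is filled from
variablesInNewDataset = *variablesInOld;
newData.fill();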

Weighted: You need the cumulative weight distribution. Let’s first hope it’s not weighted.
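
For the unweighted case, the "throw random numbers" step can be sketched with the C++ standard library alone. This is only an illustration under assumptions: the helper name `bootstrapIndices` is hypothetical and not part of RooFit, and the choice of `std::mt19937` as the generator is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not a RooFit function): draw nDraws event indices
// uniformly, with replacement, from a dataset that holds nEvents entries.
std::vector<std::size_t> bootstrapIndices(std::size_t nEvents,
                                          std::size_t nDraws,
                                          std::mt19937& rng) {
    assert(nEvents > 0);  // cannot resample an empty dataset
    std::uniform_int_distribution<std::size_t> pick(0, nEvents - 1);
    std::vector<std::size_t> indices;
    indices.reserve(nDraws);
    for (std::size_t i = 0; i < nDraws; ++i) {
        indices.push_back(pick(rng));  // the same index may appear repeatedly
    }
    return indices;
}
```

Each returned index would then play the role of `randomEventNumber` in the snippet above, with `nEvents` taken from `oldData->numEntries()`.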

iw273 · April 29, 2020, 8:25am

Hi @StephanH ,

Unfortunately the data is indeed weighted…does this make it a lot harder?

Thanks.

Ifan

StephanH · April 29, 2020, 10:19am

It might actually be easier:
If you are happy with having a weighted dataset, but you just want to wiggle the relative probabilities, just re-throw the weights. For each weight w, generate a w' = random_poisson(w).
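
A rough sketch of re-throwing the weights, again with the standard library only. The helper name `rethrowWeights` is hypothetical, and the code assumes all weights are non-negative (a Poisson mean cannot be negative), which is an assumption not stated in the thread.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: replace each weight w by a Poisson draw with mean w.
// Assumes non-negative weights; negative weights are not handled here.
std::vector<double> rethrowWeights(const std::vector<double>& weights,
                                   std::mt19937& rng) {
    std::vector<double> fluctuated;
    fluctuated.reserve(weights.size());
    for (double w : weights) {
        assert(w >= 0.0);
        if (w == 0.0) {
            // std::poisson_distribution needs a strictly positive mean,
            // so a zero weight simply stays zero.
            fluctuated.push_back(0.0);
            continue;
        }
        std::poisson_distribution<int> poisson(w);
        fluctuated.push_back(static_cast<double>(poisson(rng)));
    }
    return fluctuated;
}
```

The new weights would then be attached to the same events, for example when filling a fresh weighted dataset event by event.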

Otherwise:

You need to retrieve all weights, and make a cumulative distribution of weights. For weights 3, 2, 1, that would be 3, 5, 6.
Now, throw a random number between 0 and 6, and find the interval it falls into. Let’s say you get a 1, that’s index 0, since it’s smaller than 3. A 4 is index 1, since it’s larger than 3 but smaller than 5, and a 5.3 is index 2.
Draw the event at this index from the original dataset, and put it in the new dataset.
You will get an (unweighted!) dataset where events pop up as often as their weights in the original dataset dictate.
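
The cumulative-weight procedure above could look roughly like this in plain C++. All function names here (`cumulativeWeights`, `indexFor`, `weightedBootstrapIndices`) are hypothetical, and the sketch assumes non-negative weights with a positive sum, since the binary search relies on the cumulative values being sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Running sum of the weights, e.g. {3, 2, 1} -> {3, 5, 6}.
std::vector<double> cumulativeWeights(const std::vector<double>& weights) {
    std::vector<double> cumulative(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cumulative.begin());
    return cumulative;
}

// Index of the interval x falls into: the first cumulative value above x.
std::size_t indexFor(const std::vector<double>& cumulative, double x) {
    assert(!cumulative.empty());
    auto it = std::upper_bound(cumulative.begin(), cumulative.end(), x);
    if (it == cumulative.end()) {
        --it;  // guard against x landing exactly on the total
    }
    return static_cast<std::size_t>(std::distance(cumulative.begin(), it));
}

// Draw nDraws indices, each with probability proportional to its weight.
std::vector<std::size_t> weightedBootstrapIndices(
        const std::vector<double>& weights, std::size_t nDraws,
        std::mt19937& rng) {
    const std::vector<double> cumulative = cumulativeWeights(weights);
    assert(!cumulative.empty() && cumulative.back() > 0.0);
    std::uniform_real_distribution<double> uniform(0.0, cumulative.back());
    std::vector<std::size_t> indices;
    indices.reserve(nDraws);
    for (std::size_t i = 0; i < nDraws; ++i) {
        indices.push_back(indexFor(cumulative, uniform(rng)));
    }
    return indices;
}
```

With the weights 3, 2, 1 from the example, `indexFor` maps 1 to index 0, 4 to index 1 and 5.3 to index 2. The events at the drawn indices would then be copied into the new, unweighted dataset.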

iw273 · April 29, 2020, 12:04pm

Hi @StephanH,

Great, I will try this out, thanks a lot for your help!

Cheers,

Ifan
