Reproducibility of results using TRandom in a dataframe

Dear RDataFrame experts,

to apply a smearing for the pT resolution, I need a random value from TRandom3::Uniform(). I declare the generator, TRandom3 random_gen;, before using it in an RDataFrame. My results need to be reproducible, which was no problem with a fixed seed in a classical event loop. With an RDataFrame I am now running:

random_gen.SetSeed(event);
float random_value = random_gen.Uniform();

for each event while processing it in the RDataFrame, where event is the event number. I am afraid this is not thread-safe, though, since I have only one instance of TRandom3.
I tried (and failed) to create an array of TRandom3 objects and access them via random_gen[rdfslot_].
Do you have any suggestions on how to make my results reproducible?

Thanks, Johann

ROOT Version: 6.16
Platform: Centos7
Compiler: gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)


Hi @rauser, and welcome to the ROOT forum!

I assume the problem is with running with multi-threading enabled, right? Single-thread applications should do just fine by setting a seed before the event loop starts as usual.

For multi-thread event loops, the situation is complicated: by construction, there is no guarantee about which thread will process which event, or in what order. Indeed, one expensive way to get reproducibility (probably at the cost of reducing the quality of the generator) is to have one generator per thread and set its seed to (a deterministic function of) the event number before generating anything for that event. What fails exactly when you try this?
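To make the idea concrete, here is a minimal sketch in plain C++. It is not ROOT code: std::mt19937_64 stands in for TRandom3, the helper names DrawForEvent and ProcessEvents are made up for illustration, and the multi-threaded loop is only mimicked by dealing events to slots round-robin. In an actual RDataFrame the per-slot lookup would presumably live in a DefineSlot callable, with the slot number as its first argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: return the random number for one event.
// `gens` holds one generator per slot; re-seeding from the event number
// makes the draw depend only on `event`, never on `slot` or on history.
double DrawForEvent(std::vector<std::mt19937_64>& gens, unsigned slot, std::uint64_t event) {
    assert(slot < gens.size());
    std::mt19937_64& gen = gens[slot];
    gen.seed(event);  // assumption: the event number is used directly as the seed
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(gen);
}

// Stand-in for the multi-threaded event loop: events are dealt to slots
// round-robin, which mimics an arbitrary event-to-slot assignment.
std::vector<double> ProcessEvents(const std::vector<std::uint64_t>& events, unsigned nSlots) {
    assert(nSlots > 0);
    std::vector<std::mt19937_64> gens(nSlots);  // one generator per slot
    std::vector<double> values;
    values.reserve(events.size());
    for (std::size_t i = 0; i < events.size(); ++i) {
        const unsigned slot = static_cast<unsigned>(i % nSlots);
        values.push_back(DrawForEvent(gens, slot, events[i]));
    }
    return values;
}
```

Calling ProcessEvents with the same event numbers but a different number of slots, or with the events in a different order, should give the same value for each event, which is the reproducibility asked for here. The price is one re-seed per event.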

Cheers,
Enrico

Hi Enrico,

thanks for the fast reply! Indeed, it’s a multi-threading only problem.

I think I see the mistake I was making. When I created a generator for each thread, I set a fixed seed for each of them up front instead of resetting it for each event at runtime (assuming that events would always end up in the same slot). I only tried resetting the seed with a single generator, which is not thread-safe. I will test it and come back if I see further problems (which I don’t expect now).

One more question: why use a deterministic function of the event number instead of the event number itself as the seed?

Thanks, Johann

The identity function works too! :smile: I don’t know a lot about RNGs, but I suppose that setting the seed to 1 and sampling, then to 2 and sampling, then to 3 and sampling…doesn’t get you the highest quality random sequence (if you care about that).
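For illustration only, one possible "deterministic function of the event number" is a bit-mixing step, so that neighbouring event numbers give unrelated-looking seeds. The sketch below uses the SplitMix64 finalizer, which is my choice for the example and not something ROOT prescribes. The names MixEventNumber and SeedForEvent are hypothetical. The zero guard is there because some generators treat a seed of 0 specially (TRandom3 documents it as "pick a random seed"), which would break reproducibility.

```cpp
#include <cassert>
#include <cstdint>

// SplitMix64 finalizer: a bijective 64-bit mixing function, so distinct
// event numbers always map to distinct values.
std::uint64_t MixEventNumber(std::uint64_t event) {
    std::uint64_t x = event + 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hypothetical wrapper: same mixing, but never returns 0, for generators
// that interpret a zero seed as "choose a random seed".
std::uint64_t SeedForEvent(std::uint64_t event) {
    const std::uint64_t mixed = MixEventNumber(event);
    const std::uint64_t seed = (mixed != 0) ? mixed : 1;
    assert(seed != 0);  // postcondition: never hand a zero seed to the generator
    return seed;
}
```

The result of SeedForEvent(event) would then be passed to SetSeed in place of the raw event number. Whether this actually improves the quality of the numbers depends on how the generator itself handles seeding.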

Cheers,
Enrico

Pinging @moneta in case he has a comment about this.

Hi,

You can seed the generator with the event number. TRandom3 may not have very good seeding, though, so in that case I would use one of the MixMax generators, which guarantee a completely independent sequence for each seed value, with 2^64 different seed values available.
To also get a faster seeding time, I would recommend using TRandomMixMax17.

Lorenzo

Hi,

thanks a lot! I’ll switch to TRandomMixMax17 then.

Cheers, Johann
