TRandom3, problem of repetition?

Hello

Problem: related to the generation of large amount of random numbers using TRandom3.

Example code:
A global TRandom3 object is used for generating random numbers.

This code will generate nEntries “uniformly random” numbers, between 0 and 1, with double precision. They will be stored in a std::map (they will be sorted).

Check of randomness: if the random numbers are uniform, the difference between the following and the previous number should follow an exponential function, with slope the number of elements (for this case).

#include<map>
#include "TRandom3.h"
#include "TH1D.h"

std::map < double , double > aMap;
TRandom3 r;
TH1D* h;

void mapProblem(std::size_t nEntries = 1e8 )
{
  for( std::size_t counter = 0; counter< nEntries; ++counter )
  {
    aMap.emplace( r.Uniform(0,1), counter);
  }
  h= new TH1D("h","",1000,0,10./nEntries);
  double last = 0;
  for( auto mit:aMap )
  {
    if( last )
      h->Fill( mit.first - last );
      
    last = mit.first;
  }
  
  return;
}

Output:
-> For nEntries = 1e5 everything looks ok, as you can see in the attached file mapProblemTH1D_1e5.root mapProblemTH1D_1e5.root (27.3 KB)

-> But for nEntries = 1e8 (it will take few minutes, and several GB of RAM memory), empty bins appear*. as you can see in the attached file mapProblemTH1D_1e8.root. mapProblemTH1D_1e8.root (28.0 KB)

If the histogram is rebined,

h->Rebin(5)

a structure appears superimposed over the exponential. The empty bins (*) are not equally spaced, so they may be the origin of this structure.

Please, correct me if I’m wrong: 1% of the random numbers are not inserted in the std::map because they are repeated (lack of precision?); I don’t know the origin of empty bins.

Question: May TRandom3 random numbers be affected by a kind of correlation between them//repetition effect, or it’s just the randomness check which is failing? (I don’t know any test for random numbers)

Thank you for your time.

Regards,
atd

As an aside, skimming the documentation you find this little tidbit:

relativly fast (slightly slower than TRandom1 and TRandom2 but much faster than TRandom1)

You would see the empty bins also for the smaller number of entries, if you used the same range for the histogram in both cases.

To generate a random number, TRandom3 calculates an unsigned integer in a way that this integer is (pseudo-)random. To convert the integer to a floating-point number between 0 and 1, it is divided by the maximal number that can be represented by an unsigned integer (which is 32 bits, so 2^32) plus 1. You will find that this leads to discrete differences between two random numbers that have to be multiples of about 2.3*10^-10 (= 1 / (2^32+1)). And this is what you see in your histogram. These steps will also appear for all other random-number generators, though you will get smaller steps if you find one that internally uses 64 bit integers.

I think the diehard tests are still considered a valid first test for randomness, but keep in mind that any test can only show that the random numbers are not as random as they should be; there is no way to proof that a set of random numbers is really random.

Hello

I have found the point: it’s a problem of precision because of truncation

#include<map>
#include <cfloat>

#include "TRandom3.h"
#include "TH1D.h"

std::map < long double , long double > aMap;
TRandom3 r;
TH1D* h;

static long double afactor(1e-7);

void mapProblem(std::size_t nEntries = 1e7 )
{
  for( std::size_t counter = 0; counter< nEntries; ++counter )
  {
    //building a random number "long double" 
    long double temp1( r.Uniform(0,1) );
    long double temp2( r.Uniform(0,1) );
    temp1*=afactor;
    temp1+= temp2;
    aMap.emplace( temp1, counter);
  }
  h= new TH1D("h","",1000,0,10./nEntries);
  long double last = 0;
  long double tDiff = 0;
  for( auto mit:aMap )
  {
    if( last )
    {
      tDiff = mit.first - last;
      h->Fill( tDiff );
    }
      
    last = mit.first;
  }
  
  return;
}


Now “long double” is used instead of “double”, because “double” truncates the random numbers*

A double 0.123… will have only 15 significant figures", the rest are not (the number is truncated). You can use “long double” instead, but for building a random number “long double” you will need 2 “double” random number: “temp1” will store the last** significant figures, and “temp2” will store the first ones; and then they are summed.

So, to sum up: I think overloading the functions of TRandom (in order to return also “long double”) could be useful to avoid this kind of problems (and the corresponding shoddy solutions).

Thank you for your time.

*I think it doesn’t matter if your list is small.
**With “static long double afactor” you can control how separated are the random digits.With 1e-14, the effect persist. With 1e-12, you can still see a partial effect. With 1e-8, the effect is negligible.

The mersenne twister is also available in stl as of C++11:
http://www.cplusplus.com/reference/random/

1 Like

Hello

Thank you, it works perfectly!

#include<map>
#include <cfloat>


#include "TH1D.h"

std::map < long double , long double > aMap;

TH1D* h;

void mapProblem(std::size_t nEntries = 1e7 )
{
  unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
  std::default_random_engine generator (seed);

  std::uniform_real_distribution<long double> distribution (0.0,1.0);
  
  for( std::size_t counter = 0; counter< nEntries; ++counter )
  {

    aMap.emplace( distribution(generator), counter);
  }
  h= new TH1D("h","",1000,0,10./nEntries);
  long double last = 0;
  long double tDiff = 0;
  for( auto mit:aMap )
  {
    if( last )
    {
      tDiff = mit.first - last;
      h->Fill( tDiff );
    }
      
    last = mit.first;
  }
  
  return;
}

Do not rely on std::default_random_engine. Is is implementation defined which underlying random bit generator is actually used. Use std::mt19937 instead (or mt19937_64) if you want Mersenne Twister.

For the seed, you can could also use std::random_device (depends on implementation as well) instead of an epoch. In any case, one number as seed is not enough to fill the internal state of MT. You could fill a std::seed_seq if you need more entropy for the initial state (usually you don’t need to care).

2 Likes

Hello

Thank you for the additional info! (I had no idea of this kind of issues, and this info is really valuable!)

Thank you for your time.

Regards!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.