Date: Thu, 3 Nov 94 04:07:26 JST From: Stephen Turnbull To: ptc AT ironwood DOT cray DOT com Cc: DJGPP AT Sun DOT SOE DOT Clarkson DOT edu, (DJGPP, mailing, list) Subject: Re:Is it random number sequence? I *hope* that you have a reliable source that you can cite to me and flame me right back into the ground, because if my understanding is correct, *you are seriously confused*. (This of course is no biggie, I was quite confused myself, as we'll see.) My source is my 90% recollection of Donald Knuth's discussion in Seminumerical Algorithms. An oldie but still goodie. The one sensible thing you wrote is "the numbers you get depend on how good the random number generator is." But that's the point of the thread: the DJGPP implementation apparently fails one reasonable test very badly. My point in my post was that this is a common way for generators to fail, and that according to Knuth taking higher order bits is a good idea. This may have been falsified by more recent research; a quick scan of some of the standard references (ie, I grepped for "rand" in my DJGPP archives---we've been through this before, as usual) didn't show it though. You say it's not a good idea to play with bits. To get a narrow range pseudo-random sequence with a long period, there is no alternative. (You could do something that is truly random, as far as we know, such as use the decays in a Geiger counter. That's not good; we need a pseudo-random sequence for many applications.) For example, you suggest printf(" %d", (int)( 8 * (double) rand() / RAND_MAX)); Now, all three factors are converted to double, which has 55 bits of precision on both Intel and Motorola machines, dunno Sparcs, as I recall. Since 8*RANDMAX has at most 19 bits, the multiplication is effectively integer arithmentic. I believe that all of these machines are radix 2 floating point; that means with RAND_MAX = 65536, the division is a 16 place right shift, except for the rounding. Ie, in the case of DJGPP, except for a possible rounding issue, your algorithm can be accomplished as (dropping the printf()) ((rand() << 3) & (7 << lg RAND_MAX)) >> lg RAND_MAX or somewhat more efficiently (rand() & (7 << (lg RAND_MAX - 3))) >> (lg RAND_MAX - 3) or really efficiently rand() >> (lg RAND_MAX - 3) where 'lg' is the logarithm base 2. (So this formula doesn't quite work for the Sun.) Ie, what you're really doing is taking the top three bits of rand(). I suggest that you stipulate (at least for the list, I'll argue it in private if you want) that the difference between this and the case where RAND_MAX = 32767 is too small to worry about for our purposes (remember that the bad behavior was 0 1 2 3 0 1 2 3 ad infinitum, apparently), and the rounding issue can be handled by an addition (I'm pretty sure). I do *not* mean that the low order bits will be similarly cyclical; in fact I would guess that the problem with the DJGPP generator is exactly that use of a nice power of two as RAND_MAX. Generators where RAND_MAX is odd should have better behavior in the low order bits. I do mean that taking the high order bits is effectively the same as dividing by a high power of two - 1 for long periods of the sequence. What I had in mind but didn't think about long enough to get right was a generalization of the middle formula (rand() & (7 << k)) >> k where 0 <= k <= lg RAND_MAX - 3. k = 0 is likely to be bad, but large k will be acceptable for most generators actually in use. k = RAND_MAX - 3 is probably best, as rand is (presumably, I haven't looked at the library source) actually the bottom 16 bits of a sequence of 32 bit numbers, probably generated by a linear congruential generator. Thus, your algorithm actually amounts to taking the middle three bits from a pseudo-random sequence of 32-bit numbers, more or less what I suggested. Note that your method is computationally very expensive, as it involves an int to double cast and a double to int cast for every draw, as well as two floating point operations. --Steve