Pseudorandom number generators | Computer Science | Khan Academy

Pseudorandom number generators | Computer Science | Khan Academy

(fun music) (leaves rustling) – [Voiceover] When we
observe the physical world we find random fluctuations everywhere. We can generate truly random numbers by measuring random
fluctuations, known as noise. When we measure this
noise, known as sampling, we can obtain numbers. – [Voiceover] One, two, three, four– – [Voiceover] For example, if we measure the electric current
of TV static over time, we will generate a truly random sequence. We can visualize this random sequence by drawing a path that changes direction according to each number,
known as a random walk. Notice the lack of pattern at all scales. At each point in the sequence the next move is always unpredictable. Random processes are said
to be nondeterministic, since they are impossible
to determine in advance. Machines, on the other
hand, are deterministic. Their operation is
predictable and repeatable. In 1946, John von Neumann was involved in running computations for the military; specifically involved in the
design of the hydrogen bomb. Using a computer called the ENIAC, he planned to repeatedly
calculate approximations of the processes involved
in nuclear fission. However, this required quick access to randomly generated numbers that could be repeated, if needed. However, the ENIAC had very
limited internal memory; storing long random
sequences was not possible. So, Neumann developed an algorithm to mechanically simulate
the scrambling aspect of randomness as follows: First, select a truly random
number, called the “seed”. This number could come from
the measurement of noise, or the current time in milliseconds. Next, this seed is provided as input to a simple calculation. Multiply the seed by itself, and then output the middle of this result. Then you use this output as the next seed, and repeat the process
as many times as needed. This is known as the middle-squares method and is just the first in a long line of pseudorandom number generators. The randomness of the
sequence is dependent on the randomness of the initial seed only. Same seed, same sequence. So, what are the differences between a randomly generated versus pseudorandomly generated sequence? Let’s represent each
sequence as a random walk. They seem similar until
we speed things up. The pseudorandom sequence
must eventually repeat. This occurs when the
algorithm reaches a seed it has previously used,
and the cycle repeats. The length, before a
pseudorandom sequence repeats, is called “the period”. The period is strictly limited by the length of the initial seed. For example, if we use a two-digit seed, then an algorithm can produce, at most, 100 numbers, before reusing a seed and repeating the cycle. A three-digit seed can’t
expand past 1,000 numbers before repeating its cycle, and a four-digit seed can’t expand past 10,000 numbers before repeating. Though if we use a seed large enough, the sequence can expand into trillions and trillions
of digits before repeating. Though the key difference is important. When you generate numbers pseudorandomly, there are many sequences
which cannot occur. For example, if Alice generates
a truly random sequence of 20 shifts, it’s equivalent
to a uniform selection from the stack of all
possible sequences of shifts. This stack contains 26
to the power of 20 pages, which is astronomical in size. If we stood at the bottom
and shined a light upwards, a person at the top
would not see the light for around 200,000,000 years. Compare this to Alice generating a 20 digit pseudorandom sequence, using a four-digit random seed. Now, this is equivalent
to a uniform selection from 10,000 possible initial seeds, meaning she can only generate 10,000 different sequences, which is a vanishingly small fraction of all possible sequences. When we move from random
to pseudorandom shifts, we shrink the key space into a much, much smaller seed-space. So, for a pseudorandom sequence
to be indistinguishable from a randomly generated sequence, it must be impractical for a computer to try all seeds and look for a match. This leads to an important distinction in computer science,
between what is possible, versus what is possible in
a reasonable amount of time. We use the same logic
when we buy a bike lock. We know that anybody can simply try all possible combinations, until they find a match and it opens. But it would take them days to do so. So, for eight hours we
assume it’s practically safe. With pseudorandom generators, the security increases as the
length of the seed increases. If the most powerful computer would take hundreds of years to
run through all seeds, then we safely can assume
it’s practically secure, instead of perfectly secure. As computers get faster the seed size must increase accordingly. Pseudorandomness frees Alice and Bob from having to share their entire random shift sequence in advance. Instead, they share a
relatively short random seed, and expand it into the same random-looking sequence when needed. But what happens if they can never meet to share this random seed?

26 thoughts on “Pseudorandom number generators | Computer Science | Khan Academy

  1. in cryptographically secure pseudo-random number generators such as the /dev/random and /dev/urandom on Unix machines, it constantly takes up new seeds to prevent anyone known the state of the generator at any one time fully. It usually uses stuff like block ciphers and hashes to take essentially random input like keystrokes, and hardware interrupt timing to continually shuffle the output to keep it as secure as possible for generating keys and stuff for encryption such as using a https connection. For CSPRNGs it is required to pass tests for statistical randomness as you wouldn't want to use a encryption key even for a one time pad that was a very simple number that wasn't random at all.

  2. I used 41 and got 4116814624384470560250062538447056025006253844etc

    Although I ended up with a repeating sequence, I did not encounter the seed of 41 repeating itself. Did I do something wrong?

  3. While word size dependence is true for linear congruential generators, there are plenty of random number generators whose period is independent of the word size; as an example, the mersenne generator or R250. What you are saying is accurate but it comes across as if you are saying that LCG is the only algorithm.

  4. For every, I don't know, 10,000 calculations, why don't we start taking MORE than just the middle 3 digits and make THAT be the new seed??? Problem solved, no?

  5. it is on khan acadamey, too:

  6. This video was uncomfortable. But I had to watch as it was recommended by a javascript course I'm taking. Thanks!

  7. I just realized that since this video is in a series, it's probable that Khan Academy defined "shifts" in an earlier video. I remember watching these last year and believe that this may be the case.

Leave a Reply

Your email address will not be published. Required fields are marked *