# The Nth Prime Algorithm

### A prime page by Booker, Carr, et al.

For small primes, namely those less than 30,000,000,000,000, we use the programs and data provided by Andrew Booker (for large primes see below). Here is his description:

In order to find a prime quickly, the nth prime program uses a large stored data table to get close to the right answer first, then finishes with a relatively short computation.  To see how this works, imagine the number line broken into bins, each of size N, i.e. the first is from 0 to N-1, the second from N to 2N-1, etc.  (In my implementation, N equals 19219200.)  In the table we store the number of primes in each bin.  Then, to retrieve the nth prime, the program adds up the numbers until the sum exceeds n.  We then know exactly which bin the nth prime falls into.  The prime can then be found using a sieve algorithm on that bin.

The sieve algorithm works as follows.  First, we compute a set of base primes to be used in sieving.  The base primes consist of all the primes up to the square root of the last number in the bin.  Second, we keep an array (the sieve) which holds a flag byte for each number in the bin.  (Actually, in practice the array is much smaller than the size of a bin, so the bin is first broken into many sub-bins that are sieved one at a time.)  We then "sieve" the bin by crossing out (setting the flag byte for) the multiples of each base prime.  At the end, the numbers that were not crossed out (did not have their flag bytes set) are all the primes in the bin.  These primes are counted until reaching our original number, n.  (For a better description of sieve algorithms, see this entry in Chris Caldwell's Prime Glossary.)

Thus, if the bin size is small enough, the sieving can be done very quickly.  In this way, most of the work in finding a prime was done in computing the data table.  The table, in turn, was computed using a modified sieve algorithm that is well suited to sieving many bins.  The modified algorithm actually sieves many times, once for each residue relatively prime to some number with many small prime factors.  (I used 30030 = 2*3*5*7*11*13.)  So, for example, in the first sieve, all of the numbers have remainder 1 when divided by 30030, so instead of having the flag bytes represent 0,1,2,... as in the standard sieve algorithm, they represent 1,1+30030,1+2*30030,...  This may sound like it creates more work than before, but it turns out to be faster since we only need to consider remainders which are relatively prime to 30030.  In this way, the multiples of 2,3,5,7,11, and 13 are automatically eliminated, making the algorithm roughly 6 times faster.  What's more, the modified algorithm can be easily parallelized, as many computers can each work on separate residues.  The table used by the nth prime program was calculated over 9 hours by a lab of UltraSparcs and SGIs.

For larger primes (and selected smaller values) we use a data file supplied by Andrey V. Kulsha:

Andrey V. Kulsha has a file of the successive rounded values of 1.5*(π(x)-li(x)) for the 100,000,000 multiples of 109 less than 1017 (see the page on his table here). He is still double checking these values, so for now our extension is just experimental.

In 2014, Andrew Carr and I converted these to actually differences (rather than successive differences) so we can find π(x) for multiples of 109 with a single file read and a quick calculation of li(x). For other values we use the nearest multiple of 109 and sieve the region between to find π(x). This is done using primesieve written by Kim Walisch. This can be slow becuase our server is old, 32-bit, and often busy verifying primes for our list of the largest known primes. (Contact Chris Caldwell if you want to donate a new one!)

For pn we start with these terms of the asymptotic expansion for pn [Cipolla1902]

pn = n (log n + log log n - 1 + (log log (n) - 2)/log n - ((log log (n))2 - 6 log log (n) + 11)/(2 log2 n)).

The error, O(n(log log n / log n)3)), still too large, so we apply Newton-Raphson iteration a couple times to find a solution to R(x)=n. (Here R(x) is Riemann's prime counting function.) This will be comfortably within 109 of the correct value for our range. Now we calculate π(x), iterate again to find the other end of a region to seive... Now we sieve as necessary to find the actual value of pn.

We could think of no reason to extend the "random prime" link to beyond the first 1,000,000,000,000 primes.