
Glossary: Prime Pages: Top 5000: 
In 1881 the astronomer (and polymath) Simon Newcomb noticed that the front pages of his logarithmic tables frayed faster than the rest of the pages and concluded "the first digit is oftener 1 than any other digit" [Newcomb1881]. He decided that the probability that the first digit is d is log_{10}(1 + 1/d) and gave a similar rule for the second digit.
Fiftyseven years later the General Electric Company physicist Benford popularized this estimate and gave substantial empirical evidence using surface areas of 335 rivers, specific heats and molecular weights of thousands of chemicals and even a list of street addresses [Benford1938]. The over all results fit the rule very well. (In fact too well according to Diaconis and Freedmanthey suggest Benford may have improperly massaged his roundoff errors to improve the fit [DF1979, p. 363].) For example, Blair Kelly collects prime factors of Lucas and Fibonacci numbers. When David Broadhurst scanned about 25 thousand such primes greater than a million (December 2001), he found:
The key to assigning probabilities to sets of numbers is to develop a measure of density (someway of deciding how common events are). Theodore Hill later showed that for continuous densities, that base invariance implied Benford's law distribution [Hill1995]. Roughly speaking, baseinvariance means that the shape of the distribution is not too dependent on the choice of base (radix) with which we write the numbers. In the sciences this is viewed as being independent of the choice of units. A simpler concept is scaleinvariance: a probability measure P is scale invariant if for some positive \alpha (not a rational power of the base) the probability P(S) equals P(\alpha S) for "all" sets S. Hill also showed scaleinvariance implies base invariance (hence Benford's law). Hill later showed there was a kind of central limit theorem that applied to a wide variety of distributionsthat combinations of distributions tend towards the distribution predicted by Benford's law even when the original distributions do not [Hill1996]. Many of Benford's examples were discrete (including such things as street address). Here there is no reason to use a continuous density. The problem is that the natural density usually fails to exist ([Raimi1976] provides a nice review of this and many other related issues). Using other densities (such a logarithmic density or zetadensity) showed that Benford's law applies to the set of integers and the set of prime integers. Daniel Cohen called a density "super natural" if it is finitely additive, agrees with the natural density on all set of integers where the latter exists, and for sets of integers S assigns the same density to the two sets S and { n  n/2 or (n1)/2 is in S }.Cohen proved that when a supernatural density yields a result for the set of integers, then it must satisfy Benford's law [Cohen1976]. He later proved a version of this result for the set of prime numbers [CK1984]. Among the many uses of Benford's law is examining tax returns looking for fraudulent numbers (because crooks fail to follow Benford's law!) [Nigrini1992, Nigrini1996] It has also been used to analyze computer storage methods [Knuth97, sect. 4.2].
Related pages (outside of this work)
References:
Chris K. Caldwell © 19992018 (all rights reserved)
