# Benford's law

In 1881 the astronomer (and polymath) Simon Newcomb noticed that the front pages of his logarithmic tables frayed faster than the rest of the pages and concluded "the first digit is oftener 1 than any other digit" [Newcomb1881]. He decided that the probability that the first digit is d is log10(1 + 1/d) and gave a similar rule for the second digit.

Fifty-seven years later the General Electric Company physicist Benford popularized this estimate and gave substantial empirical evidence using surface areas of 335 rivers, specific heats and molecular weights of thousands of chemicals and even a list of street addresses [Benford1938]. The over all results fit the rule very well. (In fact too well according to Diaconis and Freedman--they suggest Benford may have improperly massaged his round-off errors to improve the fit [DF1979 p. 363].)

For example, Blair Kelly collects prime factors of Lucas and Fibonacci numbers.  When David Broadhurst scanned about 25 thousand such primes greater than a million (December 2001), he found:

 leading digit d percentage predicted by Benford's law 1 2 3 4 5 6 7 8 9 7869 4456 3107 2342 1977 1665 1360 1255 1107 31.3% 17.7% 12.3% 9.3% 7.8% 6.6% 5.4% 4.9% 4.4% 30.1% 17.6% 12.5% 9.7% 7.9% 6.7% 5.8% 5.1% 4.6%

The key to assigning probabilities to sets of numbers is to develop a measure of density (someway of deciding how common events are). Theodore Hill later showed that for continuous densities, that base invariance implied Benford's law distribution [Hill1995]. Roughly speaking, base-invariance means that the shape of the distribution is not too dependent on the choice of base (radix) with which we write the numbers. In the sciences this is viewed as being independent of the choice of units.

A simpler concept is scale-invariance: a probability measure P is scale invariant if for some positive α (not a rational power of the base) the probability P(S) equals P(α S) for "all" sets S. Hill also showed scale-invariance implies base invariance (hence Benford's law). Hill later showed there was a kind of central limit theorem that applied to a wide variety of distributions--that combinations of distributions tend towards the distribution predicted by Benford's law even when the original distributions do not [Hill1996].

Many of Benford's examples were discrete (including such things as street address). Here there is no reason to use a continuous density. The problem is that the natural density usually fails to exist ([Raimi1976] provides a nice review of this and many other related issues). Using other densities (such a logarithmic density or zeta-density) showed that Benford's law applies to the set of integers and the set of prime integers.

Daniel Cohen called a density "super natural" if it is finitely additive, agrees with the natural density on all set of integers where the latter exists, and for sets of integers S assigns the same density to the two sets

S and { n | n/2 or (n-1)/2 is in S }.

Cohen proved that when a supernatural density yields a result for the set of integers, then it must satisfy Benford's law [Cohen1976]. He later proved a version of this result for the set of prime numbers [CK1984].

Among the many uses of Benford's law is examining tax returns looking for fraudulent numbers (because crooks fail to follow Benford's law!) [Nigrini1992, Nigrini1996]  It has also been used to analyze computer storage methods [Knuth97, sect. 4.2].

Related pages (outside of this work)

References:

Benford1938
F. Benford, "The law of anomalous numbers," Proc. Amer. Math. Soc., 78 (1938) 551--572. [See [Newcomb1881]. Data partially questioned in [DF1979, p. 363].]
CK1984
D. Cohen and K. Talbot, "Prime numbers and the first digit phenomenon," J. Number Theory, 18 (December 1984) 261--268.  MR 85j:11014
Cohen1976
D. Cohen, "An explanation of the first digit phenomenon," J. Combin. Theory, Ser. A, 20 (1976) 367--370.  MR 53:10698
DF1979
P. Diaconis and D. Freedman, "On rounding percentages," J. Amer. Stat. Assoc., 74 (1979) 359--364.  MR 81d:62014
Hill1995
T. Hill, "Base-invariance implies Benford's law," Proc. Amer. Math. Soc., 123:3 (March 1995) 887--895. (Annotation available)
Hill1996
T. Hill, "A statistical derivation of the significant-digit law," Statistical Science, 10:4 (1996) 354--363.  MR 98a:60021 (Annotation available)
Knuth97 (sect. 4.2)
D. E. Knuth, Seminumerical algorithms, 3rd edition, The Art of Computer Programming Vol, 2, Addison-Wesley, Reading MA, 1997. [This book is an excellent reference for anyone interested in the basic aspects of programming the algorithms mentioned in these pages.]
Matthews1999
R. Matthews, "The power of one," New Scientist, (1999) 26--30.  10 July. [A simple account of Benford's law.]
Newcomb1881
S. Newcomb, "Note on the frequency of use of the different digits in natural numbers," Amer. J. Math., 4 (1881) 39--40.
Nigrini1992
M. Nigrini, "The detection of income evasion through an analysis of digital distributions," Ph.D. thesis, Dept. of Accounting, Univ. Cincinnati, Cincinnati OH, (1992)
Nigrini1996
M. Nigrini, "A taxpayer compliance application of Benford's law," J. Amer. Taxation Assoc., 18 (1996) 72--91.
Raimi1976
R. A. Raimi, "The first digit problem," Amer. Math. Monthly, 83:7 (1976) 521--538.  MR 53:14593