Definitions of Statistics and Statisticians

Chapter: Biostatistics for the Health Sciences: What is Statistics? How Is It Applied to the Health Sciences?

One use of statistics is to summarize and portray the characteristics of a data set or to identify patterns in it. This use is known as descriptive statistics or exploratory data analysis: the branch of statistics that describes the contents of data or makes a picture based on the data. Researchers also use statistics to draw conclusions about the world or to test formal hypotheses. This second application is known as inferential statistics or confirmatory data analysis.
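As a small illustration of the descriptive side, the sketch below summarizes a data set with a handful of numbers. The function name `describe` and the blood pressure readings are invented for this example and do not come from the text; the quartiles use the default method of Python's `statistics.quantiles`, which is only one of several common conventions.

```python
import statistics


def describe(values):
    """Return a small descriptive summary of a list of numbers."""
    ordered = sorted(values)
    # Quartiles from the standard library's default ("exclusive") method.
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
    }


# Hypothetical systolic blood pressure readings (mmHg), made up for illustration.
readings = [118, 122, 125, 130, 135, 141, 150]
summary = describe(readings)

for name, value in summary.items():
    print(f"{name}: {value}")
```

A summary like this only describes the seven readings at hand. Saying something about the population they were drawn from would be a question for inferential statistics.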

The field of statistics, which is relatively young, traces its origins to questions about games of chance. Its foundation rests on the theory of probability, a subject that originated centuries ago in the mathematics of gambling. Motivated by gambling questions, famous mathematicians such as De Moivre and Laplace developed probability theory. Gauss derived least squares estimation (a technique used prominently in modern regression analysis) as a method to fit the orbits of planets. The field of statistics was advanced in the late 19th century by the following developments: (1) Galton’s discovery of regression (a topic we will cover in Chapter 12); (2) Karl Pearson’s work on parametric fitting of probability distributions (models for probability distributions that depend on a few unknown constants that can be estimated from data); and (3) the discovery of the chi-square approximation (an approximation to the distribution of test statistics used in contingency tables and goodness-of-fit problems, to be covered in Chapter 11). Applications in agriculture, biology, and genetics also motivated early statistical work.

Ideas of statistical inference subsequently evolved, with the important notions developed between the 1890s and the 1950s. The leaders in statistics at the beginning of the 20th century were Karl Pearson, Egon Pearson (Karl Pearson’s son), Harald Cramér, Ronald Fisher, and Jerzy Neyman. They developed early statistical methodology and foundational theory. Later applications arose in engineering and the military (particularly during World War II).

Abraham Wald and his statistical research group at Columbia University developed sequential analysis (a technique that allows sampling to stop or continue based on current results) and statistical decision theory (methods for making decisions in the face of uncertainty based on optimizing cost or utility functions). Utility functions assign a numerical value to each decision so that choices can be compared; the “best” decision is the one with the highest utility.
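The idea of picking the decision with the highest utility can be sketched in a few lines. The example below is an assumption-laden illustration, not Wald's formulation: it takes the utility of a decision under uncertainty to be its expected utility (utilities weighted by the probability of each state), and the function names, decisions, utilities, and probabilities are all hypothetical.

```python
def expected_utility(utilities, probabilities):
    """Weight the utility in each possible state by that state's probability."""
    return sum(u * p for u, p in zip(utilities, probabilities))


def best_decision(utility_table, probabilities):
    """Return the decision with the highest expected utility, plus all scores."""
    scores = {
        decision: expected_utility(utilities, probabilities)
        for decision, utilities in utility_table.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical setup: two states (condition present, condition absent).
state_probabilities = [0.2, 0.8]

# Hypothetical utilities of each decision in each state, on an arbitrary 0-100 scale.
utility_table = {
    "treat": [90, 70],   # helps if present, some cost if absent
    "wait": [20, 100],   # poor outcome if present, ideal if absent
}

best, scores = best_decision(utility_table, state_probabilities)
print(best, scores)
```

With these made-up numbers, "wait" scores about 84 and "treat" about 74, so "wait" would be chosen. Changing the probabilities or the utilities can reverse the choice, which is why the utility function has to be specified carefully.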

The University of North Carolina and the University of California at Berkeley also were major centers for statistics. Harold Hotelling and Gertrude Cox initiated statistics departments in North Carolina. Jerzy Neyman came to California and formed a strong statistical research center at the University of California, Berkeley.

Statistical quality control developed at Bell Labs, starting with the work of Walter Shewhart. An American statistician, W. Edwards Deming, took the statistical quality control techniques to Japan along with his management philosophy; in Japan, he nurtured a high standard of excellence, which is currently being emulated successfully in the United States.

John Tukey at Princeton University and Bell Labs developed many important statistical ideas, including:

·        Methods of spectral estimation (a decomposition of time dependent data in terms of trigonometric functions with different frequencies) in time series

·        The fast Fourier transform (also used in the spectral analysis of time series)

·        Robust estimation procedures (methods of estimation that work well for a variety of probability distributions)

·        The concept of exploratory data analysis

·        Many of the tools for exploratory analysis, including (a) PRIM9, an early graphical tool for rotating high-dimensional data on a computer screen, and (b) box-and-whisker and stem-and-leaf plots (to be covered in Chapter 3). By high-dimensional data we mean that the number of variables under consideration is large; even five to nine variables can count as large when we are looking for complex relationships.
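To make the notion of robust estimation in the list above concrete, the sketch below compares the sample mean with two estimators that are less sensitive to a stray value: the median and a trimmed mean. It illustrates only one aspect of robustness (resistance to an outlier); the data are invented, and `trimmed_mean` is a hypothetical helper written for this example, not a standard library function.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


# Hypothetical measurements, made up for illustration.
clean = [4.8, 5.0, 5.1, 5.2, 4.9, 5.0, 5.3, 4.7, 5.1, 4.9]

# Same data with the last value mis-recorded (49.0 instead of 4.9).
contaminated = clean[:-1] + [49.0]

for label, data in (("clean", clean), ("contaminated", contaminated)):
    print(
        label,
        "mean:", round(statistics.mean(data), 2),
        "median:", round(statistics.median(data), 2),
        "trimmed mean:", round(trimmed_mean(data), 2),
    )
```

The single bad value pulls the mean far from the bulk of the data, while the median and the trimmed mean barely move.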

Given the widespread applications of statistics, it is not surprising that statisticians can be found at all major universities in a variety of departments including statistics, biostatistics, mathematics, public health, management science, economics, and the social sciences. The federal government employs statisticians at the National Institute of Standards and Technology, the U.S. Bureau of the Census, the U.S. Department of Energy, the Bureau of Labor Statistics, the U.S. Food and Drug Administration, and the National Laboratories, among other agencies. In the private sector, statisticians are prominent in research groups at AT&T, General Electric, General Motors, and many Fortune 500 companies, particularly in medical device and pharmaceutical companies.
