DEFINITIONS OF STATISTICS AND STATISTICIANS
One use of statistics is to summarize and portray
the characteristics of the contents of a data set or to identify patterns in a
data set. This field is known as descriptive statistics or exploratory data
analysis, defined as the branch of statistics that describes the contents of a
data set or depicts the data graphically. Sometimes researchers use statistics
to draw conclusions about the world or to test formal hypotheses; the latter
application is known as inferential statistics or confirmatory data analysis.
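To make the distinction concrete, the following is a minimal sketch (not from the text) in Python. The summary measures in the first half are descriptive statistics; the confidence interval in the second half is a simple inferential statement about the population mean. The blood-pressure readings and the normal-population assumption are hypothetical.

```python
# Hypothetical sample of systolic blood pressure readings (mmHg).
import math
import statistics

sample = [118, 122, 130, 125, 119, 127, 133, 121]

# Descriptive statistics: summarize the contents of the data set.
print("mean:", statistics.mean(sample))
print("median:", statistics.median(sample))
print("sample standard deviation:", round(statistics.stdev(sample), 2))

# Inferential statistics: draw a conclusion about the population the sample
# came from, here a 95% confidence interval for the population mean
# (assumes an approximately normal population; 2.365 is the t critical
# value for 7 degrees of freedom).
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))
t_crit = 2.365
print("95% CI for population mean:",
      (round(mean - t_crit * se, 1), round(mean + t_crit * se, 1)))
```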
The field of statistics, which is relatively young,
traces its origins to questions about games of chance. The foundation of
statistics rests on the theory of probability, a subject with origins many
centuries ago in the mathematics of gambling. Motivated by gambling questions,
famous mathematicians such as de Moivre and Laplace developed probability
theory. Gauss derived least squares estimation (a technique used prominently in
modern regression analysis) as a method to fit the orbits of planets. The field
of statistics was advanced in the late 19th century by the following
developments: (1) Galton’s discovery of regression (a topic we will cover in
Chapter 12); (2) Karl Pearson’s work on parametric fitting of probability
distributions (models for probability distributions that depend on a few
unknown constants that can be estimated from data); and (3) the discovery of
the chi-square approximation (an approximation to the distribution of test
statistics used in contingency tables and goodness of fit problems, to be
covered in Chapter 11). Applications in agriculture, biology, and genetics also
motivated early statistical work.
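The least squares idea mentioned above can be illustrated with a short sketch (a hypothetical example, not from the text): a straight line y = a + bx is fit to data by choosing a and b to minimize the sum of squared vertical deviations, using the standard closed-form formulas.

```python
# Minimal least squares sketch with hypothetical (x, y) data; the intercept a
# and slope b below minimize the sum of squared residuals sum((y - a - b*x)^2).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.3, 6.0, 8.2, 9.9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least squares estimates:
#   b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  a = y_bar - b * x_bar
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar
print(f"fitted line: y = {a:.3f} + {b:.3f} x")
```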
Subsequently, ideas of statistical inference
evolved in the 20th century, with the important notions being developed from
the 1890s to the 1950s. The leaders in statistics at the beginning of the 20th
century were Karl Pearson, Egon Pearson (Karl Pearson’s son), Harald Cramér,
Ronald Fisher, and Jerzy Neyman. They developed early statistical methodology
and foundational theory. Later applications arose in engineering and the
military (particularly during World War II).
Abraham Wald and his statistical research group at
Columbia University developed sequential analysis (a technique that allows
sampling to stop or continue based on current results) and statistical decision
theory (methods for making decisions in the face of uncertainty based on
optimizing cost or utility functions). Utility functions numerically place a
value on decisions so that choices can be compared; the “best” decision is the
one with the maximum utility.
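As a hedged illustration of how a utility function lets decisions be compared (the decision names and utility values below are hypothetical, not from the text), this sketch simply picks the decision with the maximum expected utility.

```python
# Hypothetical expected utilities for three candidate decisions; statistical
# decision theory selects the decision whose expected utility is largest.
expected_utility = {
    "continue sampling": 0.62,
    "stop and accept the lot": 0.75,
    "stop and reject the lot": 0.41,
}
best = max(expected_utility, key=expected_utility.get)
print("best decision:", best, "with expected utility", expected_utility[best])
```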
The University of North Carolina and the University
of California at Berkeley also were major centers for statistics. Harold
Hotelling and Gertrude Cox initiated statistics departments in North Carolina.
Jerzy Neyman came to California and formed a strong statistical research center
at the University of California, Berkeley.
Statistical quality control developed at Bell Labs,
starting with the work of Walter Shewhart. The American statistician W. Edwards
Deming took these statistical quality control techniques to Japan along with his
management philosophy; in Japan, he nurtured a high standard of excellence,
which currently is being emulated successfully in the United States.
John Tukey at Princeton University and Bell Labs
developed many important statistical ideas, including:
· Methods of spectral estimation (a decomposition of time-dependent data in terms of trigonometric functions with different frequencies) in time series
· The fast Fourier transform (also used in the spectral analysis of time series)
· Robust estimation procedures (methods of estimation that work well for a variety of probability distributions)
· The concept of exploratory data analysis
· Many of the tools for exploratory analysis, including: (a) PRIM-9, an early graphical tool for rotating high-dimensional data on a computer screen (by high-dimensional data we mean that the number of variables under consideration is large; even a total of five to nine variables can be considered large when we are looking for complex relationships); and (b) box-and-whisker and stem-and-leaf plots (to be covered in Chapter 3).
Given the widespread applications of statistics, it
is not surprising that statisticians can be found at all major universities in
a variety of departments including statistics, biostatistics, mathematics,
public health, management science, economics, and the social sciences. The
federal government employs statisticians at the National Institute of Standards
and Technology, the U.S. Bureau of the Census, the U.S. Department of Energy, the
Bureau of Labor Statistics, the U.S. Food and Drug Administration, and the
National Laboratories, among other agencies. In the private sector,
statisticians are prominent in research groups at AT&T, General Electric,
General Motors, and many Fortune 500 companies, particularly in medical device
and pharmaceutical companies.