1950-1974: Re-birth of data visualization
Still under the influence of the formal and numerical zeitgeist from the mid-1930s on, data visualization began to rise from dormancy in the mid-1960s, spurred largely by three significant developments:
- In the USA, John W. Tukey, in a landmark paper, "The Future of Data Analysis" (Tukey:1962), issued a call for the recognition of data analysis as a legitimate branch of statistics distinct from mathematical statistics; shortly, he began the invention of a wide variety of new, simple, and effective graphic displays, under the rubric of "Exploratory Data Analysis" (EDA). Tukey's stature as a statistician and the scope of his informal, robust, and graphical approach to data analysis were as influential as his graphical innovations. Although not published until 1977, chapters from Tukey's EDA book (Tukey:1977) were widely circulated as they began to appear in 1970-1972, and began to make graphical data analysis both interesting and respectable again.
- In France, Jacques Bertin published the monumental Sémiologie graphique (Bertin:1967). To some, this appeared to do for graphics what Mendeleev had done for the organization of the chemical elements, that is, to organize the visual and perceptual elements of graphics according to the features and relations in data.
- But the skills of hand-drawn maps and graphics had withered during the dormant "modern dark ages" of graphics (though every figure in Tukey's EDA (Tukey:1977) was, by intention, hand-drawn). Computer processing of data had begun, and offered the possibility to construct old and new graphic forms by computer programs. True high-resolution graphics were developed, but would take a while to enter common use.
By the end of this period significant intersections and collaborations would begin: (a) computer science research (software tools, C language, UNIX, etc.) at Bell Laboratories (Becker:1994) and elsewhere would combine forces with (b) developments in data analysis (EDA, psychometrics, etc.) and (c) display and input technology (pen plotters, graphic terminals, digitizer tablets, the mouse, etc.). These developments would provide new paradigms, languages and software packages for expressing and implementing statistical and data graphics. In turn, they would lead to an explosive growth in new visualization methods and techniques.
Other themes begin to emerge, mostly as initial suggestions: (a) various visual representations of multivariate data; (b) animations of a statistical process; and (c) perceptually based theory (or just informed ideas) about how graphic attributes and relations might be rendered to better convey the data to the eyes.
Creation of Fortran, the Formula Translation language, for the IBM 704 computer. It is generally regarded as the first widely used high-level programming language.
References:
The "Phillips curve," a scatterplot of inflation vs. unemployment over time, shows a strong inverse relation, leading to important developments in macroeconomic theory.
References:
Phillips:1958
Initial development of geographic information systems, combining spatially referenced data, spatial models and map-based visualization. Example: the Harvard Laboratory for Computer Graphics (and Spatial Analysis) develops SYMAP, producing isoline, choropleth and proximal maps on a line printer.
References:
Chrisman:1988 Abbott:1884
Beginnings of modern dynamic statistical graphics (a 1-minute movie of the iterative process of finding a multidimensional scaling solution)
References:
Beginnings of EDA: improvements on histogram in analysis of counts, tail values (hanging rootogram)
References:
Tukey:1965
Triangular glyphs to represent four variables simultaneously, using sides and orientation
References:
PickettWhite:1966
Comprehensive theory of graphical symbols and modes of graphics representation
Among other things, Bertin introduced the idea of reordering qualitative variables in graphical displays to make relations more apparent: the reorderable matrix.
References:
Bertin:1967 Bertin:1983
Graphical innovations for exploratory data analysis (stem-and-leaf, graphical lists, box-and-whisker plots, two-way and extended-fit plots, hanging and suspended rootograms)
References:
Tukey:1972
The first well-known direct-manipulation interactive system in statistics: it allowed users to interactively control a power transformation in real time for probability plotting
References:
Fowlkes:1969
Irregular polygon ("star plot") to represent multivariate data (with vertices at equally spaced intervals, distance from center proportional to the value of a variable) [but see Georg von Mayr in 1877 (vonMayr:1877, p. 78) for first use]
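The construction described in this entry can be sketched in a few lines of Python. This is only an illustration, not the original authors' procedure: the function name `star_plot_vertices`, the scaling of each value by a per-variable maximum, and the sample numbers are all assumptions made for the example.

```python
import math

def star_plot_vertices(values, max_values):
    """Return the (x, y) vertices of a star-plot polygon for one observation.

    Each variable gets a ray at an equally spaced angle around the center;
    the vertex sits on that ray at a distance proportional to the value.
    Assumes 0 <= value <= max_value for every variable (illustrative choice).
    """
    n = len(values)
    vertices = []
    for k, (v, vmax) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * k / n   # equally spaced rays
        radius = v / vmax             # distance from center, scaled to [0, 1]
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices

# Hypothetical observation with four variables, each measured on a 0-10 scale.
verts = star_plot_vertices([5.0, 2.0, 8.0, 4.0], [10.0, 10.0, 10.0, 10.0])
for x, y in verts:
    print(f"{x:+.3f} {y:+.3f}")
```

Joining the vertices in order and closing the polygon gives the glyph for one observation; drawing one glyph per observation side by side is what lets the shapes be compared.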
References:
Siegel-etal:1971
Proposal to use statistical graphics in social indicator reporting, particularly on television
References:
Biderman:1971
Development of the biplot, a method for visualizing both the observations and variables in a multivariate data set in a single display. Observations are typically represented by points, variables by vectors, such that the position of a point along a vector represents the data value
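As a rough sketch of the idea, a biplot can be computed from a rank-2 singular value decomposition of the column-centered data, so that the inner product of an observation's point with a variable's vector approximates the corresponding data value. The function name `biplot_coordinates`, the choice to give all of the singular-value scaling to the observation points, and the toy data are assumptions made for this illustration; other scalings are also in use.

```python
import numpy as np

def biplot_coordinates(X):
    """Rank-2 biplot coordinates from the SVD of column-centered data.

    Returns the centered data, one 2-D point per observation, and one
    2-D vector per variable. The scaling (singular values assigned to
    the points) is an illustrative choice.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    points = U[:, :2] * s[:2]                    # observations
    vectors = Vt[:2].T                           # variables
    return Xc, points, vectors

# Toy data: the third variable is the sum of the first two, so the centered
# matrix has rank 2 and the two-dimensional display reproduces it exactly.
X = [[1.0, 2.0, 3.0],
     [2.0, 1.0, 3.0],
     [4.0, 3.0, 7.0],
     [3.0, 6.0, 9.0]]
Xc, points, vectors = biplot_coordinates(X)
approx = points @ vectors.T   # point-vector inner products recover the data
```

For real data of higher rank, `approx` is only the best rank-2 approximation to the centered matrix, and the quality of the display depends on how much of the variation the first two dimensions capture.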
References:
Gabriel:1971
USA Government chartbook devoted exclusively to reporting social indicator statistics
References:
PresidentBudget:1973
Revival of statistical graphics innovation, use by U.S. Bureau of the Census
References:
Color-coded bivariate matrix to represent two intervally measured variables in a single map (Urban Atlas series) [but see Georg von Mayr in 1874 (vonMayr:1874, Fig. XIX) for first use]
References:
USCensus:1974
Comparative experimental test of histogram, hanging histogram and hanging rootogram
References:
Wainer:1974
Start of true interactive graphics in statistics: PRIM-9, the first system in statistics with 3-D data rotations, provided dynamic tools for projecting, rotating, isolating and masking multidimensional data in up to nine dimensions