SAS Macro Programs for Statistical Graphics: BIPLOT
$Version: 1.9 (17 Dec 2003)
Michael Friendly
York University
The BIPLOT macro uses PROC IML to carry out the calculations for
the biplot display described in "Section 8.7".
The program produces
- printed output, giving the singular values, variance accounted for,
and biplot coordinates;
- a labeled PROC GPLOT graph whose axes may be automatically equated to
preserve the geometry of lengths and angles;
- (optionally) a printer plot of the observations and variables;
- two output data sets, specified by the parameters OUT= and ANNO=,
which return the plotted coordinates and the labels for observations,
respectively, for customized plotting or other uses.
Usage
The original version of this macro required that the columns of the
data table be stored as a set of variables in the input dataset. In
this arrangement, use the VAR= argument to specify this list of variables
and the ID= variable to specify an additional variable whose values
are labels for the rows.
Assume a dataset of reaction times to 4 topics in 3 experimental tasks,
in a SAS dataset like this:
TASK TOPIC1 TOPIC2 TOPIC3 TOPIC4
Easy 2.43 3.12 3.68 4.04
Medium 3.41 3.91 4.07 5.10
Hard 4.21 4.65 5.87 5.69
For this arrangement, the macro would be invoked as follows:
%biplot(var=topic1-topic4, id=task);
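As a minimal sketch, the table-form data set shown above could be created and passed to the macro as follows. The data set name RTWIDE is an assumption made for illustration and does not appear in the original documentation.

```sas
/* RTWIDE is a hypothetical data set name; the values are those in the table above */
data rtwide;
   input task $ topic1-topic4;
datalines;
Easy 2.43 3.12 3.68 4.04
Medium 3.41 3.91 4.07 5.10
Hard 4.21 4.65 5.87 5.69
;
run;

/* DATA= is named explicitly instead of relying on the DATA=_LAST_ default */
%biplot(data=rtwide, var=topic1-topic4, id=task);
```

Naming the data set explicitly is a matter of preference; the call shown above without DATA= uses the most recently created data set.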
The present version also allows the dataset to contain all response
values in a single variable, with two (or more) additional variables
to specify the row and column class variables, as is done with
PROC GLM in the univariate (non-repeated measures) format. In this
case, DO NOT specify an ID= variable. Instead, use the VAR= argument to
specify the two row and column class variables, and specify the
name of the response variable as RESPONSE=.
The same data in this format would have 12 observations, and look like:
TASK TOPIC RT
Easy 1 2.43
Easy 2 3.12
Easy 3 3.68
...
Hard 4 5.69
For this arrangement, the macro would be invoked as follows:
%biplot(var=topic task, response=RT);
In this arrangement, the order of the VAR= variables does not matter.
The columns of the two-way table are determined by the variable which
varies most rapidly in the input dataset (TOPIC, in the example).
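The following sketch reads the same values directly into the one-response-per-observation form. The data set name RTLONG is an assumption for illustration. The inner loop makes TOPIC the most rapidly varying variable, so the topics become the columns of the two-way table.

```sas
/* RTLONG is a hypothetical data set name */
data rtlong;
   input task $ @;            /* read the row label and hold the input line */
   do topic = 1 to 4;
      input rt @;             /* read the next response from the same line */
      output;                 /* one observation per task-topic cell */
   end;
datalines;
Easy 2.43 3.12 3.68 4.04
Medium 3.41 3.91 4.07 5.10
Hard 4.21 4.65 5.87 5.69
;
run;

/* No ID= variable in this form; the class variables go in VAR= */
%biplot(data=rtlong, var=topic task, response=rt);
```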
Parameters
- DATA=_LAST_
- Name of the input data set for the biplot.
- VAR=_NUM_
- Variables for biplot, when the data is in
table form, or list of factor variables, when the data is in
GLM form. The list of variables may use any of the SAS abbreviated
forms for variable lists (e.g., X1-Xn).
- ID=
- Name of a character variable used to label
the rows (observations) in the biplot display.
(Only specify an ID= variable when the data is
in table form.)
- RESPONSE=
- Name of response variable (GLM input form)
- DIM=2
- Number of biplot dimensions. (Only two-dimensional
plots are produced, even if DIM>2.)
- FACTYPE=SYM
- Biplot factor type: GH, SYM, JK, or COV.
FACTYPE=COV gives the GH scaling, with observation vectors multiplied
by sqrt(N-1), and variable vectors divided by the same factor.
- SCALE=1
- Scale factor for variable vectors. The
coordinates for the variables are multiplied by
this value. Setting SCALE=0 causes the macro
to compute the scale factor to equate the maximum
distance from the origin of the variable and
observation markers.
- POWER=1
- Power to which the data values are transformed
(POWER=0 means log(y)).
- OUT=BIPLOT
- Output data set containing biplot
coordinates.
- ANNO=BIANNO
- Output data set containing Annotate labels.
- STD=MEAN
- Specifies how to standardize the data
matrix before the singular value decomposition
is computed. If STD=NONE, only the grand mean
is subtracted from each value in the data
matrix. This option is typically used when row
and column means are to be represented in the
plot, as in the diagnosis of two-way tables ("Section 7.6.3"). If
STD=MEAN, the mean of each column is
subtracted. This is the default, and assumes
that the variables are measured on
commensurable scales. If STD=STD, the column
means are subtracted and each column is
standardized to unit variance.
- COLORS=BLUE RED
- Colors used for OBS and VARS.
- SYMBOLS=NONE NONE
- Symbols used for OBS and VARS.
Because the points are usually labeled, symbols are often superfluous.
- INTERP=NONE VEC
- Interpolation option used for OBS and VARS.
In addition to the standard
interpolation options provided by the SYMBOL statement, the BIPLOT macro
also understands the option VEC to mean a vector from the origin to the row
or column point. [Default: INTERP=NONE VEC]
- LINES=33 20
- Line styles used for OBS and VARS interpolation
options.
- PLOTREQ=DIM2 * DIM1
- Specifies the dimensions to be plotted.
- GPLOT=YES
- Produce a GPLOT plot? If GPLOT=YES, the
two dimensions specified in PLOTREQ= are plotted.
- PPLOT=NO
- Produce printer plot? If PPLOT=YES, the
two dimensions specified in PLOTREQ= are plotted.
- HAXIS=
- The name of an AXIS statement for the horizontal axis.
If neither HAXIS= nor VAXIS= are specified, the program calls the
EQUATE macro to produce AXIS statements in which the axes are equated.
This creates the axis statements AXIS98 and AXIS99, whether or
not a graph is produced.
In this case, you should examine the values used for the
INC=, XEXTRA=, and YEXTRA= parameters.
- VAXIS=
- The name of an AXIS statement for the vertical axis.
- VTOH=2
- The vertical to horizontal aspect ratio (height of one character divided by
the width of one character) of the printer device, used to equate axes for
a printer plot, when PPLOT=YES. [Default: VTOH=2]
- INC=0.5 0.5
- X, Y axis tick increments (for the EQUATE macro). Ignored if HAXIS= and
VAXIS= are specified. [Default: INC=0.5 0.5]
- XEXTRA=0 0
- The number of extra X axis tick marks at left and right. Use to allow extra
space for labels. [Default: XEXTRA=0 0]
- YEXTRA=0 0
- The number of extra Y axis tick marks. [Default: YEXTRA=0 0]
- M0=0.5
- Length of origin marker, in data units. If the axes have been properly
equated, the lengths of the horizontal and vertical segments should be
equal. [Default: M0=0.05]
- DIMLAB=
- Prefix for dimension labels. [Default: DIMLAB=Dimension when DIM=2,
otherwise DIMLAB=Dim]
- NAME=
- Name of the graphics catalog entry. [Default: NAME=biplot]
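To make the STD= and FACTYPE= options more concrete, here is a rough PROC IML sketch of the underlying computation for the reaction-time table from the Usage section. It is not the macro's source code. The SYM scaling shown follows the usual definition of the symmetric biplot (singular values split equally between rows and columns), which is an assumption here; the COV scaling follows the description under FACTYPE= above. All matrix names are hypothetical.

```sas
proc iml;
   /* Reaction-time table from the Usage section: 3 tasks x 4 topics */
   Y = {2.43 3.12 3.68 4.04,
        3.41 3.91 4.07 5.10,
        4.21 4.65 5.87 5.69};
   n = nrow(Y);

   /* STD=MEAN: subtract the mean of each column */
   Yc = Y - repeat(Y[:,], n, 1);

   /* Singular value decomposition: Yc = U * diag(Q) * V` */
   call svd(U, Q, V, Yc);
   pct = 100 * Q##2 / sum(Q##2);     /* percent of variance per dimension */

   /* Keep the first two dimensions (DIM=2) */
   d  = 2;
   U2 = U[, 1:d];
   V2 = V[, 1:d];
   Q2 = Q[1:d];

   /* FACTYPE=SYM (assumed definition): singular values split equally */
   A_sym = U2 * diag(sqrt(Q2));      /* observation (row) coordinates */
   B_sym = V2 * diag(sqrt(Q2));      /* variable (column) coordinates */

   /* FACTYPE=COV: GH scaling, rescaled by sqrt(N-1) as described above */
   A_cov = sqrt(n-1) * U2;
   B_cov = (V2 * diag(Q2)) / sqrt(n-1);

   print Q pct;
   print A_sym, B_sym;
   print A_cov, B_cov;
quit;
```

In both scalings the product of the row and column coordinate matrices reproduces the same rank-2 approximation of the centered table; only the division of the singular values between rows and columns differs.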
The OUT= data set
The results from the analysis are saved in the OUT= data set. This
data set contains two character variables (_TYPE_ and _NAME_) which
identify the observations and numeric variables (DIM1, DIM2, ...)
which give the coordinates of each point.
The value of the _TYPE_ variable is 'OBS' for the observations
that contain the coordinates for the rows of the data set, and is
'VAR' for the observations that contain the coordinates for the
columns. The _NAME_ variable contains the value of ID= variable
for the row observations and the variable name for the column
observations in the output data set.
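As a sketch of how the OUT= data set might be used afterwards, the steps below list the coordinates and split them into separate data sets for rows and columns. They assume the macro has already been run with the default OUT=BIPLOT and DIM=2; the data set names OBS and VARS are hypothetical.

```sas
/* Assumes %biplot has already run with the default OUT=BIPLOT */
proc print data=biplot;
   id _type_ _name_;
   var dim1 dim2;
run;

/* Split the coordinates into row points and column points */
data obs vars;
   set biplot;
   if _type_ = 'OBS' then output obs;
   else if _type_ = 'VAR' then output vars;
run;
```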
GOPTIONS
The height and font used for point labels may be set using the
GOPTIONS statement (HTEXT= and FTEXT=) before calling the macro.
Missing data
The program makes no provision for missing values on any of the
variables to be analyzed.
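One way to work around this is to remove incomplete observations before calling the macro. The sketch below assumes the AUTO data set used in the Example section is already available; the macro variable VARS and the data set name AUTOC are hypothetical. NMISS counts the missing values among the listed numeric variables.

```sas
/* Analysis variables (the list is taken from the Example section) */
%let vars = gratio turn rep77 rep78 price mpg
            hroom rseat trunk weight length displa;

/* Keep only observations with no missing values on any analysis variable */
data autoc;
   set auto;
   if nmiss(of &vars) = 0;
run;

%biplot(data=autoc, var=&vars, id=model, std=STD);
```

Listwise deletion is only one option; whether it is appropriate depends on how much data is lost and why the values are missing.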
Example
%include data(AUTO) ;
*include macros(biplot); /* or store in autocall library */
title h=1.6 'Biplot of Automobiles data';
data auto;
   set auto;
   if rep77 ^= . and rep78 ^= .;   /* delete missing data */
   model = origin || scan(model,1);
run;
goptions htext=1.5;   /* set height of label text */
%biplot( data= auto,
var = gratio turn rep77 rep78 price mpg
hroom rseat trunk weight length displa,
id=model, scale=.8,
factype=SYM, std = STD );