SAS EDA Tools
The main link to the SAS macro programs is:
A subset of the most useful of these for EDA is available in a zip
file, edatools.zip (19K).
Most of the macro programs have external documentation and examples
available on the web. A few, marked '[internal documentation only]', have this
information in the form of descriptive comments at the beginning of the program.
Univariate
- boxplot macro
- Produces standard and notched boxplots for a single response variable with one or more grouping variables.
- datachk macro
- The datachk macro performs basic data screening and checking of the numeric variables in a dataset, and is designed to give a compact overview of many variables.
- nqplot macro
- Produces theoretical normal quantile-quantile (Q-Q) plots for a single variable.
Options provide a classical (mu, sigma) or robust (median, IQR) comparison line,
a standard error envelope, and a detrended plot.
- splot macro
- Draws low-resolution (printer) schematic plots (boxplots) for one or more variables.
- symbox macro
- Displays side-by-side boxplots of a single variable raised to various powers, as
an aid to finding a power transformation to symmetry.
- symplot macro
- Produces a variety of diagnostic plots for assessing the symmetry of a data distribution
and finding a power transformation to make the data more symmetric.
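The comparison line, standard error envelope, and detrended values described for nqplot can be illustrated with a short sketch. It is written in Python rather than SAS and is not the macro's own code: the Blom plotting positions (i - 0.375)/(n + 0.25), the IQR/1.349 robust scale, and the approximate order-statistic standard error (sigma/phi(z)) * sqrt(p(1 - p)/n) are common choices assumed here, and the function name nq_points is hypothetical.

```python
import math
from statistics import NormalDist, mean, median, quantiles, stdev

STD_NORMAL = NormalDist()


def nq_points(data, robust=False):
    """Hypothetical helper: rows of (z, y, fitted, se, detrended) for a normal Q-Q plot.

    robust=False uses the classical (mean, sd) comparison line;
    robust=True uses (median, IQR / 1.349), since the IQR of N(0, 1) is about 1.349.
    """
    y = sorted(data)
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    if robust:
        q1, _, q3 = quantiles(y, n=4)
        center, scale = median(y), (q3 - q1) / 1.349
    else:
        center, scale = mean(y), stdev(y)

    rows = []
    for i, yi in enumerate(y, start=1):
        p = (i - 0.375) / (n + 0.25)        # Blom plotting position (assumed)
        z = STD_NORMAL.inv_cdf(p)           # theoretical normal quantile
        fitted = center + scale * z         # point on the comparison line
        # approximate standard error of the i-th order statistic (assumed formula)
        se = (scale / STD_NORMAL.pdf(z)) * math.sqrt(p * (1 - p) / n)
        rows.append((z, yi, fitted, se, yi - fitted))  # last item: detrended value
    return rows


data = [3.1, 4.7, 5.0, 5.2, 5.9, 6.3, 7.8, 12.4]
for z, yi, fitted, se, detrended in nq_points(data, robust=True):
    # columns: z, observed, line, lower envelope, upper envelope, detrended
    print(f"{z:6.2f} {yi:6.2f} {fitted:6.2f} {fitted - 2 * se:6.2f} "
          f"{fitted + 2 * se:6.2f} {detrended:6.2f}")
```

Plotting the observed values against z, with the fitted line and fitted plus or minus about two standard errors as the envelope, gives the Q-Q plot; plotting the detrended column against z gives the detrended version. For this sample the largest value, 12.4, lies above the upper envelope of the robust line.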
Bivariate
- contour macro
- Plots a bivariate scatterplot with bivariate data ellipses for one or more groups,
at one or more confidence coefficients.
- lowess macro
- Performs robust, locally weighted scatterplot smoothing (Cleveland, 1979).
- resline macro
- Fits a resistant line to X-Y data and determines transformations to make the relation linear.
- sunplot macro
- Draws a sunflower plot for X-Y data.
The sunflower plot displays a bivariate dataset using "sunflower symbols"
to show the number of observations in the neighborhood of each X-Y point.
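To show what "resistant" means for resline, here is a sketch of Tukey's three-group resistant line: sort by X, split the data into thirds, take the slope through the medians of the outer thirds, then polish the slope using the residuals. This is Python, not the macro's code; the n // 3 split rule, the iteration limit, and the function name resistant_line are assumptions, and the search for linearizing transformations that the macro also performs is not shown.

```python
from statistics import median


def resistant_line(x, y, iterations=5):
    """Hypothetical helper: Tukey-style three-group resistant line, returns (a, b)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least 3 paired observations")
    pts = sorted(zip(x, y))                 # order the pairs by x
    n = len(pts)
    k = n // 3                              # size of each outer third (assumed split rule)
    groups = [pts[:k], pts[k:n - k], pts[n - k:]]
    xs = [[p[0] for p in g] for g in groups]
    ys = [[p[1] for p in g] for g in groups]
    x_left, _, x_right = (median(g) for g in xs)
    if x_right == x_left:
        raise ValueError("outer thirds have equal median x")

    def slope_of(vals):
        # slope through the medians of the left and right thirds
        return (median(vals[2]) - median(vals[0])) / (x_right - x_left)

    b = slope_of(ys)
    for _ in range(iterations):
        # polish: fit the same median slope to the current residuals and add it on
        resid = [[yv - b * xv for xv, yv in zip(gx, gy)] for gx, gy in zip(xs, ys)]
        db = slope_of(resid)
        b += db
        if abs(db) < 1e-12:
            break
    # intercept: average of the three group medians of y - b*x
    a = sum(median([yv - b * xv for xv, yv in zip(gx, gy)])
            for gx, gy in zip(xs, ys)) / 3
    return a, b


x = list(range(1, 10))
y = [2 * xi + 1 for xi in x]
y[8] = 1000                                 # one wild value in the right third
print(resistant_line(x, y))
```

Because the slope and intercept depend only on group medians, the single wild Y value leaves the fitted line at a = 1, b = 2, which a least-squares fit would not do.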
Multivariate
- coplot macro
- Constructs a conditioning plot: plots of Y * X | Z (Y against X, given Z),
showing how the relationship between X and Y depends on Z.
- corrgram macro
- Draws a corrgram, a schematic plot of a correlation matrix.
Variables are permuted so that "similar" variables are positioned adjacently,
and the cells of the matrix are shaded or filled to show the correlation values.
- cqplot macro
- The cqplot macro produces quantile-quantile comparison plots for
multivariate normal data (based on squared Mahalanobis distances
from the centroid) or for other data that should follow a chi-square
distribution, together with estimated confidence bands.
- outlier macro
- Detects multivariate outliers. The OUTLIER macro calculates
robust Mahalanobis distances by iterative multivariate trimming
(Gnanadesikan & Kettenring, 1972; Gnanadesikan, 1977)
and produces a chi-square Q-Q plot.
- scatmat macro
- Draws a scatterplot matrix for all pairs of
variables.
A classification variable may be used to assign the plotting symbol
and/or color of each point.
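The computation shared by cqplot and outlier is the squared Mahalanobis distance D^2 = (x - xbar)' S^-1 (x - xbar), which is sorted and plotted against chi-square quantiles with p degrees of freedom. The Python sketch below is not the macros' code: it uses the classical (non-robust) mean and covariance, so it corresponds to a single pass without the trimming that outlier iterates, and it approximates the chi-square quantiles with the Wilson-Hilferty formula, an assumption made to stay within the standard library. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows, center):
    """Sample covariance matrix with the n - 1 divisor."""
    n, p = len(rows), len(center)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is (nearly) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * p
    for i in range(p - 1, -1, -1):
        x[i] = (m[i][p] - sum(m[i][j] * x[j] for j in range(i + 1, p))) / m[i][i]
    return x


def squared_mahalanobis(rows):
    """D^2 of each row from the centroid, using S^-1 via a linear solve."""
    center = mean_vector(rows)
    cov = covariance(rows, center)
    out = []
    for r in rows:
        d = [r[j] - center[j] for j in range(len(center))]
        z = solve(cov, d)
        out.append(sum(di * zi for di, zi in zip(d, z)))
    return out


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile (clamped at 0)."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * max(0.0, 1.0 - c + z * math.sqrt(c)) ** 3


def cq_pairs(rows):
    """(chi-square quantile, sorted D^2) pairs for a chi-square Q-Q plot."""
    d2 = sorted(squared_mahalanobis(rows))
    n, p = len(rows), len(rows[0])
    q = [chi2_quantile_wh((i - 0.5) / n, p) for i in range(1, n + 1)]
    return list(zip(q, d2))


rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 6], [6, 4]]
for q, d2 in cq_pairs(rows):
    print(f"{q:6.3f} {d2:6.3f}")
```

A handy check on the arithmetic: with S computed using the n - 1 divisor, the n squared distances always sum to (n - 1)p, here 10.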
GLMs
- boxcox macro
- Finds power transformations of the response variable in a regression model
(PROC REG) by the Box-Cox method, with graphic displays of the maximum likelihood
solution, the t-values for model effects, and the influence of observations
on the choice of power.
- boxglm macro
- Finds power transformations of the response variable in a general linear model
(PROC GLM) by the Box-Cox method.
- boxtid macro
- Finds power transformations of predictor variables in a general linear model
(PROC GLM) by the Box-Tidwell method.
- inflogis macro
- Produces an influence plot for a logistic regression model. The plot shows
a measure of badness of fit for a given case (DIFDEV or DIFCHISQ) vs. the fitted
probability (PRED) or leverage (HAT), using an influence measure (C or CBAR)
as the size of a bubble.
- inflplot macro
- Produces an influence plot for a regression model: a plot of studentized
residuals vs. leverage (hat value), using Cook's D or DFFITS as the size of
a bubble symbol.
- meanplot macro
- The meanplot macro produces 1-way, 2-way, or 3-way plots of means for
a factorial design with any number of factor variables.
- partial macro
- Produces partial regression residual plots. Observations with high leverage
and/or large studentized residuals can be individually labeled.
- robust macro
- Performs robust fitting of linear models (PROC REG, PROC GLM, PROC LOGISTIC)
via iterative reweighting.
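The maximum likelihood solution that boxcox displays comes from the profile log-likelihood of the transformed response, l(lambda) = -(n/2) log(RSS(lambda)/n) + (lambda - 1) * sum(log y_i), evaluated over a grid of candidate powers. The Python sketch below shows that calculation for the simplest case, a single predictor fitted by ordinary least squares; it is not the macro's code, it omits the t-value and influence displays, and the function names are hypothetical.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of strictly positive values y for power lam."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def rss_simple_regression(x, z):
    """Residual sum of squares from the least-squares line z ~ a + b*x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    return sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))


def boxcox_profile(x, y, lambdas):
    """(lambda, profile log-likelihood up to a constant) for each candidate power."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires a strictly positive response")
    n = len(y)
    sum_log_y = sum(math.log(v) for v in y)   # Jacobian term of the transformation
    out = []
    for lam in lambdas:
        rss = rss_simple_regression(x, boxcox(y, lam))
        loglik = -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum_log_y
        out.append((lam, loglik))
    return out


x = list(range(1, 21))
# response whose square root is linear in x, plus a small alternating perturbation
y = [(2 + xi + (0.1 if xi % 2 else -0.1)) ** 2 for xi in x]
grid = [k / 2 for k in range(-4, 5)]          # -2.0, -1.5, ..., 2.0
profile = boxcox_profile(x, y, grid)
best_lam = max(profile, key=lambda t: t[1])[0]
print(best_lam)
```

Since the square root of this response is linear in x, the profile peaks at lambda = 0.5, the square-root transformation. A real analysis would use a finer grid and an approximate confidence interval for lambda from the chi-square drop in the log-likelihood.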