SAS EDA Tools

The main link to the SAS macro programs is:

A subset of the most useful of these for EDA is available in a zip file, edatools.zip (19K).

Most of the macro programs have external documentation and examples available on the web. A few, marked '[internal documentation only]', carry this information as descriptive comments at the beginning of the program.

Univariate

boxplot macro
Produces standard and notched boxplots for a single response variable with one or more grouping variables.
datachk macro
The datachk macro performs basic data screening/checking on numeric variables in a dataset, and is designed to give a compact overview of many variables.
nqplot macro
Produces theoretical normal quantile-quantile (Q-Q) plots for a single variable. Options provide a classical (mu, sigma) or robust (median, IQR) comparison line, a standard error envelope, and a detrended plot.
splot macro
Draws low-res (printer) schematic plots (boxplots) for one or more variables.
symbox macro
Displays side-by-side boxplots of a single variable raised to various powers, as an aid to finding a power transformation to symmetry.
symplot macro
Produces a variety of diagnostic plots for assessing symmetry of a data distribution and finding a power transformation to make the data more symmetric.
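
The idea behind a power transformation to symmetry, as used by symbox and symplot, can be sketched outside SAS. The following is a minimal Python illustration, not the macros' code: the function names, the scaled-power form (x^p - 1)/p with log at p = 0, and the use of quartile skewness as the symmetry measure are all assumptions made for this example.

```python
import math
import statistics

def scaled_power(x, p):
    """Scaled power transform (x**p - 1) / p; p == 0 is taken to mean log(x).

    The scaled form keeps the ordering of the data for negative powers.
    Assumes x > 0.
    """
    if p == 0:
        return math.log(x)
    return (x ** p - 1) / p

def quartile_skewness(values):
    """Quartile-based skewness: 0 for a symmetric batch, > 0 when right-skewed."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

def symmetry_by_power(values, powers=(-1, -0.5, 0, 0.5, 1)):
    """Return {power: quartile skewness of the transformed data}."""
    return {
        p: quartile_skewness([scaled_power(v, p) for v in values])
        for p in powers
    }
```

The power whose skewness is closest to zero is a candidate transformation; a boxplot of each transformed batch shows the same information graphically.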

Bivariate

contour macro
Plots a bivariate scatterplot with a bivariate data ellipse for one or more groups with one or more confidence coefficients.
lowess macro
Performs robust, locally weighted scatterplot smoothing (Cleveland, 1979).
resline macro
Fits a resistant line to X-Y data and determines transformations to make the relation linear.
sunplot macro
Sunflower plot for X-Y data. The sunflower plot displays a bivariate dataset using "sunflower symbols" to show the number of observations in the neighborhood of each XY point.
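
To show what fitting a resistant line involves, here is a minimal Python sketch of a three-group median line. It is an assumption-laden illustration rather than the resline macro's algorithm: it makes a single pass with no iterative refinement, gives each outer group n // 3 points, and takes the intercept as the median residual. The function name is hypothetical.

```python
import statistics

def resistant_line(xs, ys):
    """Fit y = intercept + slope * x from medians of the outer thirds of the data.

    Simplified single-pass version: the slope joins the (median x, median y)
    points of the lowest and highest thirds, and the intercept is the median
    of y - slope * x over all points.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    k = n // 3
    if k == 0:
        raise ValueError("need at least 3 points")
    left, right = pairs[:k], pairs[n - k:]
    x_left = statistics.median(x for x, _ in left)
    y_left = statistics.median(y for _, y in left)
    x_right = statistics.median(x for x, _ in right)
    y_right = statistics.median(y for _, y in right)
    if x_right == x_left:
        raise ValueError("outer groups have the same median x")
    slope = (y_right - y_left) / (x_right - x_left)
    intercept = statistics.median(y - slope * x for x, y in pairs)
    return slope, intercept
```

Because only medians enter the fit, a single wild Y value leaves the slope and intercept unchanged in this sketch, which is the property that distinguishes a resistant line from least squares.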

Multivariate

coplot macro
Constructs a conditioning plot: a set of plots of Y * X | Z, showing how the relationship between X and Y depends on Z.
corrgram macro
Draws a corrgram -- a schematic plot of a correlation matrix. Variables are permuted so that "similar" variables are positioned adjacently, and cells of a matrix are shaded or filled to show the correlation value.
cqplot macro
The cqplot macro produces quantile-quantile comparison plots for multivariate normal data (based on squared Mahalanobis distances from the centroid) or for other data which should follow a Chi-square distribution, together with estimated confidence bands.
outlier macro
Detects multivariate outliers. The outlier macro calculates robust Mahalanobis distances by iterative multivariate trimming (Gnanadesikan & Kettenring, 1972; Gnanadesikan, 1977), and produces a chi-square Q-Q plot.
scatmat macro
Draws a scatterplot matrix for all pairs of variables. A classification variable may be used to assign the plotting symbol and/or color of each point.
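
The chi-square Q-Q plot that cqplot and outlier are built around pairs ordered squared Mahalanobis distances with chi-square quantiles. The Python sketch below illustrates that pairing under strong simplifying assumptions: it handles only two variables (so the chi-square quantile with 2 df has the closed form -2 ln(1 - p)), uses classical rather than robust or trimmed estimates, and omits confidence bands. The function names and the (i - 0.5)/n plotting positions are choices made for this example, not taken from the macros.

```python
import math
import statistics

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = statistics.variance(xs)
    syy = statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]

def chisq2_qq(points):
    """Return (theoretical, observed) pairs for a chi-square(2 df) Q-Q plot."""
    d2 = sorted(mahalanobis_sq_2d(points))
    n = len(d2)
    # Chi-square with 2 df has CDF 1 - exp(-x / 2), so its quantile is -2 ln(1 - p).
    theoretical = [-2 * math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, d2))
```

Points that fall well above the line of equality at the upper end of such a plot are the candidate outliers. With classical estimates an outlier inflates the covariance matrix and can mask itself, which is the motivation for the robust distances described above.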

GLMs

boxcox macro
Finds power transformations of the response variable in a regression model (PROC REG) by the Box-Cox method, with graphic display of the maximum likelihood solution, t-values for model effects, and the influence of observations on choice of power.
boxglm macro
Finds power transformations of the response variable in a general linear model (PROC GLM) by the Box-Cox method.
boxtid macro
Finds power transformations of predictor variables in a general linear model (PROC GLM) by the Box-Tidwell method.
inflogis macro
Produces an influence plot for a logistic regression model. The plot shows a measure of badness of fit for a given case (DIFDEV or DIFCHISQ) vs. the fitted probability (PRED) or leverage (HAT), using an influence measure (C or CBAR) as the size of a bubble.
inflplot macro
Produces an influence plot for a regression model -- a plot of studentized residuals vs. leverage (hat-value), using Cook's D or DFFITS as the size of a bubble symbol.
meanplot macro
The meanplot macro produces 1-way, 2-way, or 3-way plots of means for a factorial design with any number of factor variables.
partial macro
Produces partial regression residual plots. Observations with high leverage and/or large studentized residuals can be individually labeled.
robust macro
Robust fitting for linear models (PROC REG, PROC GLM, PROC LOGISTIC) via iterative re-weighting.
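
The Box-Cox method behind the boxcox and boxglm macros chooses the power that maximizes a profile log-likelihood. As a rough illustration only, the Python sketch below evaluates that profile over a grid of powers for a regression on a single predictor. It is not the macros' code: the function names and the grid are assumptions, and the t-values and influence displays mentioned above are not reproduced.

```python
import math

def boxcox(y, lam):
    """Box-Cox transform of a positive value: (y**lam - 1) / lam, or log(y) at lam == 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def rss_simple_regression(xs, zs):
    """Residual sum of squares from the least-squares line of zs on xs."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return sum((z - mz - slope * (x - mx)) ** 2 for x, z in zip(xs, zs))

def boxcox_profile(xs, ys, lambdas):
    """Return {lambda: profile log-likelihood, up to an additive constant}.

    L(lambda) = -(n / 2) * log(RSS(lambda) / n) + (lambda - 1) * sum(log y).
    Assumes every y is positive and no transformed fit is exact (RSS > 0).
    """
    n = len(ys)
    sum_log_y = sum(math.log(y) for y in ys)
    profile = {}
    for lam in lambdas:
        zs = [boxcox(y, lam) for y in ys]
        rss = rss_simple_regression(xs, zs)
        profile[lam] = -0.5 * n * math.log(rss / n) + (lam - 1) * sum_log_y
    return profile
```

The grid value with the largest profile log-likelihood, `max(profile, key=profile.get)`, is the estimated power; plotting the profile against lambda gives the kind of graphic display of the maximum likelihood solution that the boxcox entry describes.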