One idea is to add an elliptical confidence region ( data ellipse ) around the mean to highlight the joint relationship. With data for several groups this helps show the extent to which the relationship is the same for all groups. (Better than individual regression lines.)
( x - x bar )T S size +2 -1 ( x - x bar ) <= chi 2 2 ( 1 - a ) ' ' . (3)where x bar = ( x bar , y bar ) are the sample means, S ( 2 × 2 ) is the covariance matrix of (x , y ) , and chi 2 2 ( 1 - a ) is the (1 - a ) percentage point of the chi² distribution with two degrees of freedom.
%macro CONTOUR(
       data=_LAST_,              /* input data set    */
       x=,                       /* X variable        */
       y=,                       /* Y variable        */
       group=,                   /* Group variable    */
       pvalue= .5,               /* probability value */
       out=CONTOUR,              /* output data set   */
       colors=RED GREEN BLUE BLACK);
For example, a set of 50% data ellipses for Weight vs. Price of
automobiles grouped by region of origin is produced by these
statements:
%include contour; %contour(data=auto, x=price, y=weight, group=origin);
One set of ideas involves dividing the data into roughly equal-sized strips according to their ordered x-values. For each strip, we can plot:
The plot below shows the lottery numbers assigned to each birthday (Day of Year) in the 1970 US Draft Lottery carried out by the Selective Service to determine the order in which elligible men would be drafted into the US Army. It was supposed to be completely random, and looks that way from the scatter plot
   SCATTER DRAFT
    366  -        **    *      *  2 * **               *
 L       -    ***     * *  **2       ** 2       2*      *   *
 O       - ***  *    * 2    * ** *  *    *2    *  *              *
 T       -***        *    * *     2*        *  **      *     *  *  2*
 T       -     *  2*2**       * * **     *  **  **       **
 E  275  o   *      2   ****  *  *   2  **   *          *    **  *
 R       -*           *2* * *2   * *      *  *     * * *  **
 Y       -  **     *   *   *          *          * *******2     *   *
         -***  ****  *  **2       **   *   *      *      *  *
 N       - 2    *  *   *      **    *   *    * *     2* *  2   *
 U  184     *  * **          * **   *   * 2       ***    **   **
 M       -*    *      2 *                   * *  ** * *  *       2** 2
 B       -   * * 2   *    **   *           *  **2     ** *     3
 E       -  *        **    *    2    * *  *            *      3 **  2 *
 R       -    *     *            * ** ***    *2* *  * *   *    *  *
     92  + *  ****          *       *   *  **           *  * *   *  *2*
         -    *  *       2   * * *   **     *      **    *  *2*     **
         -   *2    *        *   *     ** *    2   2  *       *  * *  *
         -          * *  2    *** *       *  * * *          2 *   *2
         -  *     *        *       **  * **** *   *   * *   *     **  *
      1  -       *           *                     2*     **    * *   *
          -------------------+------------------+-------------------
          1                 123                 244                 366
                                                           DAY OF YEAR
The next slide shows the trace line plot of the medians (M), upper
(H), and lower (L) hinges of the lottery numbers against day of the
year.  It is clear, even with fairly coarse resolution, that the
lottery numbers decrease towards the end of the year, indicating a
defect in the method used to select these "random
numbers".
     342  +                    H     H
 L        -         H          HH   HHH H
 O        -        HH    HHH H HHH HH H  H
 T        -H     H H HHHHHHHHHH  HHH   HHHHH    HH
 T        -H   HHHHH  HHHH   HH    H    HHHHH  HH   H        H
 E   268  + HHHHH H                       H    HH H HH   HHH H
 R        - HHHH    M                       HHH H HHHHHH HHHH H
 Y        -  HH    MMMMMMMM    MM             HH  H   HHHH    HH
          -      MMM  MMMMMMMM MM  MMM        H        H       H  H  H
 N        -M    MMM         MMMM MMMMMM                H     M HHHHHHHHH
 U   194  +MM  M    L        MM   MM   MMM               MMMMM     HH  H
 M        - MMMM   LL                   MMMMM   MM  MMMMMM   MM     H  H
 B        -   M      L   L                M MMMMM MMMM M      MM
 E        -        L LLLL                     MM  MM       L   MMM
 R        -L    LLLL  LLLLL     L                        LLLLL   MMMMMMM
     121  + L  LLLL      L L L LL   LL                   LL LL     MM  M
          - L  L            LLLLL  LLLL L       LL  LL LLL  LL
          - LLLL           LLLLL LLLLL LLL LL  LLL  LLLLLL    LL
          -  LL              LL  LLL   LLLLLL  LL L LL LL     LLLLL  LLL
          -   L                           L  LLL  LLL  L         LLLLL L
      47  +                             L         LL               LL  L
           +-------------------+-------------------+-------------------+
          21                 129                 237                 346
                                                            DAY OF YEAR
Another generally useful technique is robust, locally weighted regression smoothing , often called lowess . The procedure finds a smoothed fitted value, y hat i , for each x i by fitting a weighted regression to the points in the neighborhood of x i . The points closest to x i receive the greatest weight. This is the "locally weighted regression" part of the procedure, and the weights are called neighborhood weights .
The robust part works as follows: Once the fitted values, y hat i , have been found, the residuals, r i = y sub i - y hat i are used to determine a new set of weights ( robustness weights ) so that points which have large residuals are down-weighted, and the locally weighted regression is repeated.
In practice, lowess depends on the choice of a smoothing parameter, f , 0 < f <= 1 , the fraction of the data points to be considered in the calculation of y hat i . Choosing f = .5 means that only the r = [ .5 n ] points closest to x i have non-zero weights. Increasing the value of f makes the fitted curve smoother, decreasing f lets the curve follow the data more closely.
h i = max | x i - x ij |
w ij = W left "[" x ij - x i over h i right "]"where W ( bullet ) is the tricube weight function,
W ( z ) = left "{" lpile 0 above ( 1 - | z | 3 ) 3 lpile for | z | >= 1 above for | z | < 1 right " "
y ij = a i + b i x ij + e ij
delta i = B left "[" e i over 6 Mdn ( | e | ) right "]"where B ( bullet ) is the bisquare weight function,
B ( z ) = left "{" lpile 0 above ( 1 - z 2 ) 2 lpile for | z | >= 1 above for | z | < 1 right " "
The figure below shows the lowess smoothed fit for the draft lottery data. The steady decline in Lottery Number over the last two-thirds of the year is now quite clear. (The plot is over-dramatic, and perhaps misleading, since the vertical scale covers a smaller range than the plots of the raw data and trace lines.)
       SMOOTH ~ DRAFT LOWESS }366
       SCATTER SMOOTH
    213  -             26666663
 L       o    *666666674      366
 O       -46665                  66
 T       -                        *65
 T       -                          *66
 E  189                               66*
 R       -                               563
 Y       -                                 3664
         -                                    366663
 N       -                                         3666*
 U  165  +                                             56*
 M       -                                               55
 B       -                                                26*
 E       -                                                  54
 R       -                                                   26*
    141  +                                                     54
         -                                                      26
         -                                                        62
         -                                                         44
         -                                                          26
    117  -                                                            4
          -------------------+------------------+-------------------
          1                 123                 244                 366
                                                           DAY OF YEAR
The LOWESS macro reads the input data set into PROC IML, calculates the smoothed y hat values, and creates an output data set containing the original and smoothed data. The macro takes the following parameters:
/*---------------------------------------------------------------* * LOWESS SAS Locally weighted robust scatterplot smoothing * *---------------------------------------------------------------* %macro LOWESS( data=_LAST_, /* name of input data set */ out=SMOOTH, /* name of output data set */ x = X, /* name of independent variable */ y = Y, /* name of Y variable to be smoothed */ id=, /* optional row ID variable */ f = .50, /* lowess window width */ p = 1, /* 1=linear fit, 2=quadratic */ iter=2, /* total number of iterations */ plot=NO, /* draw the plot? */ gplot=NO, /* draw the plot? */ pplot=NO, /* draw a printer plot? */ symbol=circle, htext=1.5, hsym=1.5, name=LOWESS); /* name for graphic catalog entry */e.g.,
%include lowess; %lowess( data=auto, x=weight, y=mpg, f=.3, gplot=YES );
y = f ( x ) + residual = fit + residual
All other things equal, we prefer a "simple" f ( x ) like a linear function, to a more complex one. If a scatterplot of y against x appears substantially non-linear, there are two choices:
y'= a + b x'+ residual
Once again, the ladder of powers provides a scheme for understanding the effect of various power transformations. Visual examination of the scatterplot of the raw data, possiblly enhanced with a lowess smoothed curve, can suggest the direction to move on the ladder for transforming x , or y , or both.
As a preliminary example, consider the simple case where
y = size -2 1 over 5 x 2for x = 0, 1, 2, 3, 4, 5. Here, the relation can be straightened by transforming x :
xT= x 2 --> y = size -2 1 over 5 xTor y :
yT= sqrt y --> y = sqrt size -2 1 over 5 xTThe data and the transformed values are:
X Y X} Y.9 0 0 0 0 1 0.2 1 0.4472 2 0.8 4 0.8944 3 1.8 9 1.342 4 3.2 16 1.789 5 5 25 2.236These values are plotted below:
       SCAT X, Y                         SCAT XP, Y
 5-                        f       5-                        f
  -                                 -
  -                                 -
  -                                 -
  -                   f             -               f
  -                                 -
  -              f                  -        f
  -                                 -
  -         f                       -   f
  -                                 -
 0f----f--------------------       0ff------------------------
  0           X            5         0           X'         25
       SCAT X , YP
 2.5-
    -                        f
    -
    -                   f
    -
    -              f
    -         f
    -
    -    f
    -
 0  f-------------------------
    0           X            5
The arrow points in the direction to move along the ladder of powers for x or for y . That is, if the arrow points toward smaller values of x (or y ), move down the ladder of powers, toward sqrt x or log ( x ) (or sqrt y , log ( y ) ). The greater the curvature, the further from 1 (raw data) should be the power.
x --> x p
y --> y q
Since the power transformations are order-preserving, we can try different power transformations on a few well chosen points, to see what effect they would have on all the data.
Tukey (1977) suggests finding three summary points which are the median ( x , y ) values in three equal-sized strips based on the ordered x -values. Use subscripts L , M , H to denote the ( x , y ) values for the Low, Middle, and High strip.
If the relation is linear, these three points will fall on a line, or equivalently, the slope of the line from (x , y ) sub L to (x , y ) M will be the same as the slope of the line from (x , y ) M to (x , y ) sub H .
Y- - - Y- - - H - - - - - - - H - - - - - - M - - - M - - - L - - - L - - - - - - - - 0+------------------------- 0+------------------------- 0 X 0 XOn the other hand, if the relation is not linear, the lines connecting the pairs of points will have different slopes.
This division is modified when equal x -values straddle the dividing line: they are all kept in the same third. Also, neither end portion should cover more than half the range of the x -values, which can happen if the distribution of x is highly skewed.
This section uses data on literacy rates and gross national product for 22 nations to show the details of the calculations. (The plot indicates the relation is highly nonlinear. The arrow rule indicates we should transform x to lower powers or y to higher powers.)
NEPAL 45 5 BURMA 57 47.5 UGANDA 64 27.5 S. VIETNAM 76 17.5 SCAT GNP THAILAND 96 68 100- f f f f HAITI 105 10.5 - f INDONESIA 131 17.5 - f S. KOREA 144 77 L - f GHANA 172 22.5 I - f PERU 179 47.5 T - f f EL SALVADOR 219 39.4 E - f BR.GUIANA 235 74 R - f HONG KONG 272 57.5 A -f f f PANAMA 329 65.7 C - f LEBANON 362 47.5 Y - SINGAPORE 400 50 -f ARGENTINA 490 86.4 -fff ICELAND 572 98.5 - f CZECHOSLOVAK 680 97.5 -f FRANCE 943 96.4 0------------------------------------- NEW ZEALAND 1310 98.5 0 GNP 2000 CANADA 1947 97.5Since the distribution of GNP is so skewed, the range rule suggests allocating only two points to the upper third. The summary points are found to be:
         2 RSTAB GNP
   SUMPTS: SPLIT IS 7 13 2
   SUMMARY POINTS
   L     76 17.5
   M    329 65.7
   H   1629 98
r = m HM over m ML = ( y H - y M ) / ( x H - x M ) over ( y M - y L ) / ( x M - x L ) (4)As shown earlier, a linear relation implies r approx 1 (or log r approx 0 ).
For the GNP data, this ratio is quite far from 1:
      1 1 SLOPES S
SUMMARY    HALF    SLOPE
POINTS     SLOPES  RATIO
[X*1] [Y*1]
1629 98
           0.02485
 329 65.7          0.1304
           0.1905
  76 17.5
Since we are using medians, and the ladder of powers is order-preserving (as long as all values are positive), we can transform the summary points to various powers, and the ratio of slopes for the transformed summary points will tell what effect that transformation will have on the whole data set.
If we transform x --> x p , and y --> y q , then the ratio of slopes for that pair of transformations is
r < (p,q) = m HM < (p,q) over m ML < (p,q) = ( y H q - y M q ) / ( x H p - x M p ) over ( y M q - y L q ) / ( x M p - x L p ) (5)Since we know we must go down the scale of powers for x , we try log x , then - 1 / sqrt x . The first is not far enough, but the second is too far. p = - 1/3 is an in-between power, and seems to work quite well.
      0 1 SLOPES S
SUMMARY     HALF   SLOPE
POINTS      SLOPES RATIO
[X*0] [Y*1]
3.212 98
            46.49
2.517 65.7         0.6138
            75.74
1.881 17.5
      H.5 1 SLOPES S
SUMMARY        HALF   SLOPE
POINTS         SLOPES RATIO
[X*H0.5] [Y*1]
H0.02478 98
               1064
H0.05513 65.7         1.315
                809
H0.1147  17.5
      H.33 1 SLOPES S
SUMMARY        HALF   SLOPE
POINTS         SLOPES RATIO
[X*H0.33] [Y*1]
H0.08711 98
               533.3
H0.1477  65.7         1.016
               524.9
H0.2395  17.5
   RATIOS OF SLOPES  :
      YE  [H2  ] [H1  ] [ H.5] [LOG ] [  .5] [RAW ] [ 2  ]
    X$
   [H2  ]   .778  2.213  3.575  5.590  8.459 12.394 24.385
   [H1  ]   .175   .499   .806  1.261  1.908  2.796  5.500
   [ H.5]   .083   .235   .379   .593   .898  1.315  2.588
   [LOG ]   .039   .110   .177   .277   .419   .614  1.208
   [  .5]   .018   .051   .082   .128   .194   .284   .559
   [RAW ]   .008   .023   .038   .059   .089   .130   .257
   [ 2  ]   .002   .005   .008   .012   .018   .027   .053
The figure below shows the data and scatterplot when GNP is transformed to the reciprocal cube root.
         NEG. CUBE ROOT OF X IS CLOSEST TO BEING STRAIGHT
         GNP3 ~ GNP
         GNP3[;1]~GNP[;1] POWER H.333
         GNP3  TRANSFORMED X
   H0.2815   5
   H0.2602  47.5
   H0.2503  27.5               SCAT GNP3
   H0.2364  17.5         100                         ff  f f   -
   H0.2187  68                                          f      -
   H0.2123  10.5                                    f          -
   H0.1972  17.5         L                 f                   -
   H0.1911  77           I                     f               -
   H0.1801  22.5         T             f          f            -
   H0.1777  47.5         E                      f              -
   H0.1662  39.4         R                         f           -
   H0.1623  74           A        f          f    f            -
   H0.1546  57.5         C                     f               -
   H0.1451  65.7         Y                                     -
   H0.1406  47.5                   f                           -
   H0.136   50                       f    f  f                 -
   H0.1271  86.4                        f                      -
   H0.1207  98.5               f                               -
   H0.114   97.5           0------------------------------------
   H0.1022  96.4            H0.3         GNP*H1 3          H0.05
   H0.09161 98.5
   H0.08029 97.5
The plot shows no signs of nonlinearity.  However, some find
"reciprocal cube root" of GNP hard to interpret, and
might prefer a slightly curved relationship with  log  GNP
.
The plot below shows data generated from a mixture of three bivariate normal distributions. It is hard to see that there are three regions of high density.
       PLOT OF Y*X    LEGEND: A = 1 OBS, B = 2 OBS, ETC.
 Y |                                                  AA  A
   |                                               A    BA
50 +                                          A A   A        A
   |                                      B   A ABACD ABB CAB
   |                                       A   AACBCBAA BA
   |                                    AA B  BAAACACA B
   |                                 A AA B ABAABA A A  A
40 +                                AA  A B   D
   |                            A        AA B
   |                                    A CA  AA
   |                       A   B    BBC BBA  AB A
   |                       A  B BAB E AAAA  AA   A
30 +                    A A    ADAAC  A A   A A
   |                   A  A AB   A BA BAAAA
   |                      A   A AB A A A A A
   |                  A A AAA AA
   |                 A     A  A    AAA
20 +                A AABAAA  A   A  A
   |             AA A     BACBB A
   |        A ABA ABB   DAA BAA B A
   |       AAA  AAB CACAAB  A
   |    B      AA  AF AAB B A
10 +    AA      AB      B
   |          A      B
   |                B
   |
   |
 0 +
   --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+--
    -3   0   3   6   9  12  15  18  21  24  27  30  33  36  39
                                X
To jitter, add a uniformly distributed random quantity, scaled so that it breaks up the overlap, but does not corrupt the data. Let u i = u [ -1, 1 ] be uniformly distributed from -1 to 1. Then, to jitter x , calculate
xTi = x i + s u iwhere s is the scale factor, which might be some small fraction (e.g., .02) of the range of the data, or 1/2 the rounding interval if the data have been rounded. If the y variable is also discrete, the same process can be applied to jitter y --> yT.
Then, in the scatter plot of y (or yT) vs. xT it should be easier to see the density of points in different regions.
![[Previous]](../../icons/larrow.gif) Previous section
 Previous section
![[Next]](../../icons/rarrow.gif) Next section.
 Next section.