Visualizing Categorical Data: lags
Version: 1.2 (2 Feb 2001)
Michael Friendly
York University
Macro for lag sequential analysis
Given a variable containing event codes (character or numeric), the LAGS macro
creates:
- a dataset containing n+1 lagged variables, _lag0 - _lagN (_lag0 is just a
  copy of the input event variable)
- optionally, an (n+1)-way contingency table containing frequencies of all
  combinations of events at lag0 -- lagN
 
Either or both of these datasets may be used for subsequent analysis of
sequential dependencies. One or more BY= variables may be specified, in which
case lags and frequencies are calculated separately for each combination of
values of the BY= variable(s).
A WEIGHT= variable may also be specified, giving frequencies weighted by that
variable. For example, using the duration of an event as a weight gives
'frequencies' which represent the total number of time units for
state-sequential data.
One event variable must be specified with the VAR= option. All other options
have default values.
The arguments may be listed within parentheses in any order, separated by
commas. For example:
  %lags(data=codes, var=event, nlag=2)
- DATA=
  The name of the SAS dataset to be lagged. If DATA= is not specified, the
  most recently created dataset is used.
- VAR=
  The name of the event variable to be lagged. The variable may be either
  character or numeric.
- BY=
  The name of one or more BY variables. Lags are restarted for each level
  of the BY variable(s). The BY variables may be character or numeric.
- WEIGHT=
  A numeric variable whose value represents the frequency or weight of an
  observation. The weight values must be non-negative, but need not be
  integers.
- VARFMT=
  An optional format for the event VAR= variable. If the codes are numeric
  and a format is supplied that specifies what each number means (e.g.,
  1='Active' 2='Passive'), the output lag variables are given the formatted
  character values.
- NLAG=
  Number of lags to compute. Default = 1.
- OUTLAG=
  Name of the output dataset containing the lagged variables. This dataset
  contains the original variables plus the lagged variables, named according
  to the PREFIX= option.
- PREFIX=
  Prefix for the names of the created lag variables. The default is
  PREFIX=_LAG, so the variables created are named _LAG1, _LAG2, ..., up to
  _LAG&nlag. For convenience, a copy of the event variable is created as
  _LAG0.
- FREQOPT=
  Options for the TABLES statement used in PROC FREQ for the frequencies of
  each of lag1-lagN vs lag0 (the event variable). The default is
  FREQOPT= NOROW NOCOL NOPERCENT CHISQ.
 
Arguments pertaining to the n-way frequency table:
- OUTFREQ=
  Name of the output dataset containing the n-way frequency table. The table
  is not produced if this argument is not specified.
- COMPLETE=
  NO or ALL. Specifies whether the n-way frequency table is to be made
  'complete' by filling in 0 frequencies for lag combinations which do not
  occur in the data.
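
As an illustration of how several of these arguments combine, the following is a minimal sketch of a call requesting two lags, duration-weighted frequencies, and a completed 3-way table. The dataset name EVENTS, the variables STATE, SUBJECT and DURATION, and the output names LAGGED and FREQ3 are all hypothetical, and the sketch assumes the LAGS macro has already been made available to the session.

```sas
/* Hypothetical example: EVENTS, STATE, SUBJECT and DURATION are assumed  */
/* names, not part of the macro. LAGGED and FREQ3 are arbitrary outputs.  */
%lags(data=events,       /* input dataset, one observation per event      */
      var=state,         /* event code variable to be lagged              */
      by=subject,        /* restart lags for each subject                 */
      weight=duration,   /* weight each event by its duration             */
      nlag=2,            /* create _LAG0, _LAG1 and _LAG2                 */
      outlag=lagged,     /* dataset of original plus lagged variables     */
      outfreq=freq3,     /* 3-way table over _LAG0, _LAG1 and _LAG2       */
      complete=all);     /* fill in zero frequencies for unobserved cells */
```

With NLAG=2 the frequency table has one dimension for each of the three lag variables, within each level of the BY variable.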
 
Assume a series of 16 events has been coded with the 3 codes a, b, c for
each of 2 subjects as follows:
 Sub1:   c   a   a   b   a   c   a   c   b   b   a   b   a   a   b   c
 Sub2:   c   c   b   b   a   c   a   c   c   a   c   b   c   b   c   c
and these have been entered as the 2 variables SEQ (subject) and CODE in
the dataset CODES:
        SEQ    CODE
        1      c
        1      a
        1      a
        1      b
        ....
        2      c
        2      c
        2      b
        2      b
        ....
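
For readers who want to reproduce the example, the CODES dataset could be built with a DATA step along the following lines. This is only a sketch: the data values are the two sequences listed above, and the list-input layout is one choice among several.

```sas
/* Sketch: build the CODES dataset from the two 16-event sequences above. */
/* The trailing @@ lets several (seq, code) pairs be read from one line.  */
data codes;
   input seq code $ @@;
   datalines;
1 c 1 a 1 a 1 b 1 a 1 c 1 a 1 c 1 b 1 b 1 a 1 b 1 a 1 a 1 b 1 c
2 c 2 c 2 b 2 b 2 a 2 c 2 a 2 c 2 c 2 a 2 c 2 b 2 c 2 b 2 c 2 c
;
run;
```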
Then the macro call:
   %lags(data=codes, var=code, by=seq, outfreq=freq);
produces the lags dataset _lags_ (for the default NLAG=1), which looks like this:
  SEQ    CODE    _LAG0    _LAG1
   1      c        c         
          a        a        c
          a        a        a
          b        b        a
          a        a        b
          ....
   2      c        c         
          c        c        c
          b        b        c
          b        b        b
          a        a        b
           ....
The output 2-way frequency table (outfreq=freq) looks like this:
  SEQ    _LAG0    _LAG1    COUNT
   1       a        a        2
           b        a        3
           c        a        2
           a        b        3
           b        b        1
           c        b        1
           a        c        2
           b        c        1
           c        c        0
   2       a        a        0
           b        a        0
           c        a        3
           a        b        1
           b        b        1
           c        b        2
           a        c        2
           b        c        3
           c        c        3
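
One possible follow-up analysis of the FREQ dataset is a chi-square test of lag-1 dependence for each subject. The step below is a sketch under stated assumptions, not part of the macro: it assumes FREQ is ordered by SEQ and that the cell frequencies are held in the variable COUNT, as in the listing above.

```sas
/* Sketch: test whether the current event (_LAG0) depends on the previous */
/* event (_LAG1), separately for each subject.                            */
proc freq data=freq;
   by seq;            /* assumes FREQ is sorted by SEQ, as displayed above */
   weight count;      /* COUNT holds the cell frequencies                  */
   tables _lag1 * _lag0 / chisq norow nocol nopercent;
run;
```

With only 15 transitions per subject the expected cell counts are small, so the chi-square results for this toy example should be read with caution.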
See also
meanplot
panels
scatmat
stat2dat