The SAS statements below are contained in the file pontque.sas.
options fmtsearch=(slid);  /* lpvalue permanent format lib */
Then we create a new dataset called PONTQUE, which combines the Ontario and Quebec samples. The dataset is built by stacking PONTARIO and PQUEBEC together. The variable REGRE25C distinguishes the two provinces.
/* Create a merged dataset that includes both the Ontario and Quebec samples */
data pontque;
  set slid.pontario slid.pquebec;
run;
/* Choose variables for the merged sample */
data pontque;
  set pontque (keep=
    pupid26c   /* Random person ID 1994 */
    motn2g15   /* Mother tongue group 2 */
    immst15    /* Immigrant status */
    yrxft11c   /* Years of work experience 1994 */
    eage26c    /* Ext person's age 1994 */
    sex21      /* Sex */
    regre25c   /* Region 1994 */
    yrsch18c   /* Total yrs of schooling 1994 */
    ttwgs28c   /* Wages and salaries, all jobs 1994 */
  );
  id = put(pupid26c, 8.);  /* Character copy of the person ID */
run;
The categorical variables are sex (SEX21), mother tongue (MOTN2G15), region (REGRE25C) and immigrant status (IMMST15). SEX21, REGRE25C and IMMST15 are dichotomous, whereas MOTN2G15 has 3 levels (English, French and other). In this exercise we compare only the Ontario and Quebec data, so our classification variable is REGRE25C. In subsequent exercises, you can try SEX21, IMMST15 and MOTN2G15 yourself.
/* Create a title for subsequent output */
title 'SLID: Working Experience 1994 (Ontario vs. Quebec)';

/* Look at the contents of the sample */
proc contents data=pontque;
run;

/* Investigate the mean scores of quantitative variables */
proc means data=pontque n min max mean std skew maxdec=3;
  var eage26c ttwgs28c yrxft11c yrsch18c;
run;
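PROC MEANS summarizes only the quantitative variables, so it may also help to tabulate the four categorical variables listed above before using them as classification variables. The following is a minimal sketch and is not part of pontque.sas. It assumes the PONTQUE dataset created above and that the value formats resolve through the FMTSEARCH option set at the top of the file.

```sas
/* Illustrative addition: one-way frequency tables for the categorical variables. */
/* The MISSING option counts missing values as a level, so gaps are visible.      */
proc freq data=pontque;
  tables sex21 motn2g15 regre25c immst15 / missing;
run;
```

The REGRE25C table should show only the two provinces, which confirms that the stacking step worked as intended.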
Now take a look at the chi-square analysis of immigrant status by self-reported mother tongue group, for each province. Is there any difference between the two provinces?
/* Chi-square for immigrant status vs. mother tongue groups, by region */
proc freq data=pontque;
  tables regre25c*immst15*motn2g15 / chisq nopercent;
run;

/* Look at the univariate distribution of the numerical variables */
proc univariate data=pontque normal plot;
  var eage26c ttwgs28c yrxft11c yrsch18c;
run;
The DATACHK macro displays the quantitative variables side by side in boxplots so that you can compare their distributions.
/* Check for data normality */
%datachk(data=pontque, var=eage26c ttwgs28c yrxft11c yrsch18c, ls=90);
run;
The SPLOT macro compares the quantitative variables in boxplots for the two levels in REGRE25C, namely Ontario and Quebec. Is there any difference between the two provinces?
/* Compare the quantitative variables with REGRE25C */
%splot(data=pontque, var=ttwgs28c yrxft11c yrsch18c, class=regre25c);
run;
/* Find suitable powers for transformation */
%symbox(data=pontque, var=ttwgs28c, powers=0 0.5 1 1.5 2);
%symbox(data=pontque, var=yrxft11c, powers=0 0.5 1 1.5 2);
run;
After choosing the power, create a new dataset that contains the transformed as well as the untransformed variables. We name the new dataset PONTQUE2. It contains 12 variables: the original 10 plus the two transformed ones.
/* Create new data set with variables transformed */
data pontque2;
  set pontque;
  sqrtwex = sqrt(yrxft11c);
  sqrtwgs = sqrt(ttwgs28c);
  label sqrtwex = 'sqrt(Working experience 94)';
  label sqrtwgs = 'sqrt(Wages and salaries 94)';
run;
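One way to check whether the square-root transformation helped is to compare the skewness of each variable before and after transforming. The sketch below is an illustrative addition, not part of pontque.sas. It reuses the same PROC MEANS statistics requested earlier and assumes PONTQUE2 has been created as shown above.

```sas
/* Illustrative addition: compare skewness of raw and square-root variables. */
/* Each raw variable is listed next to its transformed counterpart.          */
proc means data=pontque2 n mean std skew maxdec=3;
  var yrxft11c sqrtwex ttwgs28c sqrtwgs;
run;
```

A skewness value closer to zero for SQRTWEX and SQRTWGS than for the raw variables would suggest that the chosen power is reasonable.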
/* Examine the relationship between 2 variables; note that the original data set is used */
%lowess(data=pontque, y=ttwgs28c, x=yrxft11c, hsym=0.5, interp=r1);
%lowess(data=pontque, y=ttwgs28c, x=yrsch18c, hsym=0.5, interp=r1);
%lowess(data=pontque, y=ttwgs28c, x=eage26c, hsym=0.5, interp=r1);
run;
/* Examine differences between the two groups of a dichotomous variable */
proc ttest data=pontque2;
  class regre25c;
  var eage26c sqrtwgs sqrtwex yrsch18c;
run;

/* Fit a model with linear and quadratic terms for each covariate */
proc glm data=pontque2;
  class regre25c;
  model sqrtwex = eage26c|eage26c regre25c sqrtwgs|sqrtwgs yrsch18c|yrsch18c / solution;
  /* Save predicted values and residuals for later diagnostics */
  output out=pontque3 predicted=predict1 residual=resid1;
run;
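The OUTPUT statement saves the predicted values and residuals in PONTQUE3, so they can be inspected after the model is fitted. The sketch below is an illustrative addition, not part of pontque.sas. It assumes PONTQUE3 contains the variables PREDICT1 and RESID1 named in the OUTPUT statement above.

```sas
/* Illustrative addition: basic residual diagnostics for the fitted model. */

/* Distribution of the residuals, with normality tests and text plots */
proc univariate data=pontque3 normal plot;
  var resid1;
run;

/* Residuals against predicted values, with a reference line at zero */
proc plot data=pontque3;
  plot resid1*predict1 / vref=0;
run;
```

A roughly even band of points around the zero line would be consistent with constant variance, while a funnel or a curve would suggest that the model or the transformation needs another look.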
/* Look at correlation between quantitative variables for Ontario and Quebec separately */
proc sort data=pontque2;
  by regre25c;
run;

proc corr data=pontque2;
  by regre25c;
  var eage26c sqrtwex sqrtwgs yrsch18c;
run;