I am simulating data for a two-group trial with 100 subjects in each group. There are four variables: GROUP, SUBJNO, SEX, and AGE. SEX and AGE are covariates for the subsequent model.
Here is my code:
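%let seed=12345;
data tab1;
  call streaminit(&seed.);  /* fix the random-number stream for reproducibility */
  length subjno $4 sex $1;
  /* Group 1: 100 subjects */
  do _n_=1 to 100;
    group=1;
    subjno='1'||put(_n_,z3.);
    sex=choosec(rand('table',0.5,0.5),'M','F');  /* M or F with equal probability */
    age=rand('integer',18,70);  /* uniform integer age, 18 to 70 inclusive */
    output;
  end;
  /* Group 2: 100 subjects */
  do _n_=1 to 100;
    group=2;
    subjno='2'||put(_n_,z3.);
    sex=choosec(rand('table',0.5,0.5),'M','F');
    age=rand('integer',18,70);
    output;
  end;
run;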
Unfortunately, the initial seed, 12345, produces a statistically significant difference in SEX between the two groups: the chi-square p-value is 0.0477.
I have tried other seed values (123, 1234, 123456, 1234567), and none of them produces a significant difference in SEX between the groups.
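For reference, a balance check of this kind can be run roughly as follows. This is only a sketch: the exact call behind the p-value quoted above is not shown, and the PROC TTEST step for AGE is an assumption about how the continuous covariate would be compared.

```sas
/* Chi-square test of SEX by GROUP on the simulated data set */
proc freq data=tab1;
  tables group*sex / chisq;
run;

/* Two-sample comparison of AGE between groups (assumed check, not from the post) */
proc ttest data=tab1;
  class group;
  var age;
run;
```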
I know that a significant difference in a covariate can arise purely by chance. Is there a way to ensure that the covariates do not differ significantly between groups when simulating data?
Maybe block randomization with the covariates as blocking factors? What about a continuous covariate like AGE?
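To make the idea concrete for SEX, here is a minimal sketch of one way it might look. It is not true block randomization: it simply fixes the SEX split at 50/50 within each group and shuffles the order, so SEX is balanced by construction. The data set names alloc and tab1_bal and the helper variables u and seq are made up for illustration. AGE is still drawn at random, so it could still differ between groups by chance.

```sas
%let seed=12345;

/* Step 1: fixed 50 M / 50 F per group, plus a random key for shuffling */
data alloc;
  call streaminit(&seed.);
  length sex $1;
  do group=1 to 2;
    do i=1 to 100;
      if i<=50 then sex='M';
      else sex='F';
      age=rand('integer',18,70);  /* AGE drawn as in the original code */
      u=rand('uniform');          /* used only to randomize the row order */
      output;
    end;
  end;
  drop i;
run;

/* Step 2: shuffle within group so SEX is not tied to subject number */
proc sort data=alloc;
  by group u;
run;

/* Step 3: number the subjects in the shuffled order */
data tab1_bal;
  length subjno $4;
  set alloc;
  by group;
  if first.group then seq=0;
  seq+1;                               /* running counter within group */
  subjno=cats(group,put(seq,z3.));     /* e.g. 1001 ... 1100, 2001 ... 2100 */
  drop u seq;
run;
```

With this construction the GROUP by SEX table is 50/50 in both groups for any seed, so the chi-square test cannot flag a difference. The same trick does not carry over directly to a continuous variable like AGE, which is the part I am unsure about.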