Hi Teketo.
1. Yes, in theory you could allocate one treatment per block (district), but that way you would confound the treatment effect with the district effect. Also, the same unknown, but finite, risk of a district failing to comply and/or report data has a much bigger impact on the final design imbalance under that design. I assume you have some extremely compelling reason to ask about this. Can you share it with us?
2. (a) I infer from your question that you are only interested in the sample size impact of an effect difference precision of 0.13 between the control and the two experimental treatments, and not between the two experimental treatments. Is that correct?
(b) When you ask about a specific intraclass correlation, which class do you mean: block (district), experimental unit (clinic), or some class you have not yet mentioned, such as physician within (and sometimes between) clinics and/or districts?
(c) You ask about simulating a particular coefficient of variation, but the appropriate related statistic for binomial and other count-data models such as this one is the index of dispersion. Is that what you mean?
(d) When you ask about simulating the impact of a variance of 0.23, I assume you mean a constraint on the total variance of 0.23. With your design, you have to make separate assumptions about the block and treatment variances, so I take it you want to constrain those two to add to 0.23. However, you and possibly your colleagues will be the ones who need to think about how to partition that total into block and treatment variance, using the method given in the second paragraph of section 4 of Stroup's 2016 paper, as I suggested in an earlier post.
3. There is no effective restriction on seed size in SAS random number functions: any positive integer less than 2**31-1 is a valid seed. I use 09051958 as it is an easy number for me to remember, but you could use any number; see the sketch below.
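For example, a minimal sketch (the seed value is just my example) showing that a fixed seed reproduces the same pseudo-random sequence on every run:

** print three uniform random numbers; re-running this step gives the same three **;
data _null_;
do i=1 to 3;
u=ranuni(09051958); ** any positive integer less than 2**31-1 is a valid seed **;
put u=;
end;
run;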
4. I do not see the value in answering this until the questions above have been clarified.
Cheers.
Damien
** example precision estimation for a random control trial after **;
** Stroup (2016) **;
** see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar' **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
input block @@;
** each block is an independent replication of the 3-treatment **;
** (control + 2 new treatments) group experimental design **;
** in your case if you make each block a randomly selected district **;
** then you get to estimate inter-district response variance for 'free' **;
** the double trailing @ in the input statement holds the input line so the statements below can keep reading from it **;
do eu=1 to 3;
** iterates over each of the 3 experimental units in each (district?) block **;
** each eu (1-3) is a new experimental unit, which is permuted in this example **;
** the double trailing @ holds the input line so the next trt value is read from the same data line **;
input trt @@;
** current assumptions for the success probabilities of the control (p1) and the two **;
** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
** in previous simulations. By varying these and the eu size you can see what effect size difference **;
** (here I use 13% or 0.13 diff) can be confidently detected at a given 95% level. **;
** I found this treatment effect size difference of 0.13 between the control treatment and the two **;
** experimental treatments could just be detected at a 95% C.L. with an overall sample size of 1805 **;
** by changing these probabilities and the eu size and re-running the exemplar analysis **;
p1=.31;p2=.44;p3=.50;
** p takes on the right value for the newly input treatment type: the expression (trt=1) evaluates to 1 if trt=1, else 0 **;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
** rounding 10 x this number to an integer and adding it to 45 will uniformly randomly **;
** generate (and therefore facilitate simulation of the impact of) experimental unit **;
** sample sizes in the range 45 - 55, mean=50. Using the same seed reproduces the same **;
** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
** to change the cluster size assumption, change the 45 to something else. To change the size variation, **;
** change the 10 to something else. To use your own 'lucky' random seed, change 09051958 to your **;
** own birthday or any other easy-to-remember number: any positive integer less than 2**31-1 works. **;
** You can use 0 to get a new clock-based seed each time, but if you do, you will get a different, but equally **;
** varied, set of experimental unit (clinic) sample sizes each time you run it, and sometimes that is a real **;
** nuisance **;
n=45+round(10*ranuni(09051958),1);
** mu is the expected number of positive outcomes from each experimental unit, rounded to an integer **;
mu=round(n*p,1);
** and the simulated experimental outcome is output to the exemplar data set **;
output;
** and this is done 3 times, one for each experimental unit **;
end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random, each with **;
** 3 clinics chosen at random for the trial. Here, only 2 of the 3 treatments are allocated to each **;
** district, in all possible ordered pairs, balanced over the 12 blocks (districts). It is vitally important **;
** to vary treatments within blocks: other designs that do not follow this principle fail to have **;
** any useful precision **;
cards;
1 1 2 1
2 2 1 2
3 1 3 1
4 3 1 3
5 2 3 2
6 3 2 3
7 1 2 1
8 2 1 2
9 1 3 1
10 3 1 3
11 2 3 2
12 3 2 3
run;
proc glimmix data=rct;
class trt;
** this events/trials model syntax (model y/n = x) automatically invokes dist=binomial with the default logit link **;
model mu/n=trt / ddfm=contain;
random intercept trt / subject=block ;
** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
** group and treatment covariances. /hold tells glimmix not to estimate but to hold the covariance **;
** parameters at the values given below. These sum to 0.23 as desired. **;
parms (0.13)(0.10)/hold=1,2;
** this tests for strong evidence of a difference between the control and the 2 experimental treatments **;
** at the precision (0.13) and sample size (1805) simulated. The appropriate adjustment is Dunnett-Hsu, **;
** not Bonferroni, when comparing multiple treatments only to a control **;
lsmeans trt/diff cl adjust=dunnett;
run;
| Number of Observations Read | 36 |
|---|---|
| Number of Observations Used | 36 |
| Number of Events | 754 |
| Number of Trials | 1805 |

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate | Standard Error |
|---|---|---|---|
| Intercept | block | 0.1300 | . |
| trt | block | 0.1000 | . |

Differences of trt Least Squares Means (Adjustment for Multiple Comparisons: Dunnett-Hsu)

| trt | _trt | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Adj P | Alpha | Lower | Upper | Adj Lower | Adj Upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.5661 | 0.2201 | 10 | 2.57 | 0.0278 | 0.0495 | 0.05 | 0.07577 | 1.0565 | 0.001399 | 1.1309 |
| 3 | 1 | 0.8295 | 0.2198 | 10 | 3.77 | 0.0036 | 0.0067 | 0.05 | 0.3398 | 1.3193 | 0.2655 | 1.3935 |
This simulation only just meets your 0.13 precision criterion after the appropriate Dunnett-Hsu corrections for multiple comparisons to a single control are applied, assuming a design that allocates only 2 of the 3 treatments to each district of 3 clinics in a balanced design over 12 districts, with a total linearised variance of 0.23. The mean adequate clinic sample size is 50. The Bonferroni adjustment for multiple comparisons is not appropriate; Dunnett-Hsu is the one to use when you are only comparing the other treatments to a control. 60 per clinic would give you a safety margin for unforeseen problems.
1. The 0.13 and the 0.10 are the split of your desired total variance of 0.23 between the district source of variation and the treatment source of variation: 0.13+0.10=0.23. You have to decide how to split your total variance between those two sources. Again I refer you to the original paper and the excellent method that Stroup demonstrates. It is fundamental to the success of your experiment that you realise there are two sources of variation, only one of which is the treatment factor.
/hold=1,2 tells proc glimmix to hold these two covariance values steady while estimating everything else. If you don't supply proc glimmix with reasonable guesses for these covariances, it cannot do the job of finding a sample size for a given precision.
2. I inferred from your earlier posts that the Bonferroni correction for multiple comparisons is not appropriate because you seem only interested in the differences between the control and the other 2 treatments, in which case the Dunnett adjustment is appropriate. Are you now saying that you are interested in the difference between 2 and 3 as well?
3. Great.
4. I don't know; you would be better placed than I to decide. I simply used the previous estimates you gave for the 3 treatments, which I see you have now changed again. Does this mean you are uncertain about the effectiveness of these treatments? Surely that is an issue of ethical concern for the clinics' patients who will be recruited for the trial. Just make them as realistic as you can. If you think p1=0.30, p2=0.40 and p3=0.50 is the most realistic, or at least as realistic as your last two suggestions, fine, run with that. The smallest difference will drive the sample size decision for precision, but that should not nudge you towards simulating equal treatment differences if you think another scenario is more likely. What do you really think?
5. (a) Apologies for not stating this before, but ICC and coefficient of variation are outside the scope of this type of simulation. However, I doubt that assumptions about those would change the decision about sample size for precision much anyway; they will arise when you analyse the data at the individual patient level. (b) Are you seriously asking me to run those two simulations for you? If you don't understand enough about this method by now, maybe you should study the posts and the original paper a bit more carefully and try to get some simulations running yourself. (c) Also, the second request involves deciding on a design that allocates 3 treatments over blocks of two experimental units. Do you have a particular balanced incomplete block design (or one that is nearly so) in mind that you would like to examine for precision and efficiency? If so, what is it? Is it the following (see the sketch after the table)?

| blk | eu1 | eu2 |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 2 | 1 |
| 4 | 2 | 3 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
| 7 | 1 | 2 |
| 8 | 1 | 3 |
| 9 | 2 | 1 |
| 10 | 2 | 3 |
| 11 | 3 | 1 |
| 12 | 3 | 2 |
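If that is the layout you mean, here is a minimal sketch (untested, and assuming the same success probabilities and clinic-size logic as the full programs below) of how the exemplar data step would change; only the do-loop bound and the datalines differ from the 3-clinic version:

** exemplar step for a 2-clinic-per-district incomplete block layout (sketch only) **;
data rct2;
infile cards;
input block @@;
do eu=1 to 2;
** only 2 experimental units (clinics) per block now **;
input trt @@;
** assumed success probabilities, as in the programs below **;
p1=.30;p2=.40;p3=.50;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** clinic sizes uniform on 55 - 65, mean 60 **;
n=55+round(10*ranuni(09051958),1);
mu=round(n*p,1);
output;
end;
cards;
1 1 2
2 1 3
3 2 1
4 2 3
5 3 1
6 3 2
7 1 2
8 1 3
9 2 1
10 2 3
11 3 1
12 3 2
run;

The glimmix step would then be run against rct2 exactly as in the full programs.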
6. No, unless you are interested in all the differences in effect sizes, not just those involving the control, as I suggested in 2 above.
Simulations of two clinics per district required such large (>>300) clinic sample sizes as to be impractical (implying an overall sample of more than 10,000!), so these are not shown.
Both the Tukey-Kramer and the Bonferroni adjustments are appropriate given the extent of the group size differences, but the Tukey-Kramer is more powerful (sensitive), requiring 360 fewer samples to achieve the desired precision; see the sketch below, then the two full simulations.
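PROC GLIMMIX accepts more than one LSMEANS statement, so if it helps to see the two adjustments side by side, both sets of adjusted intervals can come from a single run. A minimal sketch, assuming an rct exemplar data set built as in the full programs that follow:

proc glimmix data=rct;
class trt;
model mu/n=trt / ddfm=contain;
random intercept trt / subject=block;
parms (0.13)(0.10)/hold=1,2;
** Tukey-Kramer adjusted intervals **;
lsmeans trt/diff cl adjust=tukey;
** Bonferroni adjusted intervals, for comparison **;
lsmeans trt/diff cl adjust=bon;
run;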
** example precision estimation for a random control trial after **;
** Stroup (2016) **;
** see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar' **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
input block @@;
** each block is an independent replication of the 3-treatment **;
** (control + 2 new treatments) group experimental design **;
** in your case if you make each block a randomly selected district **;
** then you get to estimate inter-district response variance for 'free' **;
** the double trailing @ in the input statement holds the input line so the statements below can keep reading from it **;
do eu=1 to 3;
** iterates over each of the 3 experimental units in each (district?) block **;
** each eu (1-3) is a new experimental unit, which is permuted in this example **;
** the double trailing @ holds the input line so the next trt value is read from the same data line **;
input trt @@;
** current assumptions for the success probabilities of the control (p1) and the two **;
** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
** in previous simulations. By varying these and the eu size you can see what effect size difference **;
** (here I use 10% or 0.10 diff amongst treatments, with the Tukey-Kramer **;
** adjustment for multiple comparisons) can be confidently detected at a given 95% level. **;
** I found this treatment effect size difference of 0.10 could just be detected at a 95% C.L. with an **;
** overall sample size of 2165 by changing these probabilities, setting the eu size to 60 and re-running **;
** the exemplar analysis **;
p1=.30;p2=.40;p3=.50;
** p takes on the right value for the newly input treatment type: the expression (trt=1) evaluates to 1 if trt=1, else 0 **;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
** rounding 10 x this number to an integer and adding it to 55 will uniformly randomly **;
** generate (and therefore facilitate simulation of the impact of) experimental unit **;
** sample sizes in the range 55 - 65, mean=60. Using the same seed reproduces the same **;
** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
** to change the cluster size assumption, change the 55 to something else. To change the size variation, **;
** change the 10 to something else. To use your own 'lucky' random seed, change 09051958 to your **;
** own birthday or any other easy-to-remember number: any positive integer less than 2**31-1 works. **;
** You can use 0 to get a new clock-based seed each time, but if you do, you will get a different, but equally **;
** varied, set of experimental unit (clinic) sample sizes each time you run it, and sometimes that is a real **;
** nuisance **;
n=55+round(10*ranuni(09051958),1);
** mu is the expected number of positive outcomes from each experimental unit, rounded to an integer **;
mu=round(n*p,1);
** and the simulated experimental outcome is output to the exemplar data set **;
output;
** and this is done 3 times, one for each experimental unit **;
end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random, each with **;
** 3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts). It is vitally important to vary treatments within blocks: **;
** other designs that do not follow this principle fail to have any useful precision **;
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1
run;
proc glimmix data=rct;
class trt;
** this events/trials model syntax (model y/n = x) automatically invokes dist=binomial with the default logit link **;
model mu/n=trt / ddfm=contain;
random intercept trt / subject=block;
** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
** group and treatment covariances. /hold tells glimmix not to estimate but to hold the covariance **;
** parameters at the values given below **;
parms (0.13)(0.10)/hold=1,2;
** this tests for strong evidence of a difference at the precision and sample size simulated. **;
** Bonferroni is an appropriate multiple comparison adjustment when all differences are of interest **;
** and group sizes are unequal by more than 20% of the geometric mean group size. The Tukey-Kramer **;
** adjustment is also appropriate and more powerful in this situation, so it is the one applied here. **;
lsmeans trt/diff cl adjust=tukey;
run;
| Number of Observations Read | 36 |
|---|---|
| Number of Observations Used | 36 |
| Number of Events | 869 |
| Number of Trials | 2165 |

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate | Standard Error |
|---|---|---|---|
| Intercept | block | 0.1300 | . |
| trt | block | 0.1000 | . |

Differences of trt Least Squares Means (Adjustment for Multiple Comparisons: Tukey-Kramer)

| trt | _trt | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Adj P | Alpha | Lower | Upper | Adj Lower | Adj Upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | -0.4292 | 0.1703 | 22 | -2.52 | 0.0195 | 0.0492 | 0.05 | -0.7824 | -0.07592 | -0.8571 | -0.00128 |
| 1 | 3 | -0.8520 | 0.1698 | 22 | -5.02 | <.0001 | 0.0001 | 0.05 | -1.2040 | -0.4999 | -1.2784 | -0.4255 |
| 2 | 3 | -0.4228 | 0.1673 | 22 | -2.53 | 0.0192 | 0.0485 | 0.05 | -0.7698 | -0.07577 | -0.8431 | -0.00245 |
This is 360 observations cheaper than the sample size implied by the Bonferroni adjustment:
** example precision estimation for a random control trial after **;
** Stroup (2016) **;
** see http://support.sas.com/resources/papers/proceedings16/11663-2016.pdf **;
data rct;
** this is a SAS data step that creates a SAS data set that is called an 'exemplar' **;
** (or representative) data set in the above article to be used in conjunction with **;
** the glimmix procedure below to simulate the impacts of your assumptions **;
** on the precision (and yes, with more work, power) of your experimental design **;
infile cards;
input block @@;
** each block is an independent replication of the 3-treatment **;
** (control + 2 new treatments) group experimental design **;
** in your case if you make each block a randomly selected district **;
** then you get to estimate inter-district response variance for 'free' **;
** the double trailing @ in the input statement holds the input line so the statements below can keep reading from it **;
do eu=1 to 3;
** iterates over each of the 3 experimental units in each (district?) block **;
** each eu (1-3) is a new experimental unit, which is permuted in this example **;
** the double trailing @ holds the input line so the next trt value is read from the same data line **;
input trt @@;
** current assumptions for the success probabilities of the control (p1) and the two **;
** treatments (p2,p3) are set here. These treatments are not as effective as those assumed **;
** in previous simulations. By varying these and the eu size you can see what effect size difference **;
** (here I use 10% or 0.10 diff between the control and the experimental treatments, with the Bonferroni **;
** adjustment for multiple comparisons) can be confidently detected at a given 95% level. **;
** I found this treatment effect size difference of 0.10 could just be detected at a 95% C.L. with an **;
** overall sample size of 2525 by changing these probabilities and the eu size and re-running the exemplar analysis **;
p1=.30;p2=.40;p3=.50;
** p takes on the right value for the newly input treatment type: the expression (trt=1) evaluates to 1 if trt=1, else 0 **;
p=(trt=1)*p1+(trt=2)*p2+(trt=3)*p3;
** the ranuni(seed) function generates a uniform random number between 0 and 1 **;
** rounding 10 x this number to an integer and adding it to 65 will uniformly randomly **;
** generate (and therefore facilitate simulation of the impact of) experimental unit **;
** sample sizes in the range 65 - 75, mean=70. Using the same seed reproduces the same **;
** pseudo-random sequence of sample sizes every time. This is my 'lucky' number! **;
** to change the cluster size assumption, change the 65 to something else. To change the size variation, **;
** change the 10 to something else. To use your own 'lucky' random seed, change 09051958 to your **;
** own birthday or any other easy-to-remember number: any positive integer less than 2**31-1 works. **;
** You can use 0 to get a new clock-based seed each time, but if you do, you will get a different, but equally **;
** varied, set of experimental unit (clinic) sample sizes each time you run it, and sometimes that is a real **;
** nuisance **;
n=65+round(10*ranuni(09051958),1);
** mu is the expected number of positive outcomes from each experimental unit, rounded to an integer **;
mu=round(n*p,1);
** and the simulated experimental outcome is output to the exemplar data set **;
output;
** and this is done 3 times, one for each experimental unit **;
end;
** in the datalines below I have simulated 12 districts (blocks) chosen at random, each with **;
** 3 clinics chosen at random for the trial. The treatments are allocated in all possible permutation **;
** orders, twice over the 12 blocks (districts). It is vitally important to vary treatments within blocks: **;
** other designs that do not follow this principle fail to have any useful precision **;
cards;
1 1 2 3
2 1 3 2
3 2 1 3
4 2 3 1
5 3 1 2
6 3 2 1
7 1 2 3
8 1 3 2
9 2 1 3
10 2 3 1
11 3 1 2
12 3 2 1
run;
proc glimmix data=rct;
class trt;
** this events/trials model syntax (model y/n = x) automatically invokes dist=binomial with the default logit link **;
model mu/n=trt / ddfm=contain;
random intercept trt / subject=block;
** see the reference article by Stroup 2016 on how to make educated assumptions about the **;
** group and treatment covariances. /hold tells glimmix not to estimate but to hold the covariance **;
** parameters at the values given below **;
parms (0.13)(0.10)/hold=1,2;
** this tests for strong evidence of a difference at the precision and sample size simulated. **;
** Bonferroni is an appropriate multiple comparison adjustment when all differences are of interest **;
** and group sizes are unequal by more than 20% of the geometric mean group size. The Tukey-Kramer **;
** adjustment is also appropriate and more powerful in this situation, but here the Bonferroni **;
** adjustment is applied, which requires an additional 360 samples over those needed for the **;
** justifiable Tukey-Kramer adjustment, all other assumptions being equal. **;
lsmeans trt/diff cl adjust=bon;
run;
| Number of Observations Read | 36 |
|---|---|
| Number of Observations Used | 36 |
| Number of Events | 1013 |
| Number of Trials | 2525 |

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate | Standard Error |
|---|---|---|---|
| Intercept | block | 0.1300 | . |
| trt | block | 0.1000 | . |

Differences of trt Least Squares Means (Adjustment for Multiple Comparisons: Bonferroni)

| trt | _trt | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Adj P | Alpha | Lower | Upper | Adj Lower | Adj Upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | -0.4310 | 0.1651 | 22 | -2.61 | 0.0160 | 0.0479 | 0.05 | -0.7734 | -0.08860 | -0.8588 | -0.00319 |
| 1 | 3 | -0.8513 | 0.1646 | 22 | -5.17 | <.0001 | 0.0001 | 0.05 | -1.1926 | -0.5100 | -1.2778 | -0.4249 |
| 2 | 3 | -0.4203 | 0.1624 | 22 | -2.59 | 0.0168 | 0.0504 | 0.05 | -0.7572 | -0.08348 | -0.8412 | 0.000548 |
The number of events is the total number of events or successes recorded for all treatment types across the study; it is the sum of all the values of mu in the exemplar simulation data set. The number of trials is the total number of times treatments were delivered across the study; it is the sum of all the values of n in the exemplar simulation data set.

No, exactly equal ratios are not assumed, but very nearly so. Because you were initially concerned about uncontrolled variation in clinic sample sizes in your original enquiry, the experimental unit treatment counts in the exemplar simulation data set are generated with a uniform random distribution of width 10 (i.e. +/- 5) about a mean sample size; I assumed you understood this already. So instead of 70, 70, 70 at each clinic, you get random examples like 66, 74, 70. However, because of the large number of simulated treatments and the uniform distribution simulated, the overall ratio by treatment type in the exemplar simulation data set will be very close to 1:1:1. Of course, when you run your study it may not be quite as close, owing to external uncontrollable factors. At 70 per treatment per clinic over 12 districts, the expected total per arm is 840; instead of 840:840:840, the exemplar ratio is 842:844:839.
proc means data=WORK.RCT chartype sum vardef=df;
var n mu;
class trt;
run;
| trt | N Obs | Variable | Sum |
|---|---|---|---|
| 1 | 12 | n | 842 |
| | | mu | 254 |
| 2 | 12 | n | 844 |
| | | mu | 337 |
| 3 | 12 | n | 839 |
| | | mu | 422 |
PROC GLIMMIX, not PROC MIXED.
Hi Damien,
Thank you for your unreserved support. Could you show me what the commands and steps would look like? I will not do a randomized block design, due to the nature of the study; that is, the information bias would be higher if I allocated all three treatment units (clinics) in one district. Thus, I will allocate only one treatment unit per district, which changes the randomization. I will use a completely randomized design rather than a block design. What will the structure and the command be if I use 30, 40 and 50 for each arm and a covariance of, let's say, 0.5?
Regards
Teketo