Guidelines for Calculating Sample Size in 2x2 Crossover Trials: A Simulation Study

In crossover trials, patients receive two or more treatments in a random order in different periods. Sample size determination is an important step in planning a crossover study. This paper concerns sample size calculations in 2x2 crossover trials with random patient effects and no interaction between the treatment and the patient, under two approaches, namely the exact and the large sample approaches. Sample sizes were determined for both approaches. For varying parameter values, simulation was then used to generate samples of the required size and to examine whether the significance level and power of the tests are maintained. The results indicate that when the sample size was ≤ 5, neither method maintained error rates, and when the sample size was > 5 and < 12, only the exact approach maintained error rates. However, when the sample size was approximately ≥ 12, both methods maintained error rates. In addition, it was found that a saving in sample size can be achieved, depending on the extent of the correlation between the observations on the same patient. The simulation results indicate that crossover studies should not be conducted when the anticipated sample size is ≤ 5, and that when a sample size of > 5 and < 12 is anticipated, the exact method of determining sample size should be used. When larger sample sizes are anticipated either method can be used, but the method based on the large sample approximation is simpler.


INTRODUCTION
Crossover trials are clinical trials in which patients are given all the medications to be studied in a random order. According to Grizzle (1965), these studies are generally conducted on patients with chronic diseases to control their symptoms. The data are analyzed according to the original intention to treat.
Ideally, clinical trials should be large enough to reliably detect the smallest difference in the primary outcome between treatments that is considered clinically worthwhile. According to Lee et al. (2005), it is not uncommon for studies to be underpowered, failing to detect even large treatment effects because of inadequate sample size. It is considered unethical to recruit patients for a study whose sample size is too small for the trial to deliver meaningful information on the tested intervention. Thus, sample size should be based on scientific considerations. Several approaches for calculating sample size are discussed in the literature (Pocock, 1983; Julious & Patterson, 2004), including the power approach and the confidence interval approach. According to previous studies (Chow et al., 2003; Woodward, 1992), these approaches require the specification of several parameters, such as the between-patient and within-patient variances for the treatments under consideration, the correlation within patients and the reference improvement that is required to be detected. Chow et al. (2003) explain the two different approaches for 2x2 crossover designs but do not give guidelines on when to use each approach. In this study, simulation is used extensively to set such guidelines.
In this study, two situations are considered in the calculation of the sample size of a crossover study, as explained in Chow et al. (2003). These are (i) the exact approach and (ii) the large sample (approximate) approach.
Further, the study gives guidelines on when to use the exact approach and when to use the large sample approach, and examines how much saving in sample size can be achieved when observations on the same patient are correlated.

Mixed model used:
In clinical trials, it is common to assume that the patients respond consistently to treatments. However, the assumption is invalid if the patients vary randomly in their responses to the drug. For this type of situation, a random subject effects model, in which the subject effect is considered to be random and the treatment and period effects are considered to be fixed, has to be used (Brown & Prescott, 2006). Chow et al. (2003) have explained how to calculate the sample size in a crossover design using either of the two approaches, namely the exact and the approximate. In this paper a similar 2 × 2 crossover design comparing mean responses for two groups is considered.
In the first approach the test statistic is based on the Student's t distribution, whereas in the second approach the test statistic is based on the normal distribution. In the exact approach, the sample size depends on the degrees of freedom, so the calculation of sample size is not straightforward. The same calculation can be done without difficulty if the approximate approach, which is based on the normal approximation, is used. Values of the inverse t distribution function need to be determined for calculating the sample sizes for the exact approach. This is done by using the approximation given in Cooke et al. (1982). The criterion used for choosing the method of calculating sample size is which method maintains the power and the significance level.
Let $Y_{ijk}$ be the response observed from the $j$th ($j = 1, 2, \ldots, n$) subject in the $i$th sequence ($i = 1, 2$) under the $k$th treatment ($k = 1, 2$). The model considered is

$$Y_{ijk} = \mu + t_k + p_i + s_{ijk} + e_{ijk},$$

where $\mu$ is the overall mean, $t_k$ is the $k$th treatment effect, $p_i$ is the $i$th sequence (period) effect, $s_{ijk}$ is the random effect of the $j$th subject in the $i$th sequence under the $k$th treatment and $e_{ijk}$ is the error term corresponding to the $j$th subject in the $i$th sequence under the $k$th treatment.
In this mixed model, the treatment effect and the period effect are considered as fixed effects and the subject effect as random. Equal allocation of patients to treatment groups is assumed and no replication is considered.

The following notation and assumptions are used.
Since $\mu$ is a constant, it can be fixed without loss of generality. It is assumed that there is no treatment by period interaction, since a simple hypothesis test can be used only under this assumption. The subject effects $(s_{ij1}, s_{ij2})$ are assumed to be independent and identically distributed across patients as bivariate normal random variables with mean 0 and covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_{BT}^2 & \rho\,\sigma_{BT}\sigma_{BR} \\ \rho\,\sigma_{BT}\sigma_{BR} & \sigma_{BR}^2 \end{pmatrix},$$

where $\sigma_{BT}^2$ is the variance between patients for the 'treated' group, $\sigma_{BR}^2$ is the variance between patients for the 'reference' group and $\rho$ is the correlation between the subject effects of the same patient under the treated and reference treatments.
It is assumed that the errors $e_{ij1}$ and $e_{ij2}$ are such that $e_{ij1} \sim N(0, \sigma_{WT}^2)$ and $e_{ij2} \sim N(0, \sigma_{WR}^2)$, where $\sigma_{WT}^2$ is the within patient variation for the treated group and $\sigma_{WR}^2$ is the within patient variation for the reference group.
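The distributional assumptions above determine the variance of the difference between the two responses of one patient, which is the quantity that drives the sample size. The short derivation below is a sketch that additionally assumes the errors are independent of each other and of the subject effects; the symbol $\sigma_m^2$ is the one the text uses for this variance.

```latex
\begin{align*}
d_{ij} &= Y_{ij1} - Y_{ij2}
        = (t_1 - t_2) + (s_{ij1} - s_{ij2}) + (e_{ij1} - e_{ij2}), \\
\operatorname{Var}(d_{ij})
       &= \operatorname{Var}(s_{ij1} - s_{ij2})
          + \operatorname{Var}(e_{ij1}) + \operatorname{Var}(e_{ij2}) \\
       &= \sigma_{BT}^2 + \sigma_{BR}^2 - 2\rho\,\sigma_{BT}\sigma_{BR}
          + \sigma_{WT}^2 + \sigma_{WR}^2 \;=\; \sigma_m^2 .
\end{align*}
% With \sigma_{BT} = \sigma_{BR} = \sigma_B and \sigma_{WT} = \sigma_{WR} = \sigma_W:
\[
\sigma_m^2 = 2\sigma_B^2(1 - \rho) + 2\sigma_W^2 .
\]
```

The fixed effects contribute only a constant to $d_{ij}$, so they do not enter the variance. In the equal-variance case the between-patient part shrinks linearly in $1 - \rho$, which is why a high within-patient correlation reduces the required sample size.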
Consider the group that gets treatment 1 in the first period and treatment 2 in the second period. The model can then be written as

$$Y_{1j1} = \mu + t_1 + p_1 + s_{1j1} + e_{1j1}, \qquad Y_{1j2} = \mu + t_2 + p_1 + s_{1j2} + e_{1j2}.$$

Let $\varepsilon$ be the measure of treatment difference; then $\varepsilon = t_1 - t_2$. An unbiased estimate for $\varepsilon$ is given by

$$\hat{\varepsilon} = \frac{1}{2n} \sum_{i=1}^{2} \sum_{j=1}^{n} \left( Y_{ij1} - Y_{ij2} \right).$$

(a) Description: In order to satisfy the above mentioned objectives, a simulation study was carried out. For the exact approach, the bisection method was used as the root finding technique for determining the sample size, as described in Press et al. (2002). The simulation study was also used for determining whether the type 1 error and the power are maintained for both approaches. Sample sizes were determined for varying correlations for both approaches, and thereby the saving in sample size with increasing correlation was studied. Finally, based on the results of the simulation study, guidelines are provided for sample size calculation in crossover trials.
A C programme was written for performing the simulation study. The C language was selected since it is efficient for large scale simulations. The first step of the simulation study was to set some practically plausible values for the parameters required (Sooriyarachchi & Whitehead, 1998; Whitehead et al., 2008). Crossover trials are usually associated with a small sample size, because the comparison of treatments is within patient rather than between patients, and the within patient variances are usually smaller than the between patient variances. The between patient standard deviation for the treated group ($\sigma_{BT}$) was examined over two values, namely 3 and 4, and the between patient standard deviation for the reference group ($\sigma_{BR}$) was set equal to $\sigma_{BT}$, which is often assumed in crossover trials. Two values were assigned to the within patient standard deviation for the treated group ($\sigma_{WT}$), namely 0.3 and 0.5, and again the within patient standard deviation for the reference group ($\sigma_{WR}$) was set equal to $\sigma_{WT}$.
The within subject correlation coefficient is denoted by $\rho$. The values of $\rho$ examined were 0, 0.3, 0.6 and 0.9, i.e. no correlation at all, some correlation, high correlation and very high correlation, respectively, so that the outcomes for various situations can be compared. Note that, although it was not considered in this study, it is also possible to consider situations where $\sigma_{BT}$ does not equal $\sigma_{BR}$ and $\sigma_{WT}$ does not equal $\sigma_{WR}$. The reference improvement is denoted by $\varepsilon_R$. The values of $\varepsilon_R$ examined were 1.5, 2 and 3.
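The parameter settings described above can be enumerated mechanically. The following is a minimal sketch, not the authors' C programme: the names `SIGMA_B`, `SIGMA_W`, `RHO`, `EPS_R` and `parameter_grid` are hypothetical, and the nesting order of the loop is an assumption that need not match the row order of the paper's tables.

```python
from itertools import product

# Values taken from the text; sigma_BR = sigma_BT and sigma_WR = sigma_WT.
SIGMA_B = (3.0, 4.0)          # between patient standard deviation
SIGMA_W = (0.3, 0.5)          # within patient standard deviation
RHO = (0.0, 0.3, 0.6, 0.9)    # within subject correlation
EPS_R = (1.5, 2.0, 3.0)       # reference improvement


def parameter_grid():
    """Yield one dict per parameter combination (ordering is an assumption)."""
    for s_b, s_w, rho, eps_r in product(SIGMA_B, SIGMA_W, RHO, EPS_R):
        # Variance of the within patient difference in the equal-variance case.
        var_m = 2 * s_b**2 * (1 - rho) + 2 * s_w**2
        yield {"sigma_b": s_b, "sigma_w": s_w, "rho": rho,
               "eps_r": eps_r, "var_m": var_m}


grid = list(parameter_grid())
print(len(grid))  # 2 * 2 * 4 * 3 combinations
```

Each combination would then be run once under the null and once under the alternative hypothesis.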
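For each of these combinations, 1000 simulations were carried out under the null and the alternative hypotheses. Under the null hypothesis, the mean difference between treatments ($\varepsilon$) is set to zero, and under the alternative hypothesis, the mean difference between treatments ($\varepsilon$) is set to the reference improvement $\varepsilon_R$.
The variance of the within patient difference, $\sigma_m^2 = \sigma_{BT}^2 + \sigma_{BR}^2 - 2\rho\,\sigma_{BT}\sigma_{BR} + \sigma_{WT}^2 + \sigma_{WR}^2$, is specified only for the sample size calculation at the design stage. Thus it is required to find an unbiased estimate for use in the test statistic at the analysis stage. An unbiased estimate for $\sigma_m^2$ can be given by

$$\hat{\sigma}_m^2 = \frac{1}{2(n-1)} \sum_{i=1}^{2} \sum_{j=1}^{n} \left( d_{ij} - \bar{d}_{i\cdot} \right)^2,$$

where $d_{ij} = Y_{ij1} - Y_{ij2}$ and $\bar{d}_{i\cdot}$ is the mean of the $d_{ij}$ in sequence $i$. Under the null hypothesis, the statistic

$$T = \frac{\hat{\varepsilon}}{\hat{\sigma}_m / \sqrt{2n}},$$

where $\hat{\varepsilon}$ is the unbiased estimate of $\varepsilon$, follows a t distribution with $2n-2$ degrees of freedom (Chow et al., 2003).
The null hypothesis is rejected at the $\alpha$ level of significance if

$$|T| > t_{\alpha/2,\,2n-2}. \qquad (6)$$

The above mentioned hypothesis test will satisfy a power of $1-\beta$ at $\varepsilon = \varepsilon_R$ if, approximately,

$$\frac{\sqrt{2n}\,\varepsilon_R}{\sigma_m} - t_{\alpha/2,\,2n-2} = t_{\beta,\,2n-2}. \qquad (7)$$

From equations (6) and (7) the corresponding sample size can be obtained by

$$n = \frac{\left( t_{\alpha/2,\,2n-2} + t_{\beta,\,2n-2} \right)^2 \sigma_m^2}{2\,\varepsilon_R^2}.$$

[Here $t_{a,n}$ indicates the upper $a$th ordinate of the t distribution with $n$ degrees of freedom.] When considering the large sample approach, the standard normal distribution is used instead of the t distribution. Then the formula for the sample size calculation can be obtained as

$$n = \frac{\left( z_{\alpha/2} + z_{\beta} \right)^2 \sigma_m^2}{2\,\varepsilon_R^2}.$$

As explained in the introduction, the calculation of sample size is not straightforward for the exact case, as the sample size depends on the degrees of freedom; the sample size determination therefore requires solving a nonlinear equation in $n$, and hence a root finding technique is needed. The method used here is the bisection method explained in Press et al. (2002).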
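The two sample size calculations can be compared with a short sketch. This is not the authors' C programme, and it rests on several assumptions: a two-sided significance level of 0.05, a power of 0.90, a t quantile obtained by numerical integration and bisection rather than the Cooke et al. (1982) approximation, and a simple upward integer search in place of the bisection on $n$. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """CDF for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df, tol=1e-6):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_large_sample(eps_r, var_m, alpha=0.05, power=0.90):
    """Large sample (normal) sample size per sequence, rounded up."""
    z = NormalDist().inv_cdf
    return math.ceil((z(1 - alpha / 2) + z(power)) ** 2 * var_m / (2 * eps_r ** 2))


def n_exact(eps_r, var_m, alpha=0.05, power=0.90):
    """Smallest n with n >= (t_{alpha/2,2n-2} + t_{beta,2n-2})^2 var_m / (2 eps_r^2)."""
    n = max(2, n_large_sample(eps_r, var_m, alpha, power))  # t quantiles exceed z
    while True:
        df = 2 * n - 2
        need = ((t_quantile(1 - alpha / 2, df) + t_quantile(power, df)) ** 2
                * var_m / (2 * eps_r ** 2))
        if n >= need:
            return n
        n += 1


# sigma_B = 3, sigma_W = 0.3, rho = 0.3, eps_R = 2
var_m = 2 * 3.0**2 * (1 - 0.3) + 2 * 0.3**2
print(n_large_sample(2.0, var_m), n_exact(2.0, var_m))  # 17 18
```

Because a t quantile always exceeds the matching normal quantile, the exact sample size can never be smaller than the large sample one, which is why the search starts there.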
After obtaining an estimate for the sample size, it was of interest to determine the proportion of rejections under the null and alternative hypotheses out of a thousand simulations, to see whether the power and the significance level are maintained. That is, a trial of each sample size is simulated 1000 times and the proportion of rejections is recorded. To do this, the model explained in the introduction needs to be simulated, which requires generating the $s_{ijk}$'s and $e_{ijk}$'s for each sample size.
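The check described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' programme: the overall mean and the period effects are set to zero (they cancel in the test statistic for a balanced design), equal variances are used for both treatments, Python's built-in normal generator replaces the Box-Muller routine, and the function name and arguments are hypothetical. The critical value `crit` is passed in, so the same function serves the t based and the normal based tests.

```python
import math
import random
from statistics import NormalDist


def simulate_rejection_rate(n, eps, s_b, s_w, rho, crit, reps=1000, seed=1):
    """Proportion of simulated 2x2 crossover trials rejecting H0: eps = 0.

    n is the number of patients per sequence (2n patients per trial).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        total, ss = 0.0, 0.0
        for _seq in range(2):
            diffs = []
            for _ in range(n):
                # Correlated subject effects with common SD s_b and correlation rho.
                z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
                s1 = s_b * z1
                s2 = s_b * (rho * z1 + math.sqrt(1 - rho**2) * z2)
                y1 = eps + s1 + rng.gauss(0, s_w)   # treated response
                y2 = s2 + rng.gauss(0, s_w)         # reference response
                diffs.append(y1 - y2)
            m = sum(diffs) / n
            total += sum(diffs)
            ss += sum((d - m) ** 2 for d in diffs)  # within-sequence sum of squares
        eps_hat = total / (2 * n)
        var_hat = ss / (2 * (n - 1))                # pooled estimate of sigma_m^2
        t_stat = eps_hat / math.sqrt(var_hat / (2 * n))
        if abs(t_stat) > crit:
            rejections += 1
    return rejections / reps


z_crit = NormalDist().inv_cdf(0.975)  # large sample test, assuming alpha = 0.05
size = simulate_rejection_rate(17, 0.0, 3.0, 0.3, 0.3, z_crit, reps=2000)
power = simulate_rejection_rate(17, 2.0, 3.0, 0.3, 0.3, z_crit, reps=2000)
print(size, power)
```

Running the function with `eps = 0` estimates the significance level, and running it with `eps` equal to the reference improvement estimates the power.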
(b) Random number generation: For simulating uncorrelated variables, the Box-Muller transformation was used (Golder & Settle, 1976), and the method described in Al-Subaihi (2004) was used for simulating correlated variables.
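A minimal sketch of this step is given below. The Box-Muller part follows the standard transformation; the correlated pair uses the usual linear construction from two independent standard normals, which is an assumption here and is not claimed to be the specific procedure of Al-Subaihi (2004). The function names are hypothetical.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal variates from two uniforms."""
    u1 = 1.0 - rng.random()   # lies in (0, 1], which avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)


def correlated_pair(rng, sd1, sd2, rho):
    """Bivariate normal pair with mean 0, SDs sd1 and sd2, and correlation rho."""
    z1, z2 = box_muller(rng)
    return sd1 * z1, sd2 * (rho * z1 + math.sqrt(1 - rho * rho) * z2)


rng = random.Random(42)
pairs = [correlated_pair(rng, 3.0, 3.0, 0.6) for _ in range(20000)]

# Sample correlation of the generated subject effects.
mx = sum(p[0] for p in pairs) / len(pairs)
my = sum(p[1] for p in pairs) / len(pairs)
sxx = sum((p[0] - mx) ** 2 for p in pairs)
syy = sum((p[1] - my) ** 2 for p in pairs)
sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
r = sxy / math.sqrt(sxx * syy)
print(round(r, 2))
```

The sample correlation of a large batch of generated pairs should be close to the requested $\rho$, which gives a quick sanity check on the generator.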

RESULTS AND DISCUSSION
Checking whether the significance level and the power of the test are maintained was a major objective of this study. To do so, two probability intervals were calculated (Sooriyarachchi & Whitehead, 1998) for the true values of significance level and power. The 95% probability interval for a significance level of size $\alpha$ can be obtained by

$$\alpha \pm 1.96 \sqrt{\frac{\alpha (1 - \alpha)}{1000}}.$$

The 95% probability interval for a power of 90% can be obtained by

$$0.90 \pm 1.96 \sqrt{\frac{0.90 \times 0.10}{1000}}.$$

An estimate for the significance level can be obtained from the proportion of rejections of the null hypothesis when the null hypothesis is true, and an estimate for the power of the test can be obtained from the proportion of rejections of the null hypothesis when the alternative hypothesis is true. If the corresponding proportions are within the above probability intervals, it can be concluded that the significance level or power is well maintained by the test. Table 1 gives the proportion of rejections of the null hypothesis under the null and alternative hypotheses for the exact approach, and Table 2 is the corresponding table for the large sample approach. The proportions which are outside the probability limits are highlighted in the tables.
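The probability limits can be computed directly. The sketch below assumes the usual normal approximation to the binomial proportion with 1000 simulations; the value 0.05 for the significance level is an assumption for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def probability_interval(p, reps=1000, level=0.95):
    """Normal-approximation limits for a simulated proportion with true value p."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    half_width = z * math.sqrt(p * (1 - p) / reps)
    return p - half_width, p + half_width


power_lo, power_hi = probability_interval(0.90)
alpha_lo, alpha_hi = probability_interval(0.05)   # alpha = 0.05 is assumed
print(round(power_lo, 4), round(power_hi, 4))     # 0.8814 0.9186
print(round(alpha_lo, 4), round(alpha_hi, 4))     # 0.0365 0.0635
```

A simulated power below about 0.881 would therefore be flagged as under-powered, and one above about 0.919 as over-powered.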
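In these tables the values taken by all the nuisance parameters ($\sigma_{BT}$, $\sigma_{WT}$ and $\rho$) and the reference improvement ($\varepsilon_R$) are given, including whether the simulation was done under the null hypothesis (g = 1) or under the alternative hypothesis (g = 2). Then for each combination, the calculated sample size, the proportion of rejections of the null hypothesis, and the mean value of $\hat{\sigma}_m^2$ are given. This is useful in deciding how close the mean of $\hat{\sigma}_m^2$ is to $\sigma_m^2$, and hence the unbiasedness of $\hat{\sigma}_m^2$.
When considering Table 1, which corresponds to the exact method based on the t statistic, it can be observed that when the sample size is very small (less than or equal to five), the estimated power is usually outside the probability limits and higher than the upper limit. This is because, for very small sample sizes, the approximation to the inverse of the t distribution described in Cooke et al. (1982) is an overestimate, resulting in too large a sample size. But when the sample size is somewhat larger (greater than 5), the power is generally well maintained. Except for a very few cases (row number 34), the significance level is maintained by the test.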
When considering Table 2, which corresponds to the large sample approximation method based on the z statistic, it can be observed that when the sample size is less than or equal to 10 the test is under-powered in most cases, and more so as the sample size gets smaller. When the sample size is equal to 11, the power is well maintained in row numbers 6 and 27, but in row number 12 the test is under-powered. This is because the normal ordinate is an underestimate of the t ordinate for degrees of freedom up to about 20 ($2n - 2 = 20$ implies $n = 11$). It can be said that when the sample size is approximately less than 12, the test is under-powered when the large sample approximation is used. When the sample size is approximately greater than or equal to 12, the power is generally well maintained. Except for a few cases (row numbers 14 and 64), the significance level is well maintained. Tables 1 and 2 show that for most of the combinations, the values of $\sigma_m^2$ and the mean of $\hat{\sigma}_m^2$ are close to each other for both situations.
In order to illustrate the results more clearly, several graphs have been plotted in addition to the two tables. Figure 1 illustrates the variation of sample size with respect to the reference improvement (del) for different combinations of $\rho$, $\sigma_{BT}$ and $\sigma_{WT}$ for the exact approach.
Figure 1 shows how the values of the nuisance parameters and the reference improvement affect the calculation of sample size for the exact method. It can be seen that when $\rho$ increases, the sample size required decreases rapidly, irrespective of the situation. Here $\rho$ represents the within patient correlation coefficient. The higher the within patient correlation, the greater the saving in sample size. It can also be observed that the within patient variance $\sigma_{WT}^2$ has less effect than the between patient variance $\sigma_{BT}^2$ on the calculation of sample size, because the latter is usually much greater than the former. When the reference improvement increases the sample size becomes smaller, because the difference to be detected is larger. Also, the saving in sample size due to increasing correlation is greater for a smaller reference improvement.
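The saving in sample size with increasing correlation can be illustrated numerically. The sketch below uses the large sample formula only, assumes a two-sided significance level of 0.05 and a power of 0.90, and takes $\sigma_{BT} = \sigma_{BR} = 3$, $\sigma_{WT} = \sigma_{WR} = 0.3$ and $\varepsilon_R = 1.5$ from the parameter settings of the study; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_large_sample(eps_r, s_b, s_w, rho, alpha=0.05, power=0.90):
    """Large sample size per sequence for equal variances in both treatments."""
    var_m = 2 * s_b**2 * (1 - rho) + 2 * s_w**2   # variance of the difference
    z = NormalDist().inv_cdf
    return math.ceil((z(1 - alpha / 2) + z(power)) ** 2 * var_m / (2 * eps_r**2))


sizes = [n_large_sample(1.5, 3.0, 0.3, rho) for rho in (0.0, 0.3, 0.6, 0.9)]
print(sizes)  # [43, 30, 18, 5]
```

Under these assumptions, moving from no correlation to a correlation of 0.9 cuts the required number of patients per sequence from 43 to 5.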
Figure 2 is drawn to show whether the significance level is maintained by the test for the exact approach. It plots the proportion of rejections of the null hypothesis when the null hypothesis is true against the reference improvement (del), for different combinations of $\rho$, $\sigma_{BT}$ and $\sigma_{WT}$.
Two coloured lines represent the band within which the proportion should lie in order to maintain the significance level. The corresponding sample size is shown near the points that are outside the band. The results agree with those shown in Table 1.
Figure 3 is drawn to show how well the power is maintained by the test for the exact approach. It plots the proportion of rejections of the null hypothesis when the alternative hypothesis is true against the reference improvement (del), for different combinations of $\rho$, $\sigma_{BT}$ and $\sigma_{WT}$.
In Figure 3 it can be seen that most of the sample sizes which lie outside the band are very small, except for 12, which is very close to the upper limit. When the sample size is five or less, many points lie outside the band. The reason is the imprecision of the approximation used in calculating the inverse t distribution values when the sample size is less than or equal to five. When $\rho$ is very high (0.9) and the reference improvement is large (3), there is a greater tendency to obtain a small sample size; hence a higher number of points can be observed outside the band for those combinations.
Figures 2 and 3 thus show results similar to those in Table 1 for the exact approach.
Figures 4 and 5 are the graphs corresponding to Figures 1 and 2, respectively, for the large sample approach. The conclusions drawn from Figure 4 are the same as those drawn from Figure 1. Figure 5 illustrates results similar to those given in Table 2.
Figure 6 is the graph corresponding to Figure 3 for the large sample approach and illustrates results similar to those in Table 2. When $\rho$ is very high (0.9) and the reference improvement is large (3), there is a greater tendency to obtain a small sample size.

CONCLUSION
This study deals with the sample size calculation of crossover trials under two situations in the power approach, namely the exact and large sample methods (Chow et al., 2003).
From the results of the simulation study the following guidelines can be given. It was seen that when the sample size is very small (less than or equal to five), neither method maintains error rates, i.e. even the exact method, which is based on the t distribution, failed for very small sample sizes. This is because the approximation used to calculate the inverse of the t distribution in determining the sample size, which is described in Cooke et al. (1982), is not accurate for very small sample sizes. It is also evident that when the sample size is fairly large in terms of crossover studies (5 < sample size < 12), only the exact approach maintained error rates. This is because the normal ordinate is an underestimate of the t ordinate for sample sizes within that range. That means that within the specified sample size limits the exact approach should be used.
When the sample size is large in terms of crossover studies (≥ 12), both methods maintained error rates, i.e. for sample sizes greater than about eleven, the large sample approach, which is much simpler than the exact approach, can be used for sample size determination instead of the exact approach, which needs numerical methods. A greater reduction in sample size can be achieved when the within patient correlation is high.
A better approximation to the inverse of the t distribution should be found for calculating sample sizes that are expected to be very small. The study was done for 2 x 2 crossover trials, which consider only two treatments and two periods. As an extension one can consider more treatments and periods. Also, in this study there are no replications of treatments, i.e. a treatment is given to a set of patients (subjects) only once. One can extend this study to have replications.
An assumption used in this study is that the between patient variances are equal for the treated and reference groups, and likewise the within patient variances (i.e. $\sigma_{BT}^2 = \sigma_{BR}^2$ and $\sigma_{WT}^2 = \sigma_{WR}^2$), with equal allocation of patients to both groups. Thus further investigation can be carried out taking different values for these parameters.

Note to Tables 1 and 2: The values of the nuisance parameters ($\sigma_{BT}$, $\sigma_{WT}$ and $\rho$) and the reference improvement ($\varepsilon_R$) are given, including whether the simulation was done under the null hypothesis (g = 1) or under the alternative hypothesis (g = 2). For each combination, the calculated sample size, the proportion of rejections of the null hypothesis, and the mean value of $\hat{\sigma}_m^2$ are given. This is useful in deciding how close the mean of $\hat{\sigma}_m^2$ is to $\sigma_m^2$; the two are close to each other for both situations.

Table 1: Proportion of rejections of the null hypothesis under the null and the alternative hypotheses for the exact method

Table 2: Proportion of rejections of the null hypothesis under the null and the alternative hypotheses for the large sample approximation