Abstract
Background: Although randomized controlled trials are often a gold standard for determining intervention effects, in the area of practice-based research (PBR), there are many situations in which individual randomization is not possible. Alternative approaches to evaluating interventions have received increased attention, particularly those that can retain elements of randomization such that they can be considered “controlled” trials.
Methods: We present methodological design elements and practical implementation considerations for two quasi-experimental design approaches with considerable promise in PBR settings, the stepped-wedge design and a variant of this design, the wait-list cross-over design, along with a case study from a recent PBR intervention for patients with diabetes.
Results: PBR-relevant design features include: creation of a cohort over time that collects control data but allows all participants (clusters or patients) to receive the intervention; staggered introduction of clusters; multiple data collection points; and one-way cross-over into the intervention arm. Practical considerations include: randomization versus stratification; training run-in phases; and an extended time period for overall study completion.
Conclusion: Several design features of practice-based research studies can be adapted to local circumstances yet retain elements that improve methodological rigor. Studies that use these methods, such as the stepped-wedge design and the wait-list cross-over design, can increase the evidence base for controlled studies conducted within the complex environment of PBR.
Although randomized, controlled trials (RCTs) are often a gold standard for determining intervention effects, in the areas of practice-based research, quality improvement, and public health there are many situations in which individual randomization is not possible.1 For example, an RCT may not be considered possible for any of the following reasons: (1) stakeholders hold a common view that, in some settings, evidence of an established intervention benefit is already sufficient, and it would therefore be unethical to have control groups; (2) delivery of the intervention is already underway (e.g., a policy change sets in motion new clinical procedures); or (3) assignment to a control group is unacceptable to some groups that would potentially serve as controls. Consequently, alternative approaches to evaluating clinical and community interventions have received increased attention, particularly those that can retain some elements of randomization such that they can be considered “controlled” trials.2–6 Such designs are consistent with discussions of “practical clinical trials”7 in that they adapt to local considerations, such as those raised above, and therefore are more likely to achieve better outcomes. In light of the recognition that evidence-based practice must be informed by practice-based evidence,8 practice-based research (PBR) in particular may benefit from approaches that do not require randomization of individuals in settings where it is not feasible. Such alternative designs, often called “quasi-experimental” designs, are increasingly used for the evaluation of clinical and/or practice-based interventions applied under real-world circumstances where individual-level RCT designs are not suitable.
This paper reviews design elements and practical implementation considerations for two quasi-experimental designs that have considerable promise in PBR settings: the stepped-wedge design, and a variant of this design, described herein as a wait-list cross-over design, currently used in a large PBRN project. The stepped-wedge design is relevant to PBR because (1) it is a cluster-based design suited to clinic-level interventions that enables all sites to receive the intervention, yet (2) it avoids some of the methodological pitfalls associated with before-and-after designs because it retains controlled data elements.5 The stepped-wedge design and its wait-list variant allow all patients to receive the intervention while also contributing control time. This differs from a traditional cluster randomized trial done in parallel time, in which some sites are randomized to control at the outset and do not have an opportunity to cross over into an intervention arm.
A variant of the stepped wedge, the wait-list cross-over trial, is also particularly well suited to PBR and quality improvement studies involving clinic or systems-based disease registries, as it enables staggered implementation of patient-level interventions over time. Staggered implementation is highly relevant to counseling-based programs and those requiring considerable staff training, such as some guideline-based interventions, because the design can alleviate staffing burdens (and costs) that would otherwise be incurred. The staggered implementation creates a waiting period for some patients, and this wait time provides useful “control” information for inclusion in the data analysis.
These designs have stronger methodological rigor than other well-known quasi-experimental design options (such as pre-post studies), because it is possible to control the roll-out and to include elements of randomization that can reduce biases. However, the best design for a particular intervention must be determined within the local context, by considering what types of evidence exist to support other designs and the relative merits of alternatives to RCTs in different contexts.1
Methods
We present a review of design features and practical considerations for PBR implementation of the stepped-wedge and wait-list designs, along with a discussion of published examples from studies of clinic-based interventions using these designs. A PubMed search was conducted using the terms quasi-experimental, stepped-wedge, quality improvement, and wait-list design to identify methodological as well as clinic-based publications that describe these design approaches. The examples described in this paper are derived from the literature review and also the authors' experiences in PBR work conducted within the San Francisco Bay Collaborative Research Network, a PBRN located in the SF Bay Area and affiliated with the University of California San Francisco, of which the authors are long-time members and research collaborators.
Results
Stepped-Wedge Designs
Stepped-wedge designs have been in use for several decades, often in settings in which there are strong objections to individual-level randomization.1 A 2006 review of stepped-wedge designs indicated this design was frequently used in developing countries, often in the context of HIV treatment interventions.2 More recently, the literature indicates stepped-wedge designs are increasingly being implemented in clinic-based settings worldwide, for example, in a trial of guideline implementation for therapeutic hypothermia in post-cardiac arrest patients in a network of Emergency Departments in Canada10 and in evaluating implementation of a clinician-based psychosocial intervention for improving treatment of patients with cancer pain in Australia.11 Stepped-wedge designs are a cluster-randomized type of cross-over design in which clusters are all initially assigned to the control group, then switch to the intervention group at randomly assigned time points. All clusters receive the intervention by the last time interval.3 The stepped-wedge design derives its name from the shape of the staggered roll-out of intervention units (clinics or other clustering units such as schools) over time periods, which resembles stepped or stacked blocks (see shaded blocks or steps in Figure 1). The stepped-wedge design is particularly useful for evaluating the population impact (or effectiveness) of an intervention that was previously found efficacious in an individually randomized trial.3 Because stepped-wedge designs include a cross-over component, data analysis options are flexible and include between- and within-cluster comparisons as well as temporal variations in intervention effects.2–4
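As an illustration of the staggered roll-out (a minimal sketch only, not drawn from any of the cited studies), the following Python code generates a stepped-wedge schedule in which the order of cluster cross-over is assigned at random; the resulting rows of 0s (control) and 1s (intervention) correspond to the white and shaded blocks of Figure 1.

```python
import random

def stepped_wedge_schedule(n_clusters, seed=None):
    """Build a stepped-wedge schedule: one cluster crosses over per step,
    after a baseline period in which all clusters contribute control data.
    Returns a dict mapping cluster id -> list of 0 (control) / 1 (intervention)
    indicators, one per time period."""
    rng = random.Random(seed)
    order = list(range(1, n_clusters + 1))
    rng.shuffle(order)                      # random order of cross-over
    n_periods = n_clusters + 1              # baseline plus one step per cluster
    schedule = {}
    for step, cluster in enumerate(order, start=1):
        # control (0) up to the cluster's step, intervention (1) afterwards
        schedule[cluster] = [0] * step + [1] * (n_periods - step)
    return schedule

if __name__ == "__main__":
    for cluster, periods in sorted(stepped_wedge_schedule(5, seed=42).items()):
        print(f"cluster {cluster}: {periods}")
```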
Design Features of a Stepped Wedge
(1) Creation of a cohort composed of control data and intervention data, with each site contributing to both, such that two samples can be compared. In Figure 1, outcomes in the white units would be compared with those in the shaded units, and both sets of data are created across time periods (as compared with a before-and-after study, which does not enable adjustments over time).
(2) Staggered introduction of the intervention across clusters over time. Following an initial baseline data collection period in which all clusters contribute control data, the order in which clusters receive the intervention may be determined at random, or may involve accounting for practical considerations (such as cluster size), which may necessitate matched or stratified approaches to assigning intervention timing.
(3) Multiple data collection points. As described in (1), each block of data requires data collection, so that control data are collected multiple times for some units. In Figure 1, cluster 5 contributes data at five time points before receiving the intervention.
(4) Every site gets the intervention eventually and every site contributes control data across one or more time periods (with the exception of the first cluster).
(5) Clusters cross over from control to intervention (one-way cross-over).
(6) Ability to control for time trends by allowing contemporaneous comparisons across clusters, at different time periods.
Figure 2 provides a comparison of the cluster assignments over time when using: (a) a parallel-time allocation as in a traditional cluster randomized trial, in which control clusters do not receive the intervention within the study observation period and only one time period is allowed; (b) a cross-over trial, in which clusters can cross over between intervention and control (two time periods allowed); and (c) a stepped wedge, in which clusters each contribute multiple data points to the cohort and are staggered in the order in which they receive the intervention, but cross over only from control into intervention (multiple time periods allowed).
Practical Implementation Considerations for Stepped-Wedge Designs in PBR
Randomization of Clusters
Where possible, random assignment of start times for clusters is preferred. However, there are often circumstances in which this is not feasible or suitable, and stratified or matched approaches may be used to reduce the biases associated with non-random assignments (see below). Several recent examples have used stratified approaches to account for:
(1) The varying size of clusters, such that not all the largest clusters (e.g., hospitals or community clinics with large patient volume) would be randomized to intervention at the start of the study, and not all the small ones randomized at the end.13
(2) The intervention already being underway in some sites but not others (e.g., guideline implementation in circumstances where some hospital sites have begun to implement a new protocol while others have not yet begun).
(3) Logistic implementation factors such as geographical distance from training or referral resources or seasonal variations that are related to the outcome13 may also limit the feasibility of a completely randomized allocation of clusters.
Training or Phasing-In of Intervention-Related Components
Because PBR interventions often require clinician or staff training components that must take place before the intervention start date (such as for adopting a new clinic guideline or other practice change initiative), a stepped-wedge design can build in a training-related “run-in” phase that takes place after the control time has been completed and before the intervention period begins, as in Figure 3. In this way, each of the participating clinical sites can be allocated in random order from control condition to training and then delivery of the intervention.12 This training period cannot contribute data to the cohorts and has to be excluded from subsequent analyses, which must be taken into account when developing sample size estimates. An example of a completed clinic-based study that used this stepped-wedge approach with a phased-in intervention examined the impact of integrating HIV testing and anti-retroviral (ARV) therapy within prenatal care services in 8 primary care practices in Zambia on rates of ARV therapy before delivery.12 Sites were not randomly allocated but instead were allocated within strata based on patient volume. The authors defined 4 strata based on the number of patients per site (1 = fewest patients, 4 = most patients) and rolled out the intervention in the order 1, 2, 3, 4, then 4, 3, 2, 1. This enabled a slower introduction of the training in use of the ARV therapies and clinic procedures, while allowing the non-random allocation to be partially controlled for with stratified analysis techniques.
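As an illustration of this kind of stratified roll-out (a hypothetical sketch, not the allocation procedure used in the Zambia study), the following Python code orders clinics by alternating through patient-volume strata, in the spirit of the 1, 2, 3, 4 then 4, 3, 2, 1 pattern described above, with random ordering within each stratum.

```python
import random

def stratified_rollout(clinics_by_stratum, seed=None):
    """Assign a roll-out (cross-over) order to clinics so that strata
    (e.g., patient-volume quartiles) are interleaved across steps rather
    than all large or all small clinics starting first.
    clinics_by_stratum: dict mapping stratum label -> list of clinic names."""
    rng = random.Random(seed)
    # Randomize the order of clinics within each stratum.
    shuffled = {s: rng.sample(c, len(c)) for s, c in clinics_by_stratum.items()}
    order = []
    # Alternate the direction in which strata are visited (1,2,3,4 then 4,3,2,1)
    # so that no single stratum is concentrated at the start or end.
    strata = sorted(shuffled)
    forward = True
    while any(shuffled.values()):
        for s in (strata if forward else reversed(strata)):
            if shuffled[s]:
                order.append((s, shuffled[s].pop()))
        forward = not forward
    return order

if __name__ == "__main__":
    clinics = {1: ["A", "B"], 2: ["C", "D"], 3: ["E", "F"], 4: ["G", "H"]}
    for step, (stratum, clinic) in enumerate(stratified_rollout(clinics, seed=1), 1):
        print(f"step {step}: clinic {clinic} (stratum {stratum})")
```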
Extended Observation Period Associated With Stepped-Wedge Designs
Stepped-wedge designs usually take longer than traditional RCTs or traditional cluster randomized trials. Therefore, it is important that funding and staffing are secured to allow for the extended study length. Additionally, community clinic partners need to be aware before agreeing to participate that the results may take longer to obtain and that the impact on the clinic will last longer. With an extended time period, several concerns may arise. Most prominently, there are concerns about bias: if sites that serve longer in control periods must provide repeated data, the quality of those data may change over time. In addition, bias could arise from differential drop-out due to delays in receiving the intervention in clinics that have waited longer. In the Zambia example above, electronic medical record data were used, so additional patient interviews were not necessary, reducing potential biases associated with repeated data collection that is not part of routine care. While the possibility of differential drop-out rates is real within longer studies of this kind, to our knowledge there have not been any published assessments examining these rates within time periods. Instead, time periods are presented as adjustments in the data analysis of the clustered data.
Repeated Observations Before Intervention Exposure
With the stepped-wedge design, clusters (or, in the case of the wait-list design, individual participants) not yet crossed over to intervention are evaluated multiple times, which could introduce a bias. When clinics are repeatedly evaluated, the same individuals are rarely included in each repeat assessment, and in many cases the assessments are conducted using existing data, such as medical records, rather than interviews. However, interview data may be used, as in the case study described below, in which multiple observations per person are included.
Wait-List Design Variant of Stepped Wedge
A stepped-wedge wait-list design is a version of the cluster-based stepped-wedge approach described above in which individuals (rather than clusters such as clinics or community centers) are randomized from wait list to intervention over a series of time stages. This design variant is well adapted to contexts in which there is a large registry of patients who are eligible for participation in an intervention (such as disease-specific registries or health plan membership rosters), when an RCT is not possible, and when it is not feasible to assign all patients to receive the intervention at the same time.
Case Study: SMART STEPS Project
In our experience in PBR within the SF Bay Collaborative Research Network, a wait-list variant of the stepped-wedge design was selected to evaluate the implementation of a diabetes self-management support program, in which a regional Medicaid managed care plan that maintains a registry of diabetes patients enrolled active members with diabetes into a self-management support diabetes intervention across 4 PBRN clinic sites.14 The project began when the San Francisco Health Plan (SFHP) approached the authors (DS and MH) for help in adapting a health IT diabetes self-management support program, the Automated Telephone Self-Management Support (ATSM) program, which we had previously developed, tested, and found efficacious in an RCT within our PBRN.15–21 The intervention was to become a covered benefit for their patients during a trial period, and the evaluation was conducted (2009 to 2011) with the goal of delivering the intervention to several hundred patients using an implementation design that would enable study of a variety of outcomes. The SFHP thought that an RCT with a traditional parallel-assigned control group would not be ethical because our recent studies had shown the effectiveness of the ATSM program (the modified ATSM that was implemented was called SMART STEPS) within the same clinic population covered by the health plan. The SFHP decided to provide SMART STEPS as a covered member benefit, with wait-list patients receiving the intervention after 6 months. The University of California San Francisco research team was asked to assist with the evaluation strategy, which involved patient-level consent and randomization procedures, for which we received an Agency for Healthcare Research and Quality (AHRQ) PBRN R18 grant.22
As is the case for many regional health plans such as SFHP and many clinic systems or PBRNs, it was not feasible to scale up the ATSM intervention across all clinics and to all eligible patients at once without incurring large staffing costs. This is because SMART STEPS requires a care manager/counselor to call patients with some frequency to address diabetes self-management problems that patients report or that are identified through review of electronic data, such as medications not being picked up or laboratory values such as elevated hemoglobin A1c. Consequently, rolling out the intervention in a staggered fashion, but controlling the roll-out by randomizing a sizable proportion of the patients to wait list at each time interval, would create a cohort that retained design elements of randomization yet was practical for staffing purposes (see Figure 4). The staggered intervention implementation meant the health plan could hire only one health coach for the majority of the intervention period, with some additional staffing needed at certain time intervals. The evaluation and data collection involve both individual interviews and electronic medical record-derived outcomes, such as laboratory values, medication use, and primary care and hospital visits. The SMART STEPS project is nearing completion with over 350 patients enrolled.
Represented in the figure are 130 patients from the 4 clinics in the diabetes registry who were randomized to intervention and 130 who were randomized to wait list over the entire study period. In each 6-month enrollment phase (the boxes identified as waves), patients go either directly into the intervention arm (INT) or onto a wait list for 6 months (WL). Each wave of wait-list patients then crosses over into intervention after 6 months (WL-INT). The dots represent the cross-over of individuals on the wait list into the active SMART STEPS intervention arm (WL-INT). This results in several waves of control data as well as intervention data, allowing the outcomes analyses to take into account possible variations over time that may affect the study results.
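A minimal sketch of this kind of wave-based randomization (illustrative only; the patient identifiers, wave size, and 50/50 split are hypothetical rather than the SMART STEPS procedures) is shown below in Python.

```python
import random

def assign_wave(eligible_patients, seed=None):
    """Randomize one enrollment wave of registry patients to immediate
    intervention (INT) or a 6-month wait list (WL); WL patients cross over
    to intervention (WL-INT) at the next wave. The 50/50 split is illustrative."""
    rng = random.Random(seed)
    patients = list(eligible_patients)
    rng.shuffle(patients)
    half = len(patients) // 2
    return {"INT": patients[:half], "WL": patients[half:]}

if __name__ == "__main__":
    wave1 = assign_wave([f"pt{i:03d}" for i in range(1, 41)], seed=7)
    print(len(wave1["INT"]), "to intervention;", len(wave1["WL"]), "to wait list")
    # At the next 6-month wave, wave1["WL"] would cross over (WL-INT) and a new
    # group of eligible registry patients would be randomized.
```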
To conduct the wait-list evaluation, data collection in the form of interviews was done multiple times. For example, participants in both arms received interview 1 at baseline, before being “activated” to begin SMART STEPS or beginning the 6-month control wait-list period. Wait-list patients received interview 1 again just before crossing over into intervention. After completing the intervention, all patients receive the follow-up interview.
Practical Implementation Considerations
There are challenges in implementing the wait-list design beyond those described above, but they reflect many of the realities of PBR, and there are often strategies to overcome them.
Changing Eligibility Based on Registry and Health Plan Membership Criteria
One challenge pertains to any study that uses an active enrollment strategy, for example, from health plan membership, such that patients' eligibility can change over time, or from a disease-based registry that must be updated to include newly diagnosed patients. For example, in this cohort, participants could switch clinics and become ineligible, they could lose health plan membership and no longer be eligible, or new diabetes patients could become eligible and need to be added to the eligible patient pool. It was necessary to review the diabetes registry data on a monthly basis and remove some participants from the wait-list and intervention arms when they lost their health plan membership or became ineligible for other reasons. Although this flux was small and did not affect the overall study sample size considerably in this project, it did require active registry surveillance and the development of study criteria to determine whether patients who became ineligible had participated for enough time to qualify as “exposed” to the intervention or to the wait list.
Differential Attrition From Wait-List Groups
We did not find that patients who had been on the wait list were less likely to participate in the intervention once they crossed over, but there is a concern that the duration of the wait list could affect subsequent participation. We will be conducting a variety of fidelity assessments of potential differential engagement across waves, clinics, and study arms to examine this possibility in more detail.
Statistical Analysis Considerations
Stepped-wedge designs can be analyzed using a variety of techniques for longitudinal outcomes that allow for clustering and covariate effects.3–5 Most statistical software programs, such as SAS or STATA, can accommodate data analysis strategies relevant to the stepped-wedge design, such as mixed effects or generalized estimating equation regression models. A key point is that while there are efficiencies in study power associated with cross-over designs such as the stepped wedge,3 there are also costs associated with their use that must be evaluated before the design is adopted. The role of confounding, the impact of extended follow-up time, and the impact of clustering in the data analysis strategy are each discussed below, followed by a summary of sample size estimation for stepped-wedge designs. Detailed treatments of relevant statistical issues are provided by Hussey and Hughes,3 Cousens et al,4 and Li and Frangakis.6
Examining Confounding in Stepped-Wedge Designs
The estimate of overall intervention effect from a stepped-wedge design is based on a comparison of average responses between control and intervention groups. The model for cluster-level responses typically includes cluster-specific random components and also allows for separation of intervention and time effects. The latter represent an important potential source of confounding due to the partially randomized nature of this design and must be accounted for in data analyses. Further, the common assumption of no interaction between intervention effects and time needs to be evaluated in analyses.3 If no significant time trends are detected, then analyses can frequently be simplified to a paired t test comparison of responses between groups. Other potential confounders need to be accounted for in adjusted regression modeling.4 Individual responses can be analyzed using generalizations of these methods, including hierarchical models investigating group- and individual-level explanatory covariates.
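As a sketch of one way such a model can be fit (illustrative only; the file name and variable names are hypothetical, and comparable models can be fit in SAS or STATA), the following Python code uses the statsmodels package to fit a linear mixed model with fixed effects for intervention and time period and a random intercept for clinic.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative long-format data: one row per participant observation, with the
# clinic (cluster), the time period, the intervention indicator for that
# cluster-period, and a continuous outcome (e.g., hemoglobin A1c).
df = pd.read_csv("stepped_wedge_outcomes.csv")  # columns: clinic, period, intervention, a1c

# Linear mixed model: fixed effects for intervention and time period, plus a
# random intercept for clinic to account for clustering. An intervention-by-period
# interaction can be added to test the assumption of a constant effect over time.
model = smf.mixedlm("a1c ~ intervention + C(period)", data=df, groups="clinic")
result = model.fit()
print(result.summary())
```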
Implications of Extended Follow-Up Periods in Stepped-Wedge Designs
Sample size planning for stepped-wedge studies should also take into account the extended length of follow-up that may be needed to roll out each of the steps. When it is possible to include several steps and decrease the time intervals for each step, study power is increased. The extended overall study observation period may reduce retention, result in missing data, and reduce overall power through the associated decreases in sample size. Although this is a risk in any cohort design, designs that prolong the observation period, as with the stepped wedge, may be more likely to experience participant attrition unless steps are taken to encourage sustained participation, such as continuity in follow-up outreach and increases in sample sizes to account for losses.
Sample Size Planning for Stepped-Wedge Designs
Sample size estimation and power calculations for stepped-wedge designs require specification of the number of clusters, number of time steps, number of participants per cluster per step, the desired effect size, and the expected variability of responses at both the individual and cluster level. The variances of cluster-level responses are often expressed as a function of a “variance inflation factor” reflecting the impact of the intra-class correlation between individual responses within a given cluster. If we denote this quantity by ρ, and by N, the number of individuals in each cluster, the following expression gives the approximate cluster-level variance in the case in which responses do not vary with time:
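Writing σ² for the variance of individual responses (a symbol introduced here for the expression), the approximate cluster-level variance takes the standard design-effect form:

(σ² / N) [1 + (N − 1) ρ]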
The expression before the brackets represents the variability in the case of independent individual responses, and the bracketed quantity is the variance inflation factor. This makes it clear that the sample size required for stepped-wedge trials will increase with both the cluster size and the intra-class correlation between individual-level responses. Sample size estimates typically assume constant treatment effects.
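As a simple illustration of how the variance inflation factor affects sample size (a deliberately simplified sketch that ignores the additional efficiency of the stepped-wedge cross-over itself; see Hussey and Hughes3 for exact methods), the following Python code inflates the sample size required under individual randomization by the design effect.

```python
import math

def design_effect(n_per_cluster, icc):
    """Variance inflation factor 1 + (N - 1) * rho for clustered responses."""
    return 1 + (n_per_cluster - 1) * icc

def clusters_needed(n_individual, n_per_cluster, icc):
    """Approximate number of clusters needed: inflate the sample size required
    under individual randomization by the design effect, then divide by the
    cluster size (a simplification for illustration only)."""
    inflated = n_individual * design_effect(n_per_cluster, icc)
    return math.ceil(inflated / n_per_cluster)

if __name__ == "__main__":
    # e.g., 300 participants required under individual randomization,
    # 30 participants per clinic per step, intra-class correlation of 0.05
    print(design_effect(30, 0.05))         # 2.45
    print(clusters_needed(300, 30, 0.05))  # 25 clinics
```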
Conclusion
This paper presents a summary of key methodological and implementation elements of two quasi-experimental designs that have particular relevance for practice-based research. Additional studies that utilize these methods and offer variants that adapt to important local considerations can increase the evidence base for controlled studies conducted within the complex environment of PBR.
Acknowledgments
The authors acknowledge the Agency for Healthcare Research and Quality funding (R18 HS 017261, Harnessing Health Information Technology for Self-Management Support and Medication Activation in a Medicaid Health Plan).
Notes
- This article was externally peer reviewed.
- Funding: This work was funded by AHRQ grant R18 HS 017261, Harnessing Health Information Technology for Self-Management Support and Medication Activation in a Medicaid Health Plan, McKesson Foundation.
- Conflict of interest: none.
- Received for publication February 25, 2011.
- Revision received June 16, 2011.
- Accepted for publication June 30, 2011.