TO THE EDITOR:
The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the COVID-19 pandemic in 2020 created unprecedented global challenges to existing models of evidence evaluation. COVID-19 was recognized as being associated with a high incidence of thrombotic complications. In the absence of identified therapies, an imperative emerged to rapidly initiate therapeutic clinical trials in the domains of hematology and thrombosis. COVID-19 presented unique challenges, including a rapidly evolving virus, heterogeneous host response, limited mechanistic insight, disproportionate impact upon vulnerable and underrepresented patient populations, and an urgent need to identify widely available therapies to address the global pandemic. Designing randomized clinical trials (RCTs) that incorporated these and other complexities was an evolving concept when the COVID-19 pandemic began, at which time the imperative to conduct efficient, nimble, reproducible, pragmatic, and impactful clinical trial designs became immediate.
The first recognized double-blind RCT, conducted in 1943, was similarly designed to address patient and population needs in a viral illness—the common cold.1 The fundamental importance of randomization in reducing systematic bias in interventional trials is deeply rooted and widely accepted in medicine.2 Thus, the RCT has taken its place as the gold standard for establishing causation in clinical research. Although our understanding of diseases and the practice of medicine have substantively evolved over time, the design and conduct of RCTs, despite their obvious limitations, have continued with little adaptation.
Conventional, parallel group RCTs share a number of limitations that may hamper the generation of evidence. Operationally, they are cost-inefficient in that an immense amount of expertise and infrastructure is required to support a single clinical trial. Clinical trials typically run in parallel to clinical care rather than being embedded into care as part of a learning health system, which means we are often forced to decide whether we want to “learn” or “do.”3 That is, the very structure of research environments and classical RCTs creates an unfortunate dichotomy between providing clinical care and learning what optimal clinical care entails. Furthermore, the majority of RCTs compare a small number (usually 2) of discrete and carefully defined interventions. The target trial sample size is engineered to estimate average treatment effects by group but is unable to effectively investigate heterogeneous treatment effects within clinically relevant subpopulations. Subgroup comparisons, even if specified a priori, are generally regarded as exploratory; findings require confirmation with future RCTs. Thus, the conventional RCT is limited in its ability to rapidly identify subgroups of patients who may benefit within the trial population.4 When trial enrollment occurs in the setting of a rapidly evolving disease process, conventionally designed trials can also be unresponsive to important changes in event rates for controls over time, possibly reflecting changes to the at-risk population or the implementation of other concurrent treatments.5 Limited interim analyses combined with overly conservative stopping rules that are triggered only after substantial enrollment notably attenuate the ability of a conventional RCT to rapidly reach trial conclusions, and they profoundly limit the ability of the trial to adapt to changes in disease outcomes or pathophysiology.6
Early observations in patients with COVID-19 suggested a high incidence of micro- and macrovascular thrombotic complications7 and disproportionately poor outcomes in minorities as well as those with specific medical comorbidities.8,9 With no specific therapies for SARS-CoV-2 available, the need to design and launch efficient and pragmatic clinical trials was paramount. The absence of baseline outcome event rates, expected effect sizes, or insights into the variation of these estimates highlighted the inherent limitation of parallel-group RCTs that rely on assumed, fixed sample sizes. Under this framework, 95% of trials launched during the peak of the COVID-19 pandemic were underpowered to evaluate meaningful clinical differences.10 Changes in disease outcomes potentially related to vaccination, evolution of variants, changing patient demographics, and the implementation of effective therapies further attenuated the effectiveness of conventionally designed trials. Moreover, a notable spectrum in the severity of clinical disease foreshadowed the possibility for heterogeneous treatment effects based on the severity of the illness. Unlike standard RCTs, which build and scale infrastructure that is used to complete a single trial, a pandemic-level response required reusable and enduring infrastructure capable of testing multiple therapies in different patient cohorts. The ideal clinical trial design during the pandemic required flexibility in such parameters as sample size, target population, and interventions being studied, as well as the ability to rapidly generate results, disseminate knowledge, and recycle existing trial infrastructure.
Numerous investigators recognized the limitations of traditional RCTs and responded to an urgent global need in the face of sizable unknowns. They broke from decades of convention and turned to adaptive platform trials to study potential therapies.12,13 One example of such a platform is the Randomized Embedded Multifactorial Adaptive Platform for Community-acquired Pneumonia (REMAP-CAP) trial. REMAP-CAP began in 2016 as an intensive care unit (ICU)–based trial to investigate therapies for CAP.14 Prescient in design, the REMAP-CAP master protocol prespecified a pandemic appendix that was designed in response to challenges encountered during the 2009 H1N1 influenza pandemic.15 The adaptive design and function of REMAP-CAP overcame many of the limitations of conventional RCTs, and in a short period of time, it generated practice-changing evidence related to the use of corticosteroids,16 interleukin-6 receptor antagonists,17 antivirals,18 convalescent plasma,19 and therapeutic anticoagulation.20,21
The use of adaptations in clinical trials is one mechanism to address some of the limitations inherent in conventional trial designs. Examples of potential adaptations are sample size reassessment, enrichment strategies (adaptive inclusion criteria to target at-risk patient populations), and response-adaptive randomization (adaptive allocation assignments that weight randomization in favor of better-performing interventions).4,5,22 Adaptive platform trials incorporate adaptive techniques along with a master protocol23 and enduring trial infrastructure so they can evaluate multiple interventions in a perpetual manner.4,10,11,14 The resulting trial design offers the ability to add or halt randomization to specific interventions on the basis of trial conclusions (eg, superiority or futility), with the goal of maximizing efficiency and speeding up the generation of knowledge.
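To make the idea of response-adaptive randomization concrete, the following is a minimal Python sketch and not the allocation rule used by any of the platforms discussed here. It assumes a binary outcome, flat Beta(1, 1) priors, and allocation probabilities set equal to each arm's posterior probability of being the best-performing arm; the function name `rar_allocation` and the interim counts are hypothetical.

```python
import random


def rar_allocation(successes, failures, draws=10000, seed=1):
    """Return allocation probabilities proportional to P(arm is best).

    Assumptions (illustrative only): binary outcome, independent
    Beta(1, 1) priors per arm, Monte Carlo estimate of the posterior
    probability that each arm has the highest success rate.
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # One joint draw from the posterior of every arm's success rate.
        samples = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical interim data: arm 0 has 30/50 good outcomes, arm 1 has 20/50.
alloc = rar_allocation(successes=[30, 20], failures=[20, 30])
print("Allocation probabilities for the next block:", alloc)
```

In practice, allocation weights are often tempered or capped so that no arm is starved of patients early in a trial; that refinement is omitted here for brevity.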
Adaptive platform trials are greatly facilitated by the implementation of Bayesian statistics. The adaptive approaches rely on extensive pretrial simulations that consider the interactions between multiple study arms, patient stratification variables, a priori subgroups of interest, plans for adaptation, and estimates of efficacy. Simulations are repeated thousands of times to provide robust estimates of plausible trial outcomes based on the variables and scenarios being considered. It is important to distinguish the concept of a priori subgroups in Bayesian adaptive trials from the usual understanding of subgroups studied in conventional, 2-arm frequentist trials. Incorporating subgroups in an adaptive manner essentially includes these groups as part of the primary outcome, allowing for adaptive triggers and including statistical stopping criteria within these prespecified groups. This differs from post hoc secondary analyses of subgroups in which, in both Bayesian (when not incorporated into adaptive rules) and frequentist methods, subgroup analysis is frequently underpowered to make conclusions regarding clinically relevant treatment effects.
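The role of pretrial simulation can be illustrated with a deliberately simplified Python sketch. It is not the simulation machinery of any platform described in this commentary. It assumes a 2-arm trial with a binary outcome, flat Beta(1, 1) priors, a normal approximation to the posterior, and hypothetical stopping thresholds (posterior probability of benefit >0.99 for superiority and <0.05 for futility); all function names and parameter values are illustrative.

```python
import random
from statistics import NormalDist


def prob_treatment_better(e_c, n_c, e_t, n_t):
    """Approximate P(event rate on treatment < event rate on control).

    Uses Beta(1, 1) priors and a normal approximation to the difference
    of the two Beta posteriors (an assumption made for speed).
    """
    a_c, b_c = 1 + e_c, 1 + n_c - e_c
    a_t, b_t = 1 + e_t, 1 + n_t - e_t
    mean_c, mean_t = a_c / (a_c + b_c), a_t / (a_t + b_t)
    var_c = a_c * b_c / ((a_c + b_c) ** 2 * (a_c + b_c + 1))
    var_t = a_t * b_t / ((a_t + b_t) ** 2 * (a_t + b_t + 1))
    return NormalDist().cdf((mean_c - mean_t) / (var_c + var_t) ** 0.5)


def simulate_design(p_ctrl, p_trt, looks=4, per_arm_per_look=100,
                    superiority=0.99, futility=0.05,
                    n_trials=2000, seed=2):
    """Simulate many trials and summarize how the design behaves."""
    rng = random.Random(seed)
    counts = {"superiority": 0, "futility": 0, "inconclusive": 0}
    total_n = 0
    for _trial in range(n_trials):
        e_c = e_t = n = 0
        result = "inconclusive"
        for _look in range(looks):
            # Enroll one more block of patients per arm, then analyze.
            e_c += sum(rng.random() < p_ctrl for _i in range(per_arm_per_look))
            e_t += sum(rng.random() < p_trt for _i in range(per_arm_per_look))
            n += per_arm_per_look
            prob = prob_treatment_better(e_c, n, e_t, n)
            if prob > superiority:
                result = "superiority"
                break
            if prob < futility:
                result = "futility"
                break
        counts[result] += 1
        total_n += 2 * n
    summary = {k: v / n_trials for k, v in counts.items()}
    summary["mean_sample_size"] = total_n / n_trials
    return summary


# Two hypothetical scenarios: no true effect, and a true 30% -> 20% reduction.
null_scenario = simulate_design(p_ctrl=0.30, p_trt=0.30)
effect_scenario = simulate_design(p_ctrl=0.30, p_trt=0.20)
print("No effect:  ", null_scenario)
print("True effect:", effect_scenario)
```

Comparing the two scenarios shows the kind of operating characteristics (false-positive rate, probability of a correct conclusion, expected sample size) that simulations are used to tune before a trial begins.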
By using master protocols, adaptive platform trials can evaluate multiple treatments with common end points in the context of shared trial infrastructure.24,25 Interventions in a platform trial are often tested within domains, each of which is guided by a domain-specific appendix, which is also part of a master protocol. Thus, as knowledge of and treatments for a specific disease evolve, domains and interventions within a domain can be added or withdrawn under the auspices of a master trial protocol without altering core trial infrastructure. The focus of the adaptive platform becomes a disease rather than a specific intervention.4 This has previously been characterized by the analogy of building a stadium and tearing it down after one game (conventional RCT) vs playing multiple games over several seasons in one stadium (adaptive platform design) (Scott Berry PhD, Berry Consultants, oral communication, 22 February 2022). This creates a trial ecosystem in which numerous interventions can be studied concomitantly and sequentially over time. The concept of adaptive platforms has been reviewed in detail elsewhere.5,26
Although a full comparison of Bayesian to frequentist statistics is outside of the scope of this commentary and has been reviewed elsewhere,27 a few important differences are notable to highlight the rationale for choosing a Bayesian framework to guide the conduct and analysis of novel trials during the pandemic. Frequentist statistics rely on P values, which reflect the long-run probability of observing data at least as extreme as those obtained, assuming that the null hypothesis is correct. In essence, the best outcome in a frequentist trial is to disprove the null hypothesis, when the actual intention is often to understand how likely it is that an experimental therapy is effective. Frequentist methods use data from only the present experiment at a fixed time (onset of the design) and return a binary interpretation of statistical significance. Sample size estimates are typically based on imperfect assumptions, and in the process of guessing the correct sample size, completed trials are typically overpowered (ie, a trial conclusion could have been declared with fewer patients randomly assigned; patients may have been unnecessarily randomly assigned to an inferior treatment in a superiority trial), or underpowered (ie, the treatment effect was declared nonsignificant when continued randomization would have demonstrated a clinically significant difference). Either scenario wastes clinical trial investments and slows the dissemination of knowledge.
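The sensitivity of a fixed sample size to its planning assumptions can be shown with a short Python sketch. It uses the standard normal-approximation formula for comparing 2 proportions; the event rates, the function names `n_per_arm` and `achieved_power`, and the scenario itself are hypothetical and are not drawn from any trial discussed here.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_arm(p_ctrl, p_trt, alpha=0.05, power=0.80):
    """Patients per arm for a 2-sided test of 2 proportions
    (normal approximation, equal allocation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    variance = p_ctrl * (1 - p_ctrl) + p_trt * (1 - p_trt)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_ctrl - p_trt) ** 2)


def achieved_power(p_ctrl, p_trt, n, alpha=0.05):
    """Approximate power actually achieved with n patients per arm."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    se = sqrt((p_ctrl * (1 - p_ctrl) + p_trt * (1 - p_trt)) / n)
    return Z.cdf(abs(p_ctrl - p_trt) / se - z_alpha)


# Planning assumption (hypothetical): 30% control event rate reduced to 20%.
planned_n = n_per_arm(0.30, 0.20)
power_if_right = achieved_power(0.30, 0.20, planned_n)

# Suppose the true control event rate is 20% with the same relative reduction.
power_if_wrong = achieved_power(0.20, 0.20 * 2 / 3, planned_n)

print(f"Planned patients per arm: {planned_n}")
print(f"Power if assumptions hold: {power_if_right:.2f}")
print(f"Power if control event rate is lower than assumed: {power_if_wrong:.2f}")
```

Under these illustrative numbers, a trial sized for the assumed event rate loses a substantial share of its power when the true control event rate is lower, which is the underpowering problem described above.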
Bayesian design, however, predicts the probability of an event actually happening and tests null and alternative hypotheses to provide a measure of certainty for each. Importantly, Bayesian frameworks use both previous knowledge and data generated from the trial to predict outcomes. This allows for a strategy that, in some ways, adapts to changes during the trial by incorporating current data into statistical models. It also allows for the ability to launch a trial without knowledge or informed expectations of baseline event rates or treatment efficacy, which are required in a frequentist design and can result in under- or overpowering. As opposed to providing a dichotomously interpreted P value, Bayesian outputs provide posterior probabilities and credible intervals (ie, the range of possible treatment effects) to quantify the certainty of a hypothesis.2,6,27,28
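As a minimal illustration of what a posterior probability and a credible interval look like in practice, the Python sketch below analyzes a hypothetical 2-arm data set. It assumes a binary outcome and flat Beta(1, 1) priors and estimates the posterior by Monte Carlo sampling; the counts and the function name `posterior_summary` are invented for illustration and do not correspond to any trial result.

```python
import random


def posterior_summary(events_ctrl, n_ctrl, events_trt, n_trt,
                      draws=20000, seed=0):
    """Beta-binomial posterior for two arms with Beta(1, 1) priors.

    Returns the posterior probability that the treatment arm has a lower
    event rate than control, and a 95% credible interval for the
    absolute risk difference (treatment minus control).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        p_ctrl = rng.betavariate(1 + events_ctrl, 1 + n_ctrl - events_ctrl)
        p_trt = rng.betavariate(1 + events_trt, 1 + n_trt - events_trt)
        diffs.append(p_trt - p_ctrl)
    diffs.sort()
    prob_benefit = sum(d < 0 for d in diffs) / draws
    lower = diffs[int(0.025 * draws)]
    upper = diffs[int(0.975 * draws) - 1]
    return prob_benefit, (lower, upper)


# Hypothetical data: 60/200 events on control vs 40/200 on treatment.
prob, (lo, hi) = posterior_summary(events_ctrl=60, n_ctrl=200,
                                   events_trt=40, n_trt=200)
print(f"Posterior probability that treatment is better: {prob:.3f}")
print(f"95% credible interval for risk difference: ({lo:.3f}, {hi:.3f})")
```

The output is a direct statement about the probability of benefit and the plausible range of the effect, rather than a dichotomous significant or nonsignificant result.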
In February 2020, REMAP-CAP implemented a pandemic appendix to the core trial protocol. REMAP-CAP (called REMAP-COVID15 in the United States) initiated multiple domains such as comparative evaluations of immunomodulators, antivirals, corticosteroids, therapeutic anticoagulation, and anti-platelet treatments. Building upon early observations of both large-vessel and microvascular thrombosis in the context of increased inflammation, REMAP-CAP sought to test the hypothesis that anticoagulation with therapeutic-dose heparins would reduce the need for organ support in patients hospitalized for COVID-19. At the time, REMAP-CAP was largely an ICU-based trial that enrolled critically ill patients into a Bayesian randomized, adaptive platform, open-label trial.
At or around the same time, several other adaptive platform trials were developed, including the evaluation of empiric therapeutic-dose heparin in patients hospitalized for COVID-19: Antithrombotic Therapy to Ameliorate Complications of COVID-19 (ATTACC) (NCT04372589)29 and the National Institutes of Health–sponsored Accelerating COVID-19 Therapeutic Interventions and Vaccines 4 ACUTE (ACTIV-4a) (NCT04505774) trial. The ACTIV-4a adaptive platform trial studied the effects of vascular protection strategies, including antithrombotics, in patients (critically ill and noncritically ill) who were hospitalized for COVID-19. Collectively, these 3 platforms included an evaluation of therapeutic-dose anticoagulation with heparin vs usual care thromboprophylaxis.
Early in the design and implementation phases of all 3 trials, the investigators recognized an opportunity for synergy and collaboration. Seeking to maximize speed, minimize competition, and overcome the barriers of inadequately powered studies, as well as to enhance external generalizability,11 the REMAP-CAP, ATTACC, and ACTIV-4a investigators prospectively harmonized these 3 trials into 1 multiplatform RCT (mpRCT) (Figure 1). The 3 platforms harmonized eligibility, intervention characteristics, data collection, and outcome measures such that a single harmonized trial protocol was launched. Harmonized electronic case report forms in 2 of the platforms further permitted seamless integration of data. The participating platforms federated data prospectively by using agreed-upon stopping triggers based on results in the combined enrolled population. Central statistical support with expertise in Bayesian statistics was also shared among all 3 platforms. The investigators and sponsors coordinated communication across Data and Safety Monitoring Boards (DSMBs) through a prespecified DSMB interaction plan to allow for review of data from each platform. Although there were differences among the protocols of the 3 platforms, these differences were minor and were overshadowed by uniformity in the inclusion criteria, intervention protocols, and trial outcomes. Cross-platform differences included the use of response-adaptive randomization in REMAP-CAP and ATTACC but not in ACTIV-4a, the specific wording of exclusion criteria, and the breadth of secondary outcomes and adverse events. The supplemental appendixes of the mpRCT publications20,21 highlight the high degree of overlap in the cross-platform comparison tables. Importantly, the mpRCT provided a novel method for global collaboration in which independent trial networks worked together to complete a single randomized trial while retaining their operational independence and sponsorship of their respective platforms.
By working together, the platforms of the mpRCT achieved what none of them could have achieved on their own in the time frame available for informing clinical decision-making during the pandemic.
The mpRCT design substantially reduced the time required to reach trial conclusions. ACTIV-4a launched on 10 August 2020 and combined its efforts with the already enrolling ATTACC and REMAP-CAP platforms. As a consequence of unprecedented collaboration, frequent interim analyses, and a priori defined adaptive group sequential stopping criteria, the mpRCT reached its first trial conclusion (futility) on 19 December 2020. In critically ill patients with COVID-19, the probability that therapeutic-dose heparin improved organ support-free days was <5% (posterior probability of futility was 99.9%).21 Results from this trial population were immediately and publicly released. In rather short order, empiric therapeutic-dose heparin was withdrawn as a treatment option in critically ill patients with COVID-19. One month later, on 22 January 2021, enrollment in the noncritically ill cohort (non-ICU level of care) was discontinued for a finding of superiority. Compared with usual care, the probability was 99% that therapeutic-dose anticoagulation with heparin (regardless of baseline D-dimer) increased the adjusted odds of survival to discharge with reduced need for organ support.20
The structure of the mpRCT, including engagement of 393 sites in 10 countries, led to rapid and generalizable trial conclusions. Although the size and scope of the platforms helped facilitate enrollment, the use of a Bayesian framework with frequent interim analyses combined with a priori sequential stopping groups resulted in our ability to rapidly and simultaneously evaluate and report the results of several hypotheses and thus assess the potential for heterogeneous treatment effects within a single trial. The adaptive design not only allowed for testing of subtypes based on disease severity but also provided the framework for subsequent investigations within each platform. ACTIV-4a has continued in partnership with ATTACC sites to test the effect of platelet inhibition using P2Y12 inhibitors; a lack of benefit of P2Y12 inhibition combined with therapeutic anticoagulation in moderately ill patients with COVID-19 was recently reported.30 In addition, ACTIV-4a has recently launched simultaneous arms testing crizanlizumab and SGLT-2 inhibitors in both moderate and severe disease states while continuing to enroll patients in the severe disease state in the P2Y12 inhibitor arm. REMAP-CAP also continues to test antiplatelet strategies alongside multiple other domains as the pandemic continues, and ATTACC, in collaboration with ACTIV-4a and REMAP-CAP sites, is pivoting trial resources to study therapeutic-dose heparin in non–COVID-19 pneumonia. The results of the mpRCTs have been incorporated into numerous guidelines, including those of the American Society of Hematology,31 American College of Chest Physicians,32 National Institutes of Health,33 and the National Institute for Health and Care Excellence,34 among others.
The complexity of mpRCTs presents challenges and potential limitations. Bayesian statistical analyses and the use of response-adaptive randomization (RAR) are relatively new concepts to many readers of the medical literature, and the strategies may have an impact on the interpretation of trial results. In a 2-arm trial, RAR may reduce statistical power; however, this loss of power may be offset by the increased acceptability of randomization. The reporting of posterior probabilities, as opposed to P values, is an important concept to understand when interpreting the results. Predefined thresholds for stopping the trial are determined by the investigators, which can pose a limitation in scenarios in which clinical response is uncertain. Although the use of posterior probabilities is inherently different from the use of P values in frequentist statistics, we believe that posterior probabilities better reflect how clinicians interpret the results of frequentist hypothesis testing and how they communicate trial findings to patients. Statistical framework aside, there were extensive logistical and implementation challenges for harmonizing 3 platforms into an mpRCT. The strength of the resultant trial design, however, was that all 3 platforms actioned a unified protocol while maintaining trial network autonomy.
In hematology, Bayesian adaptive platform trials hold exciting promise for generating evidence and disseminating knowledge. In a common and continually evolving syndrome such as venous thrombosis, the development of an enduring adaptive platform trial with multiple specific domains has the potential to create a cost-efficient clinical trial ecosystem in which multiple hypotheses can be evaluated simultaneously and sequentially over time in different at-risk groups. In the field of malignant hematology, the creation of adaptive platform infrastructure to study a disease such as acute myeloid leukemia would permit investigators to concomitantly evaluate multiple interventions such as antineoplastic and supportive therapies in patients stratified by genetics or other relevant risk characteristics. Adoption of Bayesian frameworks would forestall the need to predict required sample size and, in the presence of heterogeneous treatment effects, allow trials to reach trial conclusions sequentially in the clinically relevant subgroup. By inching closer to a learning health system, implementation of response-adaptive randomization may render randomization more comfortable to clinicians and families by increasing the proportion of patients randomly allocated to therapies that perform well. Adoption of multiplatform methods in which multiple independent trial platforms synergize to complete a distributed, yet single, randomized trial will further encourage global collaboration and enhance the pace of knowledge generation in high-priority hematologic diseases.
Acknowledgments: The authors acknowledge the contributions of all investigators in the mpRCT platforms as well as those of the study participants.
Contribution: M.D.N., P.R.L., and R.Z. conceived the commentary, and wrote and edited the manuscript.
Conflict-of-interest disclosure: M.D.N. served on the scientific advisory board of Haima Therapeutics; received research funding from the National Institutes of Health, Department of Defense, Haemonetics, Instrumentation Laboratory, and Janssen Pharmaceuticals; received honoraria from Haemonetics, Meredian HealthComms, Janssen, and CSL Behring; and is the co-chair of the ACTIV-4a trial supported by the National Heart, Lung, and Blood Institute. P.R.L. and R.Z. are lead investigators of the ATTACC platform and the REMAP-CAP platform supported by the Canadian Institute of Health Research.
Correspondence: Matthew D. Neal, University of Pittsburgh Medical Center, F1271.2 PUH 200 Lothrop St, Pittsburgh, PA, 15213; email: nealm2@upmc.edu.