Has the FDA lost the plot on surrogate endpoints?

Surrogate endpoints are intended to accelerate drug development, based on reasonable evidence that the endpoint is predictive of an ultimate effect. But recent research and evidence from the FDA suggest that the agency has become overly reliant on surrogate endpoints, with major implications for the types of medical products the FDA is willing to approve.

In the first part of this series, AgencyIQ provided a primer on surrogate endpoints. Here’s a quick refresher:

These endpoints are needed (and offer real value) in many situations where a “true” endpoint cannot be evaluated within the context of a clinical trial, whether because it would occur too far in the future, occurs too rarely, or presents an ethical dilemma. According to the FDA, a surrogate or intermediate clinical endpoint is “a marker – a laboratory measurement, radiographic image, physical sign or other measure that is thought to predict clinical benefit, but is not itself a measure of clinical benefit.” In effect, it acts as a stand-in for a clinical outcome that actually does demonstrate an improvement in how patients feel, survive or function. A common example of a “true” long-term indicator of clinical response in the oncology space is overall survival (OS); examples of associated surrogate endpoints include progression-free survival (PFS), relapse-free survival (RFS) and metastasis-free survival.
However, surrogate endpoints are varied, and there is a significant difference between “reasonably likely” and “validated” endpoints. Reasonably likely surrogate endpoints – which are permitted for use for accelerated approval by the FDA – are those which are “supported by strong mechanistic and/or epidemiologic rationale, but the amount of clinical data available is not sufficient to show that they are a validated surrogate endpoint.” Validated surrogate endpoints – which can be used for traditional approval by the FDA – are supported by both a clear mechanistic rationale and clinical data “providing strong evidence that an effect on the surrogate endpoint predicts a specific clinical benefit.” Moving from “reasonably likely” to “validated” requires both a significant amount of time and the pooling of data from multiple studies through meta-analyses; to show a consistent correlation between a surrogate endpoint and a “true” clinical endpoint, patients must actually reach that true endpoint. In the case of an endpoint like OS, this could take many years to achieve.
Over time, it has become clear that not all surrogate endpoints are accurately predictive or appropriate. The true value of these endpoints will depend on a variety of factors, including the initial predictive value of the endpoint itself, the specific disease state within which it is being used, and the drug in question. A surrogate endpoint may be highly predictive in some disease states but not others (due to the inherent nature of the disease), and for some drugs but not others (due to the drug’s mechanism of action). Additionally, the toxicity of the drug in question can impact the true long-term benefits of a drug, in some cases raising the risk level to the point where a drug is no longer beneficial and may even be detrimental in a certain group of patients.
Over the past few decades, surrogate endpoints have been used to support drug approvals with increasing frequency. Assessments of FDA drug approvals over the past two decades indicate that more than half of all approvals are now based on surrogate endpoints, a share that is much higher for oncology approvals. A review of approvals between 1995 and 2017 identified a consistent upward trend in the use of surrogate endpoints: whereas 48% of drugs approved between 1995 and 1997 used surrogate endpoints, 60% of those approved between 2015 and 2017 did. A separate review of oncology drug approvals between 2006 and 2017 found that 71% were based on surrogate endpoints; a snapshot review of drug approvals in 2020 found that 94% of all oncology drug approvals that year relied on a surrogate endpoint.
Here, AgencyIQ analyzes the FDA’s actions related to surrogate endpoints, identifying what appears to be a shift away from its own definitions and original views on the topic.

A recent review of accelerated approvals in oncology suggests that the FDA has quietly shifted its own stance on unvalidated surrogate endpoints

A study published in JAMA on April 7, 2024 posed the question, “What is the clinical benefit of cancer drugs granted accelerated approval, and on what basis are they converted to regular approval?” The research was conducted through the Program on Regulation, Therapeutics and Law (PORTAL) at Brigham and Women’s Hospital and Harvard Medical School, with funding support from Arnold Ventures. The scope of the study included cancer drugs granted accelerated approval between January 2013 and July 2023, forming a dataset of 129 total cancer drug–indication pairs. The analysis separately considered indications with less than five years (N = 83) and more than five years (N = 46) of follow-up time since their approval. (Note: AgencyIQ also receives funding support from Arnold Ventures; this funding is used to provide public access to some of our analysis, but is structured to avoid influencing our analysis.)
Of the full dataset of 129 drug-indication pairs, 48 received regular approval (37.2%), 56 of the pairs (43.4%) were considered “recently ongoing,” seven (5.4%) were ongoing, and 18 were withdrawn (13.9%). For those approvals with more than five years of follow-up time, the study found that only 63% of indications were converted to traditional approval. Of these, only 40% were based on the gold standard – OS. PFS was used to justify 44% of conversions; 10% of conversions to traditional approval were based on response rate (RR) plus duration of response (DOR). Concerningly, conversions based solely on RR/DOR represent an increasing trend, the researchers noted. “From 2013 to 2020, 0 of 28 conversions were based on response rate, whereas from 2021 to 2023, 7 of 19 conversions (37%) were based on response rate,” the paper states.
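The status breakdown above is simple arithmetic on the 129 drug-indication pairs, and it can be rechecked in a few lines. The sketch below is illustrative only: the counts are the ones quoted from the JAMA study, while the variable and function names are assumptions introduced here. Note that 18 of 129 works out to roughly 13.95%, so the withdrawn share may appear as 13.9% or 14.0% depending on how it is rounded.

```python
# Counts quoted in the text for the 129 accelerated-approval drug-indication pairs.
# The dictionary labels and function name are illustrative, not taken from the study.
status_counts = {
    "converted to regular approval": 48,
    "recently ongoing": 56,
    "ongoing": 7,
    "withdrawn": 18,
}


def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


total_pairs = sum(status_counts.values())
shares = {status: share(n, total_pairs) for status, n in status_counts.items()}

for status, pct in shares.items():
    print(f"{status}: {status_counts[status]} of {total_pairs} ({pct:.1f}%)")

# Share of 2021-2023 conversions that rested on response rate, as quoted in the paper.
rr_share_2021_2023 = share(7, 19)
print(f"response-rate conversions, 2021-2023: {rr_share_2021_2023:.0f}%")
```

The four categories sum to the full dataset, which confirms that the quoted counts are mutually exclusive and exhaustive.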
More than half (57%) of drugs with five years of follow-up after receiving accelerated approval had not shown clinical benefit, according to the slides presented by the study authors at the American Association for Cancer Research (AACR) Annual Meeting 2024. Of those drugs which were converted to traditional approval, nine did not improve OS or quality of life. Additionally, seven of these converted drugs improved OS without an improvement in quality of life; six improved quality of life but without an improvement in OS.
A randomly selected case study: Capmatinib. Granted accelerated approval in 2020 for patients with non-small cell lung cancer (NSCLC) and a specific mutation, capmatinib was initially approved on the basis of RR (a composite of complete and partial responses) and DOR. The RR in previously untreated and previously treated patients was 68% and 41%, respectively; the DOR was 12.6 and 9.7 months, respectively. Traditional approval was granted two years later – curiously, on the basis of these exact same endpoints. For previously untreated patients, the RR remained the same, whereas the DOR was 4 months longer (16.6 months). For previously treated patients, the RR and DOR remained almost identical (44% and 9.7 months). The total number of patients evaluated increased in these two years, but only from 97 to 160.
Importantly, the study that supported both of the approvals for capmatinib was an open-label, non-controlled study; due to the lack of a control group, any actual improvement in outcomes from baseline is unclear. Additionally, capmatinib is not without significant adverse effects: 67% of patients experienced grade 3-4 adverse events, 23% experienced at least one adverse event leading to a dose reduction, and 11% experienced an adverse event leading to drug discontinuation. Considering the small increase in total patients evaluated, the lack of change in RR, the nominal increase in DOR, the lack of a relevant comparator group that could confirm or deny the existence of a clinically relevant benefit, and the high rate of serious adverse events, one may reasonably wonder what led the FDA to convert this approval at this point in time.

The conclusions drawn from this paper have sparked criticism, but critics seem to miss some important counterpoints related to surrogate endpoints

One prominent commentary published in BioCentury, authored by Washington Editor STEVE USDIN, accused the authors of “intellectual malpractice” for conducting what he referred to as a “drive-by analysis of accelerated approval.” At the crux of Usdin’s argument is the idea that the absence of evidence is not the same as evidence of absence – the absence of OS data is not the same as evidence that a drug does not improve OS. In support of this argument, he cites two examples of approvals that were converted from accelerated to traditional within about two years: Libtayo (cemiplimab) and Jemperli (dostarlimab). Notably, these approval timelines, as well as the evidence used to support their conversions, closely mirrored those of capmatinib – the evidence used to support traditional approval was strikingly similar to that used for the accelerated approval. Usdin argues that these drugs appeared to offer some continued benefit to some patients over the additional two years of data collection and thus are of value.
An important distinction missing from both the criticism and the JAMA article: “reasonably likely” versus “validated” surrogate endpoints. Usdin focuses a segment of his argument on the approval of drugs for the treatment of chronic myeloid leukemia (CML), a form of cancer for which treatment was revolutionized in 2001 with the approval of Gleevec (imatinib), a first-of-its-kind tyrosine-kinase inhibitor that effectively blocks the abnormal protein produced by the mutation that occurs in CML (the Philadelphia chromosome). Since then, multiple other drugs have been approved for this indication (e.g., nilotinib, dasatinib and ponatinib), many of which have yielded a higher efficacy rate than imatinib and/or the ability to address resistant forms of the disease. The JAMA article cites two of these drugs (bosutinib and asciminib) as examples of approval conversions which did not rely on OS or quality of life. As Usdin notes, these drugs were converted to traditional approval on the basis of “major molecular response and complete cytogenetic response.” Here he makes the valid point that these endpoints are clinically relevant in this patient population. However, both the BioCentury and JAMA articles fail to acknowledge an important regulatory precedent – these surrogate endpoints have been validated in their ability to predict OS in patients with CML and are considered appropriate for traditional approval by the FDA.
Another issue not discussed in either the JAMA article or the BioCentury commentary: The need to move the evidentiary goalpost as the outlook for a patient population changes over time. In his commentary, Usdin points to multiple myeloma (MM) as an example of a disease state that has benefited dramatically from many accelerated approvals, using this as an argument to continue to support accelerated approvals in this patient population. In fact, this example may be interpreted in another way, offering a case study of the need to raise the bar of evidentiary expectations in patients seeing a rapidly changing level of need and increased duration of survival. A deeper analysis of this situation is provided below.

Multiple myeloma and surrogate endpoints: A recent and evolving case study

Earlier this year, the FDA convened its Oncologic Drugs Advisory Committee (ODAC) to discuss a potential new surrogate endpoint for MM: minimal residual disease (MRD). MRD is defined by the FDA as the detection of malignancies at low levels by measuring cell characteristics such as genetic mutations, cell surface markers or specific DNA gene rearrangements. The use of MRD as a diagnostic, prognostic, predictive, efficacy-response or monitoring biomarker is highly dependent upon its specific context of use. While the FDA considers MRD to be a “general measure” of tumor burden, the agency’s opinion on MRD for regulatory use depends on the disease it is used to assess.
The ODAC discussion revolved around two meta-analyses conducted by two separate research groups, the University of Miami Sylvester Comprehensive Cancer Center and the International Myeloma Foundation’s i2TEAMM. The University of Miami meta-analysis ultimately included eight studies for a total of 4,907 newly diagnosed patients; the i2TEAMM meta-analysis ultimately included 20 studies for a total of 12,926 newly diagnosed or refractory patients.
Of note, the FDA clarified that these meta-analyses were intended to assess the strength of both individual- and trial-level associations. At the individual level, both analyses identified a strong association between MRD and PFS (a surrogate endpoint) and OS. At the trial level, however, the analyses found only some association between MRD and PFS, and no significant association between MRD and OS. Additionally, when the FDA conducted its own analysis of the data, it identified a weak association between MRD and PFS and no association between MRD and OS at the trial level.
Despite the lack of correlation between MRD and OS at a trial level, FDA’s advisory committee voted in favor of using MRD as a surrogate endpoint for MM. This seems particularly important considering the poor correlation between OS and other accepted surrogate endpoints in MM. A study published in the British Journal of Haematology in 2022 evaluated the correlation between time-to-event surrogate endpoints in MM clinical trials conducted between 2005 and 2019. The analysis found that only 43% of the variance in OS was explained by changes in PFS, and that the overall correlation between PFS and OS was weak (0.65). In relapsed or refractory MM, this association grew slightly stronger, with 58% of the variance in OS explained by changes in PFS, representing a medium correlation (0.76).
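The two figures quoted for each setting are related: when a single predictor is used in a linear model, the share of variance explained is the square of the correlation coefficient. A minimal sketch, assuming that is how the study's figures were derived (the function name and dictionary labels are hypothetical), shows that the quoted correlations and variance shares are roughly consistent with each other.

```python
def variance_explained(r):
    """Share of variance explained by one predictor in a linear fit: r squared."""
    return r ** 2


# Correlation coefficients quoted in the text for PFS versus OS.
quoted_r = {"all MM trials": 0.65, "relapsed or refractory MM": 0.76}

for setting, r in quoted_r.items():
    print(f"{setting}: r = {r:.2f}, variance explained = {variance_explained(r):.0%}")
```

The first figure comes out near 42% rather than the quoted 43%, a gap that is plausibly due to rounding of the published correlation. Either way, more than half of the variation in OS across all MM trials is left unexplained by PFS.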
The MM setting also offers clear examples of drug regimens that have ultimately yielded an OS detriment due to drug-related toxicities. Two studies evaluating the addition of pembrolizumab to an existing regimen were terminated early by the FDA due to an increased risk of death. Although these studies did not show a PFS benefit at the time of study termination, they also did not show a PFS detriment. OS outcomes, on the other hand, were strongly weighted in favor of the placebo arm, with OS hazard ratios of 1.61 in one study and 2.06 in the other. A separate study, the BELLINI trial, demonstrated a PFS benefit clearly in favor of the treatment arm (hazard ratio = 0.58). However, OS ultimately favored the placebo arm, with a hazard ratio of 1.19.
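A hazard ratio compares the rate of events in the treatment arm with the rate in the control arm: values above 1 favor the control arm, and values below 1 favor treatment. The sketch below converts the hazard ratios quoted above into approximate percent changes in hazard. The function name and dictionary labels are illustrative assumptions, and confidence intervals, which matter for interpretation, are not included.

```python
def relative_hazard_change(hazard_ratio):
    """Percent change in hazard for the treatment arm relative to the control arm."""
    return (hazard_ratio - 1.0) * 100.0


# Hazard ratios quoted in the text; labels are shorthand introduced for this example.
quoted_hazard_ratios = {
    "pembrolizumab study 1 (OS)": 1.61,
    "pembrolizumab study 2 (OS)": 2.06,
    "BELLINI (PFS)": 0.58,
    "BELLINI (OS)": 1.19,
}

for label, hr in quoted_hazard_ratios.items():
    change = relative_hazard_change(hr)
    direction = "higher" if change > 0 else "lower"
    print(f"{label}: HR {hr:.2f} -> about {abs(change):.0f}% {direction} hazard on treatment")
```

Read this way, the BELLINI results pair a roughly 42% lower hazard of progression or death with a roughly 19% higher hazard of death, which is the kind of divergence between a surrogate and the true endpoint that the paragraph describes.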
Considering the extended lifespan that many MM patients are experiencing as a result of newly available treatment options, there may not be a clear need for yet another surrogate endpoint. While the five-year relative survival rate for patients with MM was estimated to be 57% between 2012 and 2018, more recent research suggests that this number may be increasing rapidly. The PERSEUS trial, published in the New England Journal of Medicine in 2023, identified a 4-year survival rate of about 90%. And some estimates indicate that median patient survival may now be as high as 8-10 years. While this is an unquestionably positive development, it does introduce a new confounder. MRD as a surrogate endpoint would allow for shorter trials, which ultimately provide less safety data at the time of approval. This could result in exposing patients to drug-related toxicities that could turn out to either shorten survival time or introduce long-term sequelae.

The FDA may also be signaling a weaker stance on surrogate endpoints outside of oncology

Elevidys, which received a particularly contentious accelerated approval based on an unvalidated surrogate endpoint, is currently being reviewed for traditional approval. This gene therapy product (delandistrogene moxeparvovec; Sarepta) received accelerated approval in June 2023 for patients 4-5 years of age with Duchenne muscular dystrophy (DMD). According to the FDA, this approval was made solely on the basis of increased levels of a specific protein called microdystrophin. Although the gene therapy itself is intended to work by increasing levels of microdystrophin, it is not yet known whether this will actually confer clinical benefit. In its announcement of the accelerated approval, the FDA acknowledged the unvalidated nature of this surrogate endpoint, the lack of clinical benefit demonstrated to date, and the expectation that the company would “complete a clinical study to confirm the drug’s clinical benefit.”
In October 2023, Sarepta announced that the confirmatory clinical study yielded no statistically significant improvement in the primary efficacy outcome, and yet it planned to seek traditional approval. The study showed that patients who received Elevidys improved 2.6 points on the North Star Ambulatory Assessment over a 52-week period, as compared to 1.9 points of improvement for patients given a placebo. However, Sarepta said the study showed statistically significant improvements for two pre-specified secondary endpoints, Time-to-Rise (TTR) and the 10-meter walk test, suggesting at least some indication of benefit.
All signs currently suggest that Elevidys may indeed receive its full approval, regardless of these disappointing findings. During earnings calls, Sarepta has signaled strong confidence that it will receive traditional approval and an expanded indication. And while that may seem like corporate optimism, the news publication STAT has reported that PETER MARKS, the director of the FDA’s Center for Biologics Evaluation and Research (CBER), has voiced an opinion that these disappointing findings may not – or should not – stand in the way of full approval. Although many scientists and clinicians are involved in the review process, Marks’ opinion holds extra weight. In fact, he previously overruled CBER review staff and issued a “decisional memo” granting accelerated approval to this very product despite the review team’s initial plan to reject the application.
Shifting to a very different population: This year brought the first accelerated approval of a drug for the treatment of nonalcoholic steatohepatitis (NASH). That drug – Rezdiffra (resmetirom; Madrigal Pharmaceuticals) – received accelerated approval on the basis of improvement in liver inflammation and scarring after 12 months. As part of that approval, the FDA is requiring the sponsor to complete a 54-month confirmatory study to verify clinical benefit. According to the approval letter, that clinical trial will be expected “to demonstrate clinical benefit on the composite endpoint of progression to cirrhosis, hepatic decompensation events, liver transplant, and mortality.”
With a large and growing pipeline of drugs being developed for the treatment of NASH, multiple other drugs may receive accelerated approval before the final results for Rezdiffra read out. These drugs are anticipated to be used in a large portion of the U.S. population (currently, up to 6.5% of all adults are thought to have NASH, and that number is expected to increase) and, presently, are anticipated to be used long-term. Their costs, benefits, and risks stand to have a major impact on the healthcare system. But it remains to be seen whether the FDA will hold these drugs to its highest standard for traditional approval or allow them to fall short in the end.

A few final thoughts:

The FDA’s house-of-cards problem: Some unvalidated surrogate endpoints – and in particular those that are newly used for a specific rare disease (or group of rare diseases) – are commonly used across different companies’ development programs. While this makes sense, as FDA’s interpretation of a surrogate endpoint as contextually useful shouldn’t necessarily be limited to the first product under review, it does raise significant risks for regulators, sponsors and patients. Consider three theoretical drugs: Drug A, Drug B, and Drug C. All three make use of the same unvalidated surrogate endpoint. Drugs A and B are approved first, and begin a confirmatory study; Drug C is approved last but manages to complete its confirmatory study first. If Drug C’s confirmatory data indicate that the drug did not demonstrate safety or efficacy as compared to the comparator product, what does that mean for Drugs A and B? The problem, then, is that the failure of one product making use of a surrogate endpoint could cast doubt on multiple products using that same endpoint, leaving regulators, sponsors and patients unsure as to how best to proceed. To pose but one potential question: Is the failure of one drug using a surrogate a failure of the drug, or the surrogate?
The withdrawal problem: These questions become more problematic once the FDA does decide to withdraw a drug product. As AgencyIQ has recently explained, the FDA finds it difficult in the best of circumstances to withdraw a drug. But in cases where the evidence for approval was potentially circumstantial and based on an unvalidated surrogate endpoint, withdrawing approval may be especially difficult to justify when sponsors can just as easily point to other surrogate endpoints (or subgroups) as a rationale for keeping that same product on the market. In other words: Without strong belief in an endpoint, that belief can just as easily be shifted to another endpoint.
If the FDA is going to increasingly rely on surrogate endpoints, it needs to place equal emphasis on ensuring patients and sponsors have rapid access to confirmatory data – both about the drug and its endpoint. Given the high price many of these drug products command, it seems irresponsible to both patients and taxpayers not to have rapid evidence that these products and the endpoints on which they rely are indeed effective. Such actions would also benefit drug development, allowing other sponsors in a space to have more confidence that the selected endpoints are ultimately useful and may be relied upon.

Featuring previous research by Amanda Conti.
To contact the author of this item, please email Chelsey McIntyre (cmcintyre@agencyiq.com).
To contact the editors of this item, please email Kari Oakes (koakes@agencyiq.com) and Alexander Gaffney (agaffney@agencyiq.com).