Surrogate endpoints: Why they sometimes succeed and sometimes fail, and what that means for patients

Surrogate endpoints are intended to accelerate drug development, based on reasonable evidence that the endpoint predicts the ultimate clinical outcome. In many clinical scenarios, the use of a surrogate endpoint is warranted and ethically necessary. However, surrogate endpoints do not always provide the desired level of predictive accuracy, for a variety of reasons.

Regulatory background: Surrogate endpoints

In an ideal world, clinical development studies would evaluate a drug’s effect on a “true” indicator of clinical response. However, that “true” endpoint may be unevaluable within the context of a clinical trial for a myriad of reasons, including that it will occur too far off in time, occurs too rarely, or presents an ethical dilemma. Although there have been examples of each of these situations over the years, perhaps the most common and widely recognized case exists in oncology, where an improvement in overall survival (OS) – a long-term outcome – is often considered the “true” indicator of clinical response. However, for many forms of cancer, evaluation of OS would require significantly extended clinical trial durations that ultimately delay the approval of the drug and availability to patients in need.
To address this barrier, surrogate endpoints have been increasingly adopted across various therapeutic areas. According to the FDA, a surrogate or intermediate clinical endpoint is “a marker – a laboratory measurement, radiographic image, physical sign or other measure that is thought to predict clinical benefit, but is not itself a measure of clinical benefit.” In effect, it acts as a stand-in for a clinical outcome that actually does demonstrate an improvement in how patients feel, survive or function. These endpoints are selected because they can be measured earlier, occur more frequently, and/or are ethically appropriate. In the realm of oncology, commonly utilized surrogate endpoints intended to stand in for OS include progression-free survival (PFS), relapse-free survival (RFS) and metastasis-free survival. Other metrics have also been developed as surrogate endpoints. These include objective response rate (ORR), which measures the proportion of patients who experience either a partial or complete response within a certain period of time.
The FDA has authority to grant accelerated approval on the basis of surrogate or intermediate endpoints, for “drugs for serious conditions that filled an unmet medical need.” To receive accelerated approval, the surrogate or intermediate endpoint must be “considered reasonably likely to predict the clinical benefit of that drug.” Sponsors of products granted accelerated approval by the FDA are – at least in theory – required to conduct confirmatory studies to verify a true clinical benefit and convert the accelerated approval to a traditional approval.

How are surrogate endpoints identified and evaluated?

The FDA divides surrogate endpoints into three categories on the basis of the supportive evidence: candidate, reasonably likely, and validated. Candidate surrogate endpoints are those which are still under evaluation. Reasonably likely surrogate endpoints, on the other hand, are those which are “supported by strong mechanistic and/or epidemiologic rationale, but the amount of clinical data available is not sufficient to show that they are a validated surrogate endpoint.” These endpoints are permitted for use for accelerated approval; however, their usefulness continues to be evaluated in the post-market setting, where additional data is collected with the goal of ultimately determining whether that endpoint is predictive of true clinical benefit.
Validated surrogate endpoints are supported by both a clear mechanistic rationale and clinical data “providing strong evidence that an effect on the surrogate endpoint predicts a specific clinical benefit.” These endpoints can be used for traditional approval, indicating the FDA’s stance that a validated surrogate endpoint acts as a true stand-in for the preferred measure of clinical benefit. The agency maintains a list of surrogate endpoints and whether they may be used for either traditional or accelerated approval.
Unsurprisingly, the leap from “reasonably likely” to “validated” is no small feat. Obtaining supportive clinical data requires that data from multiple studies be pooled through meta-analyses, often requiring a significant amount of time; to show a consistent correlation between a surrogate endpoint and a “true” clinical endpoint, patients must actually reach that true endpoint. In the case of an endpoint like OS, this could take many years to achieve.
Many “reasonably likely” surrogate endpoints are selected due to their prognostic value, or their established ability to predict the outcome of an individual patient. The National Cancer Institute defines a prognostic factor as “a situation or condition, or characteristic of a patient, that can be used to estimate the chance of recovery from a disease or the chance of the disease recurring (coming back).” For example, the expression of certain genes or proteins by tumors can factor into the overall prognosis for that patient. In people who have already undergone treatment, biomarkers that assess the level of treatment response may also act as prognostic factors. For example, a complete or partial response to treatment would be considered a prognostic factor for an individual patient.
To convert to a validated endpoint, that prediction must also hold at the trial level. The FDA defines a trial-level association as “the strength of the association between the effects of treatment on the surrogate and the true endpoint.” Put another way, this association assesses whether the treatment’s effect on the surrogate endpoint predicts a similar effect on the “true” endpoint. This represents an important distinction from individual-level correlation, because it determines whether a drug that is likely to positively impact the surrogate endpoint is actually likely to also positively impact the “true” endpoint. Although it may seem intuitive that both correlations would typically exist together, this is not always the case (more on this below).
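
To make the distinction concrete, a trial-level association can be sketched in a few lines of Python. This is a minimal illustration and not the FDA's method: the hazard ratios are invented for demonstration, the function name `trial_level_r_squared` is hypothetical, and the regression is unweighted, whereas published surrogacy meta-analyses typically weight trials by size and account for estimation error.

```python
import math

# HYPOTHETICAL data, invented for illustration only (not from any real trial).
# Each tuple is one trial: (hazard ratio on the surrogate, e.g. PFS,
#                           hazard ratio on the true endpoint, e.g. OS).
HYPOTHETICAL_TRIALS = [
    (0.55, 0.80),
    (0.60, 0.95),
    (0.70, 0.85),
    (0.75, 1.02),
    (0.80, 0.90),
    (0.90, 0.98),
]


def trial_level_r_squared(trials):
    """Return R^2 for an unweighted least-squares fit of
    log(HR on true endpoint) against log(HR on surrogate) across trials.

    A value near 1 means the treatment effect on the surrogate closely
    tracks the treatment effect on the true endpoint from trial to trial;
    a value near 0 means it does not.
    """
    if len(trials) < 3:
        raise ValueError("need at least three trials")
    xs = [math.log(surrogate_hr) for surrogate_hr, _ in trials]
    ys = [math.log(true_hr) for _, true_hr in trials]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("treatment effects do not vary across trials")
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    r2 = trial_level_r_squared(HYPOTHETICAL_TRIALS)
    print(f"Trial-level R^2 (hypothetical data): {r2:.2f}")
```

Each data point here is an entire trial, not a patient. That is the key design choice: an individual-level correlation asks whether patients who respond tend to live longer, whereas this calculation asks whether trials with larger effects on the surrogate also show larger effects on the true endpoint.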

So why do so many drugs that meet surrogate endpoint targets fail to yield true clinical benefit?

First of all, prognostic factors are rarely 100% predictive. Rather, each factor is typically considered alongside a variety of other prognostic factors for the individual patient. For example, a patient with newly diagnosed breast cancer will receive a prognosis that is based on lymph node status, tumor size, tumor grade, hormone receptor status, and a variety of other factors. And positive treatment results do not always correlate with positive outcomes in the long term. Approximately 25-30% of breast cancer patients who initially experience a complete response to treatment will ultimately develop recurrence and disease-related death. So although an initial complete response is a positive outcome that is correlated with positive long-term outcomes in most of these patients, a significant portion of patients will not experience this positive long-term outcome.
Achieving a surrogate outcome may not have the same meaning for each drug. Much of this variation depends on the drug’s mechanism of action, the likelihood of resistance, and the duration of the drug’s effect. For example, a drug that causes a partial or complete remission in a certain type of cancer may not, in fact, eliminate all tumor cells, leaving patients at an increased risk of relapse. Some tumors have also proven to be particularly adaptive, capable of developing resistance against certain mechanisms of action and overcoming the effects of the drug with time. These are some of the reasons why identifying a correlation at the trial level is a crucial element of transitioning to a validated surrogate: although a response to treatment tends to be predictive of a better outcome for an individual patient, when viewed from the perspective of all patients who received a drug with a specific mechanism of action, this correlation may no longer hold true.
That same surrogate endpoint may also not mean the same thing for different disease states. For example, in the realm of oncology, PFS has proven to be a relatively poor predictor of OS in patients with solid tumors. A meta-analysis published in the European Journal of Cancer in 2020 found that there was only a 38% conversion rate from PFS to OS across the spectrum of solid tumors. In patients with certain forms of leukemia, on the other hand, PFS is highly correlated with OS. These represent two very different forms of cancer, affecting vastly different cell types and causing vastly different presentations. Depending on the nature and etiology of a given disease, a surrogate outcome that is promising in one disease may simply not carry the same weight in another.
Then there is the classic consideration of treatment toxicity, which informs the other half of a benefit-risk analysis. If a treatment seriously harms (or even kills) a significant portion of patients, its benefits come at too high a cost. In many cases, however, this may not become fully apparent until OS data are available. (Note: OS, while traditionally thought of as an efficacy endpoint, is also a fundamental long-term safety endpoint for this very reason.) Phosphatidylinositol 3-kinase (PI3K) inhibitors are a notable example of a drug class that was ultimately determined to cause a concerningly high rate of serious toxicities and death despite early evidence of benefit via surrogate endpoints. The BELLINI trial, which evaluated the use of venetoclax in patients with multiple myeloma, offers another commonly referenced example of an unanticipated increased risk of death in the setting of improved PFS. This example also harks back to the differing value of surrogate endpoints by disease state: venetoclax is approved for use in chronic lymphocytic leukemia, and although it causes a high rate of adverse effects, its benefit-risk ratio remains intact.
Treatment toxicity can also introduce another cause of treatment failure – non-adherence. Toxicities that patients may consider intolerable span a wide gamut from mild to severe and acute to chronic. Examples include persistent nerve pain, excessive fatigue, severe diarrhea, and persistent nausea and vomiting. While many of these adverse effects may be considered bearable in the short term, when they persist in the long term, patients may ultimately decide that the treatment is no longer worthwhile. Trials of shorter duration (which are typically used to evaluate surrogate endpoints) may not see a high adverse effect-related dropout rate, whereas confirmatory long-term trials of that same drug may. Additionally, non-adherence contributes to a well-recognized phenomenon in which real-world outcomes can fall short of clinical study outcomes.

Widening the lens beyond oncology, the questions associated with surrogate endpoints continue to apply

Even in the world of chronic conditions, surrogate endpoints serve a necessary purpose, while posing many of the same questions. For example, in nonalcoholic steatohepatitis (NASH) – an increasingly common condition characterized by fibrosis and inflammation in the liver – an improvement in fibrosis is considered to be a reasonably likely surrogate endpoint. Considering the association between fibrosis and serious long-term complications, this is logical. Unfortunately, only time (and the collection of extensive clinical data) will tell whether the reversal of fibrosis actually returns a patient to their prior state of liver health and reduces long-term complications, or if the damage has already been done. Considering that it could take a decade or more to answer this question, however, it is unreasonable to require patients who already have NASH and are at risk for these complications to wait for that final answer.

The field of neurology is also rife with surrogate endpoints, often for difficult-to-treat and debilitating diseases. One recent example: Neurofilament light chain (NfL) is now accepted by the FDA as a reasonably likely surrogate endpoint for amyotrophic lateral sclerosis (ALS), a deadly disease with a significant unmet need. This biomarker correlates with disease severity, progression rate, and survival in patients with ALS, and is markedly elevated in people with this condition as compared with other neurologic disorders. However, further research and data are needed to determine whether reduced levels of this biomarker actually correlate with a clinical benefit, or whether its presence is simply an incidental finding related to the disease process.

As the use of surrogate endpoints has grown substantially in recent years, questions have arisen

Over the past few decades, surrogate endpoints have been used to support drug approvals with increasing frequency. Assessments of FDA drug approvals over the past two decades indicate that more than half of all approvals are now based on surrogate endpoints, a number that is much higher for oncology approvals. A review of novel drug approvals occurring between 1995 and 2017 identified a consistent upward trend in the use of surrogate endpoints. Whereas 48% of novel drug approvals occurring between 1995 and 1997 used surrogate endpoints, 60% of those approved between 2015 and 2017 did. A separate review of oncology drug approvals between 2006 and 2017 found that 71% were based on surrogate endpoints; a snapshot review of drug approvals in 2020 found that 94% of all oncology drug approvals that year utilized a surrogate endpoint.
While it is likely that many of these surrogate endpoints were appropriate within their context of use, the increasing prevalence of these endpoints raises a number of questions. First and foremost, it is unclear whether the FDA’s decisions surrounding surrogate endpoints have stayed true to its own definitions and stated stance on these endpoints. For example, are drugs approved via surrogate endpoint granted the appropriate type of approval? And for those that are granted accelerated approval, what level of scrutiny and evidence is actually being applied for transition to traditional approval? Have the FDA’s actions reflected an understanding of the poor correlation that has been observed between some surrogate and “true” endpoints?
Each time a surrogate endpoint is used, an appropriate balance must be struck, weighing the unmet need and outlook for a specific patient population against the value of using a surrogate endpoint. As an example: If a surrogate endpoint is not adequately validated and yields answers only one to two years before the “true” endpoint, is this actually beneficial in a patient population with other available treatment options or for which disease progression would be relatively minimal over that time period?
The FDA’s decisions in this space can have many downstream implications for the treatment landscape and patient care. If an appropriate balance is not struck, drugs that will ultimately yield an inappropriate risk-benefit profile may enter the market, complicating patient care and potentially worsening patient outcomes. A drug that receives accelerated approval with unclear long-term benefits may also be difficult to evaluate in the real-world setting, where many patients would prefer to opt for the promising, approved drug instead of participating in further research. Separately, the widespread use of a surrogate endpoint that does not correlate well with its “true” endpoint may encourage sponsors to develop treatments which do not ultimately improve patient outcomes. And this development may be conducted in lieu of developing other novel treatments which could have the potential to truly revolutionize the outcomes of this same group of patients.