Abstract
Progress in the management of follicular lymphoma (FL) has translated to improved outcomes, with most patients surviving a decade or more from the time of diagnosis. However, the disease remains quite heterogeneous and a substantial number of patients have more aggressive disease with short responses to therapy and/or transformation to higher-grade lymphomas. Given the lack of a single standard approach, it is important to understand sources of heterogeneity among patients that influence initial management, surveillance strategies, and overall prognosis. Most of the validated tools, such as the Follicular Lymphoma International Prognostic Index (FLIPI) and FLIPI-2, apply to the frontline setting, and there is an unmet need for prognostic tools in relapsed and refractory disease states. In particular, the number of prior treatment regimens may be less important than the duration of response to the most recent regimen and the type of prior therapy received. Furthermore, despite awareness of progressive genetic and epigenetic derangements and a growing appreciation of the microenvironment's role in FL outcomes, there is no validated means of incorporating biologic data into clinical prognostic indices. This review highlights the current state of knowledge regarding risk stratification in FL.
Introduction
As the second most common histologic subtype of non-Hodgkin lymphoma in the Western hemisphere, follicular lymphoma (FL) is the prototype of indolent lymphomas. Approximately 30 000 patients were newly diagnosed with FL in 2010, with equal distribution between men and women (www.seer.cancer.gov/seertools). The median age at diagnosis is well into the sixth decade of life, but up to 25% of patients are younger (40 years or less).1 Approximately 70% of patients have BM involvement, reflecting stage IV disease at presentation, but there is significant variability in “tumor burden.” Major strides in the last decade have improved the general prognosis, with overall survival (OS) now approaching 12 to 15 years for many patients.2 However, improved survival also translates into an increasing number of patients living with this disease, so many more patients are exposed to repeated courses of cytotoxic and potentially immunosuppressive interventions. Despite the major improvements in OS, FL remains an incurable disease with traditional approaches and it is important to recognize sources of heterogeneity between patients. Approximately 10% to 15% of patients have aggressive disease and short survival, whereas others enjoy decades of life with minimal impact of the disease.2 Distinguishing between these disparate outcomes, and the many gradations in between, is difficult at an individual level. However, a review of the key established sources of heterogeneity may aid in both clinical decision making and setting goals for clinical trials.
Biologic heterogeneity
Histology
The classic histologic description of FL includes the presence of closely associated follicles with obliteration of the normal nodal architecture and loss of interfollicular space.3 The immunophenotype usually shows CD10+CD20+CD19+CD22+ cells expressing surface immunoglobulin; the cells are classically BCL2+ and BCL6+, and lack CD5 expression. The neoplastic follicles exhibit loss of polarization and consist of a mixture of centrocytes and centroblasts in various proportions.
The number of centroblasts per high-power field (hpf) is used to assess the grade. Grades 1 (0-5 centroblasts/hpf) and 2 (6-15 centroblasts/hpf) were combined in the most recent iteration of the World Health Organization classification due to poor interobserver reproducibility and the general consensus that more detailed distinction between FL grade 1 (FL1) and FL2 was clinically insignificant. FL3 is divided into FL3A, which retains centrocytes, and FL3B, which consists of follicles composed of centroblasts. There is cumulative evidence that FL3B is a distinct entity, with frequent loss of t(14;18) and CD10 expression, increased p53 and MUM1/IRF4 expression, and a prominent diffuse pattern (for review, see Salaverria et al4). The current definition of FL3B excludes any diffuse component, which is now categorized as diffuse large B-cell lymphoma (DLBCL). Given this designation, pure FL3B appears to be quite rare.5 A large retrospective analysis of more than 500 cases with 10 years of median follow-up confirmed that the clinical outcome of FL3A is identical to that of FL1 and FL2 irrespective of anthracycline-based therapy, whereas FL3B showed no deaths or relapses beyond 5 years, like de novo DLBCL.6 Both the pattern of biologic abnormalities and the clinical behavior suggest that FL3B is a distinct entity and is more closely aligned with DLBCL than with FL1-3A.
An occasional finding in biopsy specimens is diffuse areas where the follicular structures are less prominent and are replaced by neoplastic cells that are not centroblastic. The clinical significance of “diffuse FL” depends partially on the size of the biopsy specimen and on the cell types observed. In a small specimen, areas of diffuse involvement should prompt consideration of an adjacent area of DLBCL. However, a predominance of centrocytes forming a diffuse pattern without concurrent DLBCL does not have a clear clinical consequence.
Several variants of FL, each with its own expected clinical course, will not be addressed in this review: pediatric FL, primary intestinal FL, very early stage FL, and extranodal FL (usually of the skin).3 These variants typically lack the t(14;18) translocation and are usually BCL2 negative. If an extranodal FL variant has BCL2 expression, a search for systemic disease is warranted.
Genetic and epigenetic features associated with outcome
The hallmark diagnostic (but not prognostic) translocation in FL is t(14;18)(q32;q21), with rearrangement of the BCL2 gene in proximity to the immunoglobulin heavy chain promoter. This translocation is present in 80% to 90% of cases. It has long been recognized that this translocation is a necessary but insufficient event in FL. Considering the massive pressures encountered by normal germinal center B cells as they undergo clonal expansion, somatic hypermutation, and class switch recombination during the normal process of antigen affinity maturation, it is perhaps not surprising that genetic and epigenetic events during this transit could lead to neoplastic transformation.
Genetic events with negative connotation include BCL6 rearrangements and MYC abnormalities. This latter aberration, MYC positivity in the setting of BCL2 positivity, has been dubbed “double hit lymphomas” and is associated with transformation to a clinically aggressive and chemorefractory disease state. First reported in the setting of transformation more than a decade ago,7 the incidence of acquiring MYC abnormalities in FL is not well delineated. Other genetic lesions associated with a poor prognosis in FL include 1p36 deletions (usually in association with mutations of TNFRSF14),8,9 TP53 mutations, MLL2 and EZH2 mutations (both involved in epigenetic modifications), and CDKN2A deletions.10 Although TP53 mutations at diagnosis are uncommon (< 5%), loss of this classic tumor suppressor gene occurs in up to 30% of transformed FL. In addition, there are several DNA gains and losses, as identified by array-based comparative genomic hybridization with prognostic significance, including negative impact of 1p36 and 6q21 deletions and gains on chromosome 17.11 MLL2 was found to be mutated in 79% of FL samples in one analysis, and given its role in histone modification, may affect epigenetic programming in FL.12
Epigenetic deregulation and mutations of genes involved in histone modification are emerging as an important component of lymphomagenesis in FL. Using array-based methylation profiling in 164 diagnostic FL samples, investigators have shown that methylation patterns are strikingly different between normal B cells and FL, with an overrepresentation of normally epigenetically repressed genes in the latter.13 The investigators noted that methylation patterns did not differ between initial and paired transformed FL cases, suggesting that epigenetic events occur early in FL.13 Others have noticed that the degree of DNA methylation variability progressively increased in DLBCL samples.14 Using phylogenetic clustering, the extent of abnormal methylation was associated with survival and DNA methylation patterning was associated with worse outcomes in FL. DNA methylation abnormalities in neoplastic B cells clustered around genes of interest, including BCL6 and EZH2. EZH2 (Y641 codon) is mutated in up to 20% of FL, and recent small molecule inhibitors have garnered clinical and potentially therapeutic interest.15 EZH2, along with several partner genes, mediates methylation and suggests a tumor suppressor role. The clinical significance of EZH2 mutations and other aberrations related to epigenetic programming in FL is emerging, but the current body of evidence suggests that methylation patterns, changes in methylation over time, and genes involved in histone and protein modification are crucial components of FL progression and prognosis.
Impact of the microenvironment
Whereas individual biologic lesions have prognostic implications, genome-wide gene expression profiling has failed to identify prognostic subsets in FL. In contrast, and perhaps underscoring its relevance, the gene expression pattern (GEP) of the surrounding microenvironment has emerged as a critical determinant of outcome.16 Nearly a decade ago, investigators at the National Cancer Institute reported 2 genetically distinct “immune signatures” based on the GEP of the nonmalignant tumor-infiltrating cells that dictated prognosis.16 Although patient samples in this study were from a pre-rituximab population, patients with an “immune-response-2” signature had a 9.35-fold increased risk of death, thus introducing a powerful prognostic tool. Others have found similar results,17,18 but there is unfortunately no reproducible or prospectively validated assay with which to assess the microenvironment at an individual level, and there are no large-scale GEP FL studies in the current era of immunotherapy-based treatments.
With the observation that the microenvironment is primarily composed of T-cell subsets (eg, FOXP3+, PD-1+) and lymphoma-associated macrophages (eg, CD68+ LAMs), investigators have evaluated immunohistochemical surrogates of GEP that assess the presence and extent of these cells; unfortunately, many of these reports show disparate findings between investigative groups, with some showing that LAMs are of positive prognostic value and others showing either a negative or a neutral influence on clinical outcome (for review, see Gribben et al19). There is good evidence that FL cells induce immune tolerance and that increased macrophages mediate angiogenesis in support of tumor growth, perhaps explaining the types of cells within the microenvironment. Some groups have investigated the prognostic and predictive value of the microenvironment in the context of specific therapies using immunohistochemical assessments; however, these studies have been difficult to interpret, with several showing a relative lack of association between cell types within the microenvironment and outcome, including survival.20-22 A recent attempt to address the prognostic relevance of FOXP3+- and PD-1+–infiltrating T cells among 264 patients enrolled in prospective trials was similarly negative; however, the investigators noted that the specific microenvironment composition shifted with stage,23 supporting the theory that the microenvironment is a dynamic place with changes in the proportion and type of T-regulatory cells, T-helper cells, and macrophages over time and perhaps underlying disease progression. Newer therapies, including lenalidomide, may work in part by modifying the interaction between malignant and nonmalignant cells.24
Overall, it is clear that the microenvironment is an important potential contributor to FL clinical behavior; the reliance of FL on surrounding nonmalignant cells is further emphasized by the inability to develop cell lines without the microenvironment or supporting cytokines. However, the composition of the microenvironment is likely influenced by stage, number and types of therapies, and biologic pressures related to transformation. At present, there is no validated means of integrating information about the microenvironment into clinical risk stratification.
Clinical heterogeneity
Clinical prognostic indices
Although data suggest that there is significant biologic heterogeneity in FL at the histologic, genetic, epigenetic, and proteomic levels, tools have yet to be developed to inform clinical decision making other than the identification of an FL3B subgroup that appears to benefit from the use of anthracycline-based therapy. Instead, clinicians must rely on clinical data in making management decisions. The International Prognostic Index, originally developed for aggressive lymphomas, identifies patients with the highest risk; however, these patients represent a minority of FL patients and lower risk disease is not further stratified. The Follicular Lymphoma International Prognostic Index (FLIPI) was developed to address this issue.25 Reviewing the records of more than 4000 patients diagnosed in the mid-1980s to 1990s, multivariate analysis identified 5 variables that were subsequently validated in another 919 patients (Table 1). The primary end point of the analysis was OS, and the 3 risk groups (low risk = 0-1 factors, intermediate risk = 2 factors, and high risk = 3 or more factors) nicely stratified patients with 10-year survival of 71%, 51%, and 36%, respectively. Given the long observation period required to assess OS, the FLIPI-2 prospectively analyzed patients treated with chemoimmunotherapy and used progression-free survival (PFS) as the primary end point (Table 1).26 The key components of the FLIPI-2 include age ≥ 60 years, elevated β2-microglobulin, hemoglobin ≤ 12 g/dL, BM involvement, and longitudinal diameter of a single lymph node ≥ 6 cm. Although the FLIPI-2 is a robust tool, β2-microglobulin is not routinely obtained, particularly in the United States, and the original FLIPI is more commonly used. Size of the largest lymph node mass is an important consideration, both as a risk factor for transformation and as a potential factor in response to specific therapies. In general, bulky disease portends a decreased response.
Bendamustine, for example, has substantial activity in both frontline27 and relapsed FL,28,29 with response rates approximating 75%; however, patients with bulky disease (defined as ≥ 10 cm) have an inferior response rate (50% vs 80%) and shorter response duration.28 Newer agents challenge the utility of tumor bulk as a predictor of inferior outcome; for example, the combination of lenalidomide and rituximab in treatment-naive indolent lymphomas has similar activity in patients regardless of FLIPI score, GELF criteria, or bulky disease.30 These findings have prompted an international multicenter trial comparing chemoimmunotherapy and lenalidomide/rituximab directly in frontline FL.
The FLIPI has been repeatedly validated in clinical trial settings, including immunotherapy and chemoimmunotherapy regimens.31,32 The baseline FLIPI appears to retain prognostic value irrespective of response, and some studies have found that high-risk FLIPI patients had an inferior outcome even if a response had been achieved.31,33 The first large-scale validation of the FLIPI outside of a clinical trial setting was recently reported by the National Lymphocare Study.34 In this report, approximately 2200 patients from community and academic sites were found to almost evenly fall into low-risk (35%), intermediate-risk (30%), and high-risk (35%) FLIPI groups, with 2-year PFS at 84%, 72%, and 65%, respectively. Although the median OS had not been reached in any group (median follow-up, 57 months), the FLIPI identified differences in outcome irrespective of rituximab-based versus chemoimmunotherapy-based treatment in this registry evaluation, further validating that FLIPI is applicable in routine clinical risk stratification and supporting its incorporation into patient-level discussions.
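The FLIPI risk grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a clinical tool: it takes the count of adverse factors as input and uses only the group cutoffs and 10-year survival figures quoted in the text. The names `flipi_risk_group` and `FLIPI_10_YEAR_OS` are hypothetical and were introduced here for illustration.

```python
# Illustrative sketch only; not a clinical decision tool.
# Cutoffs and 10-year overall survival (OS) figures are those quoted in the
# text for the original FLIPI. All names below are hypothetical.

FLIPI_10_YEAR_OS = {"low": 0.71, "intermediate": 0.51, "high": 0.36}


def flipi_risk_group(adverse_factors: int) -> str:
    """Map a count of adverse FLIPI factors (0-5) to a risk group."""
    if not 0 <= adverse_factors <= 5:
        raise ValueError("FLIPI uses 5 variables, so the count must be 0-5")
    if adverse_factors <= 1:
        return "low"
    if adverse_factors == 2:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Print the risk group and quoted 10-year OS for every possible count.
    for count in range(6):
        group = flipi_risk_group(count)
        print(count, group, FLIPI_10_YEAR_OS[group])
```

The survival figures describe the cohorts in which the index was developed and are not individual predictions.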
Assessment of “tumor burden”
Despite the utility of FLIPI for prognosis, the decision to initiate treatment in FL patients is based on the assessment of tumor burden and associated symptoms. There are several definitions of high-tumor-burden FL, including the Groupe d'Etude des Lymphomes Folliculaires (GELF), British National Lymphoma Investigation (BNLI), and National Comprehensive Cancer Network (NCCN) criteria which define high tumor burden with minor variations (Table 2). Given the propensity to cause symptoms and disease progression, high-tumor-burden FL patients are often offered chemoimmunotherapy (and not immunotherapy alone) on a somewhat urgent basis. Standard options include R-CHOP (rituximab plus cyclophosphamide, doxorubicin, vincristine, prednisone/prednisolone), rituximab plus bendamustine, or R-CVP (rituximab plus cyclophosphamide, vincristine, prednisone).35 Fludarabine-based regimens are associated with increased short- and long-term toxicity and are generally reserved for relapsed settings.36,37
For low-tumor-burden patients, who by definition do not meet the criteria for treatment initiation, the usual lack of symptoms prompts a period of observation that can range from months to years, with a median time to progression of approximately 3 years in trial settings.38 Some have questioned whether the paradigm of “watchful waiting” for low-tumor-burden patients remains relevant in the modern era, particularly because rituximab therapy prolongs PFS and, to a lesser extent, time to chemotherapy initiation in this low-risk population.34,38,39 However, in a retrospective analysis of more than 1000 patients enrolled in a population-based French registry, 120 treatment-naive low-tumor-burden FL patients underwent a watchful waiting strategy.39 The median time to treatment initiation in this group was 55 months, and their outcome was comparable to that of 242 patients with similar features receiving frontline rituximab-based therapy. The investigators concluded that, even in the current era, low-tumor-burden FL patients can be managed expectantly without detrimental impact and that the overall prognosis is quite good.
Several problems regarding clinical prognostic tools remain. First, there is increasing awareness that the extremes of age may not be well represented in existing models. Second, there is a paucity of data regarding the impact of epidemiologic variations on outcome, including race and socioeconomic status.40 In addition, although the impact of sex did not emerge in either of the FLIPI analyses, the prospectively conducted Primary Rituximab and Maintenance (PRIMA) trial found that females on maintenance rituximab had a longer PFS (hazard ratio = 0.76; P = .013) than males,37 raising the need for continued study of epidemiologic factors affecting FL outcome.
Response to therapy as a means of risk stratification
Once a patient starts treatment, there are fewer data regarding risk stratification. Intuitively, patients with a complete remission have better outcomes than those with partial remissions, stable disease, or progressive disease. A recently reported study of 536 patients enrolled onto French and Belgian trials found that a complete response to initial therapy was associated with an improved OS and an improved survival after first relapse41 ; the robust median follow-up time of 14.9 years is noteworthy and the investigators suggest that achievement of complete remission should be a primary goal of current and future trials. There are several caveats to the study, including the era in which patients were treated (1980s-1990s) and the possibility that depth of response is simply a surrogate for better disease biology. Nevertheless, because OS as a clinical trial end point in FL is not practical, long-term follow-up data for these types of trials are helpful. Depth of response can also be assessed by molecular assays for Bcl2/IgH transcripts via PCR in the blood or BM, reflecting minimal residual disease (MRD) monitoring. Numerous trials have found that achieving a PCR-negative state is associated with improved PFS and have therefore used this as a marker of efficacy.42 Although MRD assessment has been widely evaluated in clinical trials, a recent report (albeit in a relapsed setting) suggests that maintenance and/or immunotherapy strategies attenuate the value of MRD monitoring in FL.43 For now, assessment via PCR for MRD is likely relegated to clinical trial settings in FL.
Depth of response can also be evaluated via functional imaging, with several trials showing a prognostic value for end-of-treatment PET scans. A French trial evaluated 121 high-tumor-burden FL patients treated with R-CHOP with baseline, interim, and end-of-treatment PET scans. Both interim PET and end-of-treatment PET were associated with PFS and OS. With a median follow-up of 23 months, 2-year PFS for negative and positive interim PET was 86% versus 61%, respectively. The 2-year PFS for negative and positive end-of-treatment PET was 87% versus 51%, respectively. The 2-year OS for negative and positive end-of-treatment PET was 100% versus 88%, respectively.44 A subanalysis of the international PRIMA trial had similar findings, although end-of-treatment functional imaging data were only available for 122 of the more than 1200 patients enrolled.45 Nevertheless, baseline patient characteristics of this subset were similar to the larger group (thus permitting extrapolation) and, with a median follow-up of 42 months, PFS was 71% (negative PET) versus 33% (positive PET). Although PFS varied by the induction regimen (R-CHOP vs R-CVP), the investigators proposed that a positive end-of-treatment PET portends an aggressive disease because the median PFS, even in R-CHOP–treated patients, was only 22 months. Although these findings are of interest, routine PET at the end of treatment in FL is not currently recommended by published guidelines.46,47
Risk stratification in the relapsed setting
Outside of the frontline setting, there are no validated prognostic tools, and patients fall into several clinically distinct scenarios (Figure 1). Prognosis is instead based on a clinical synthesis of several factors, including number of prior regimens, assessment of kinetic failure, and progressive decline of BM reserve. Patients with a long response duration from their frontline treatment and a lack of symptoms are often observed even with radiographic progression, making the relevance of PFS results in clinical trials challenging in practice. In terms of decision making, patients with a long time from their last therapy might retain an excellent prognosis and could conceivably be retreated with similar agents as in the frontline setting based on assessment of chemosensitivity. On a positive note, the number of prior regimens is increasingly less useful in light of today's armamentarium of biologic and targeted agents. The old paradigm is that increasing number of regimens is a surrogate for outcome, with duration of response progressively decreasing with each line of therapy.48 This iconic study found a 4.5-year median survival after relapse. However, this has changed in the current era of targeted and biologic agents, when response to a second or third treatment may actually exceed prior response durations. Therefore, the number of prior regimens in and of itself is a poor predictor of outcome.
Instead of assessing the number of prior regimens, a more useful designation might be “rituximab-refractory,” which has crept into the eligibility criteria of numerous recent and ongoing trials. This is variably defined as lack of response or progression on rituximab monotherapy or rituximab-containing regimens or a time to progression of less than 6 months from rituximab-based therapy. Regardless of the precise definition, it is clear that as frontline management options improve, patients with relapsed disease or short response duration to initial treatment have a worse outcome and the bar for new agents is higher than in previous times. Examples include newer monoclonal antibodies that are promising in relapsed settings but cannot overcome rituximab-refractory states as single agents.
There is an unmet need for hard data on risk stratification in the relapsed setting, both to aid the clinician and to standardize evaluation of published clinical trial results. Many of the genetic and epigenetic features associated with aggressive disease increase with sequential relapses, but there is no clear way to integrate this information into clinical settings.
Assessing risk of transformation
Transformed FL (TFL) is variably defined but clinically refers to a shift from an indolent to a more aggressive lymphoma, with histologic evidence of either DLBCL or other high-grade morphology. A biopsy is not always feasible. TFL can also be a clinical diagnosis, with rapid disease progression, elevated lactate dehydrogenase, and new onset of B symptoms being some of the prominent manifestations.49,50 In a Canadian study, there was no difference in median survival between patients with a clinical versus biopsy-proven diagnosis of TFL.49
The overall risk of TFL is approximately 3% per year, with a lifetime cumulative risk of 30% for patients living with FL in excess of 10 to 15 years. Predicting risk of transformation in individual patients is difficult, although biologically there is increased acquisition of unfavorable genetic and epigenetic events (BCL6 rearrangements, TP53 loss, del 1p36, DNA copy number alterations),51-53 which, if present, raise suspicion and at the very least warrant closer observation. Among 170 patients with TFL, the Canadian group found only advanced stage to be predictive, although all patients were from a pre-rituximab era. Similarly, a United Kingdom study found that, among 88 TFL patients (of 325 FL patients), advanced stage, high FLIPI scores, and high IPI scores were the most predictive of transformation; these investigators also found that patients who were initially observed had a higher rate of transformation than patients who were treated.50 However, many others have seen no difference in TFL risk based on initial observation versus treatment and this issue remains unresolved in the current treatment era.49
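To show how the annual and cumulative transformation figures quoted above relate, the following Python sketch compounds a fixed yearly risk. It rests on a simplifying assumption that is not made in the text: that the same independent 3% risk applies every year. The function name `cumulative_risk` is hypothetical. Under that assumption, the cumulative risk comes to roughly 26% at 10 years and 37% at 15 years, which brackets the approximately 30% figure.

```python
# Illustrative arithmetic only. Assumes a constant, independent annual risk,
# which is a simplification introduced here and not a claim from the text.


def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one event over `years` at a fixed annual risk."""
    if not 0.0 <= annual_risk <= 1.0:
        raise ValueError("annual_risk must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Chance of no event in any year is (1 - annual_risk) ** years.
    return 1.0 - (1.0 - annual_risk) ** years


if __name__ == "__main__":
    for years in (10, 15):
        print(years, round(cumulative_risk(0.03, years), 3))
```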
Summary
FL remains a clinical challenge, with several parameters influencing the overall disease course. A major challenge with risk stratification in FL is that many of the above-mentioned biologic tools are generally not validated in prospective settings and are not yet commercially available. Furthermore, there is no instrument addressing patient-specific, disease-specific, and treatment-specific factors in a comprehensive manner. As newer therapies emerge, older prognostic factors may become less relevant, and this review does not discuss predictive factors in the context of specific therapies. For now, risk stratification at an individual level relies heavily on clinical and treatment-related features, although the promise of more personalized and biologically based tools is on the horizon.
Disclosures
Conflict-of-interest disclosure: The author has consulted for Micromet, Seattle Genetics, Celgene, Allos, Genentech, and Onyx. Off-label drug use: Discussion of new agents that are not approved (including but not limited to BTK inhibitors and PI3K inhibitors) and approved agents in unapproved indications (including but not limited to lenalidomide).
Correspondence
Sonali M. Smith, MD, Associate Professor, Section of Hematology/Oncology, Department of Medicine, The University of Chicago, 5841 S Maryland Avenue MC2115, Chicago, IL 60637; Phone: 773-702-4400; Fax: 773-702-0963; e-mail: smsmith@medicine.bsd.uchicago.edu.