Brief Reports | January 3, 2013

Incremental value in outcome prediction with gene expression–based signatures in diffuse large B-cell lymphoma

Fangxin Hong,1 Brad S. Kahl,2 and Robert Gray1

1Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, MA; and
2School of Medicine and Public Health, University of Wisconsin, Madison, WI

Blood (2013) 121 (1): 156–158.

https://doi.org/10.1182/blood-2012-08-450106

Abstract

Multiple gene expression–based signatures have been identified in diffuse large B-cell lymphoma that are predictive for survival outcomes. Most studies assess predictive significance based on P values from multivariable Cox regression. Few investigations have evaluated the incremental usefulness of these signatures. Recent developments in statistical methodology extend the use of concordance measures on censored survival data. We applied these methods to evaluate the added value in survival risk prediction from 3 published gene-based signatures on 2 sets of patients with diffuse large B-cell lymphoma treated with CHOP or R-CHOP. Our results indicate these gene-based signatures are inferior to clinical factors and provide little added value in risk assessment. To develop highly discriminating risk prediction models, we need to use appropriate approaches and consider more than gene expression. However, the study of gene expression and clinical outcomes retains considerable potential to enhance understanding of disease mechanisms and uncover new therapeutic targets.

Key Points

Gene-based predictors have good discrimination ability; the IPI remains the most powerful predictor of clinical outcome in DLBCL.
One needs to use appropriate approaches and consider more than gene expression to develop highly effective risk prediction models.

Introduction

Risk prediction procedures are valuable tools for cancer management, and risk scoring systems have been established for assessing individual risk in survival outcomes for various cancer types. In diffuse large B-cell lymphoma (DLBCL), 5 independent clinical characteristics (age, Ann Arbor stage, serum lactate dehydrogenase, performance status, and number of extranodal sites) are combined in the International Prognostic Index (IPI) to predict outcomes in current clinical practice. Several gene expression–based molecular signatures have been developed for clinical risk prediction. The Lymphoma/Leukemia Molecular Profiling Project reported a 17-gene predictor¹ and a 3-component signature (∼ 400 genes).² Lossos et al built a predictive model based on 6 genes.³ Alizadeh et al further simplified this to a 2-gene model as a risk predictor.⁴ All models were claimed to be independent of the IPI and to add to its predictive power. However, all studies assessed predictive significance based on P values from multivariable Cox regression, which provide little knowledge of the added value in individual risk prediction. Despite the discussion of DLBCL risk predictors in the statistical literature,^5,6 a key component in assessing a risk predictor is its ability to distinguish subjects who will develop an event (progression or death) from those who will not, by a specified time. This concept, known as discrimination, has been well quantified for binary outcomes by concordance measures such as the area under the receiver operating characteristic curve, also referred to as the “C-statistic.” Various concordance measures have been extended to censored survival data in the statistical literature^7-9 and are now being used in clinical settings to assess the predictive usefulness of biomarkers.^10,11 In this report, we assess the usefulness of 3 published gene-based risk signatures compared with known clinical prognostic factors, with the goal of investigating the added value

Methods

The first gene-expression risk signature is a 6-gene predictor described as (−0.0273 × LMO2) + (−0.2103 × BCL6) + (−0.1878 × FN1) + (0.0346 × CCND2) + (0.1888 × SCYA3) + (0.5527 × BCL2).^3,12 The second gene-expression risk signature is a 3-component signature reported as (−0.419 × germinal center B-cell avg) + (−1.015 × stromal-1 avg) + (0.675 × stromal-2 avg),² where avg is the mean expression of all genes in the given group. There are 39 genes in the germinal center B-cell group, 283 genes in the stromal-1 group, and 72 genes in the stromal-2 group. The third signature was recently published as (−0.0323 × LMO2) + (−0.29 × TNFRSF9).⁴ The 2 datasets used were introduced by Lenz et al,² in which pretreatment tumor-biopsy specimens and clinical data were obtained from 233 and 181 patients with newly diagnosed DLBCL treated with the RCHOP or CHOP regimen, respectively. The event rates are 25.8% and 58.0%, with median follow-ups of 2.81 and 7.62 years, 2.93 and 7.20 years among living patients, respectively. Datasets were downloaded from GEO as normalized expression levels (GSE10846) and log₂ transformed. Two concordance measures were used. The C-statistic is the estimated concordance between prediction and observation (event vs nonevent), that is, the probability that the predicted risk score is higher for subjects with earlier event times. It provides a global measure of a fitted survival model for the continuous event time rather than at a particular follow-up time. The integrated discrimination improvement (IDI) measures overall improvement in sensitivity and specificity; roughly, it is the sum of the increase in risk score for events and the reduction in risk score for nonevents.^13,14 Mathematically, the IDI is the increase in R² for the Cox model, with a 0 to 1 range. Multivariable Cox regression was used with 2 sets of prespecified models: 1 model with only clinical prognostic factors or only the gene-based predictor, and 1 model with both clinical factors and the molecular predictor.
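As an illustration of how the 3 signatures reduce to weighted sums, the following minimal Python sketch evaluates each formula for a single patient. The coefficients are those printed above; the expression values, the key names (`gcb_avg`, `stromal1_avg`, `stromal2_avg`), and the helper functions are hypothetical and chosen only for the example. It assumes, as is usual for a Cox linear predictor, that a higher score corresponds to a higher predicted risk.

```python
# Coefficients as printed in the Methods text.
SIX_GENE = {"LMO2": -0.0273, "BCL6": -0.2103, "FN1": -0.1878,
            "CCND2": 0.0346, "SCYA3": 0.1888, "BCL2": 0.5527}
THREE_COMPONENT = {"gcb_avg": -0.419, "stromal1_avg": -1.015, "stromal2_avg": 0.675}
TWO_GENE = {"LMO2": -0.0323, "TNFRSF9": -0.29}


def linear_score(coefficients, values):
    """Weighted sum of expression values for one patient."""
    return sum(weight * values[name] for name, weight in coefficients.items())


def group_average(expression, genes):
    """Mean expression over the genes belonging to one signature component."""
    return sum(expression[g] for g in genes) / len(genes)


# Hypothetical log2 expression values for one patient (illustration only).
patient = {"LMO2": 10.0, "BCL6": 9.0, "FN1": 11.0, "CCND2": 8.0,
           "SCYA3": 7.0, "BCL2": 9.5, "TNFRSF9": 6.0}
# Hypothetical component averages; in practice each would come from
# group_average() over the 39, 283, or 72 genes of that component.
components = {"gcb_avg": 9.0, "stromal1_avg": 10.0, "stromal2_avg": 8.0}

six_gene_score = linear_score(SIX_GENE, patient)
three_component_score = linear_score(THREE_COMPONENT, components)
two_gene_score = linear_score(TWO_GENE, patient)
```

Each score is a single continuous covariate, which is how the predictors enter the Cox models described above.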
The discriminatory capability of the 2 models was evaluated by the C-statistic; the improvement was assessed by the difference in C-statistic and by the IDI. The survival duration for evaluation was 3 and 5 years for the RCHOP and CHOP datasets, respectively. An unbiased estimator for the C-statistic (R package survC1),⁸ which is robust with respect to the choice of evaluation time, was used. All variables are considered continuous. Because the CHOP and RCHOP datasets were used to build the 3-component signature² and the 2-gene model,⁴ respectively, neither dataset was used to evaluate the predictor that was built on it.
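To make the concordance idea concrete, here is a minimal Python sketch of a pairwise C-statistic for right-censored data, restricted to events before an evaluation time `tau`. It is not the survC1 estimator used in this report: that estimator additionally weights pairs by the inverse probability of censoring, which this simplified version omits. The function name and the toy data are hypothetical.

```python
def truncated_c_statistic(times, events, scores, tau):
    """Pairwise concordance for right-censored data, restricted to events before tau.

    A pair (i, j) is usable when subject i had an observed event at
    times[i] < tau and subject j was still under observation after times[i].
    The pair is concordant when the subject with the earlier event has the
    higher risk score; tied scores count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i] or times[i] >= tau:
            continue  # censored subjects and late events cannot anchor a pair
        for j in range(n):
            if times[j] > times[i]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs before tau")
    return concordant / usable


# Toy data (illustration only): follow-up in years, 1 = event, 0 = censored.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.1]

c_at_5_years = truncated_c_statistic(times, events, scores, tau=5.0)
c_at_3_5_years = truncated_c_statistic(times, events, scores, tau=3.5)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking. The difference between the values for two fitted models is the quantity reported as "difference in C".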

Results and discussion

All gene-based predictors are significantly associated with survival outcomes with P < .001 when used alone, and they remain significant with P < .001 after adjusting for all clinical prognostic factors in a multivariable Cox model. In the RCHOP validation dataset, the C-statistic was 0.600 and 0.717 for the 6-gene predictor and the 3-component signature, respectively, suggesting good discrimination ability when used alone. However, this performance is inferior to that of the known clinical factors, which have a C-statistic of 0.739. When the predictors were added to clinical factors, the C-statistic increased to 0.752 and 0.771, an improvement of 0.013 (95% confidence interval [CI], −0.021 to 0.047) and 0.031 (95% CI, −0.026 to 0.089) for the 6-gene predictor and the 3-component signature, respectively (Table 1). Further assessment by IDI reveals an added value of 0.001 (95% CI, −0.008 to 0.049) and 0.076 (95% CI, 0.013 to 0.167) for the 2 predictors. Similar trends were observed in validation with the CHOP dataset (Table 1). The C-statistic was 0.678 and 0.619 for the 6-gene predictor and the 2-gene model, respectively, and was 0.721 for the known clinical factors when used alone. The improvement when the predictors were added to clinical factors was 0.002 (95% CI, −0.021 to 0.026) and 0.018 (95% CI, −0.016 to 0.052) in C-statistic, and 0.022 (95% CI, −0.006 to 0.080) and 0.037 (95% CI, −0.002 to 0.098) as assessed by IDI. The improvement was small and was statistically significant only for the 3-component signature in the RCHOP dataset, and there only by IDI. In contrast, clinical factors improve risk prediction significantly, for example, with an improvement of 0.146 (95% CI, 0.064 to 0.227) and 0.120 (95% CI, 0.051 to 0.189) in C-statistic when added to the 6-gene predictor and the 2-gene model in the CHOP dataset.
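For readers unfamiliar with how an IDI value is formed, the following minimal Python sketch computes the binary-outcome version of the measure described in Methods: the mean increase in predicted risk among subjects with events plus the mean decrease among subjects without events. It assumes a fully observed event status at a fixed time; the extension to censored survival data used in this report^13,14 is not reproduced here. The function names and the predicted risks are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def binary_idi(old_risk, new_risk, had_event):
    """IDI for a binary outcome: risk gain in events plus risk drop in nonevents.

    old_risk and new_risk hold predicted event probabilities from the model
    without and with the added predictor, in the same subject order.
    """
    event_idx = [i for i, e in enumerate(had_event) if e]
    nonevent_idx = [i for i, e in enumerate(had_event) if not e]
    gain_in_events = (mean([new_risk[i] for i in event_idx])
                      - mean([old_risk[i] for i in event_idx]))
    drop_in_nonevents = (mean([old_risk[i] for i in nonevent_idx])
                         - mean([new_risk[i] for i in nonevent_idx]))
    return gain_in_events + drop_in_nonevents


# Toy predicted risks (illustration only): 2 subjects with events, 2 without.
had_event = [1, 1, 0, 0]
old_risk = [0.6, 0.5, 0.4, 0.3]  # clinical factors only
new_risk = [0.7, 0.6, 0.3, 0.2]  # clinical factors + hypothetical marker

idi_value = binary_idi(old_risk, new_risk, had_event)
```

In this toy case the added marker raises the mean risk in events by 0.10 and lowers it in nonevents by 0.10, giving an IDI of 0.20, far larger than any value observed in Table 1.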

Table 1

Incremental values of the molecular predictors when added to the clinical factors

Risk factor	RCHOP: C-statistic	RCHOP: difference in C (95% CI)	RCHOP: IDI (95% CI)	CHOP: C-statistic	CHOP: difference in C (95% CI)	CHOP: IDI (95% CI)
Clinical factors	0.739			0.721
Clinical factors + 6-gene predictor	0.752 (0.600)*	0.013 (−0.021, 0.047)	0.001 (−0.008, 0.049)	0.724 (0.678)*	0.002 (−0.021, 0.026)	0.022 (−0.006, 0.080)
Clinical factors + 3-component signature	0.771 (0.717)*	0.031 (−0.026, 0.089)	0.076 (0.013, 0.167)
Clinical factors + 2-gene model				0.739 (0.619)*	0.018 (−0.016, 0.052)	0.037 (−0.002, 0.098)

*C-statistic when the molecular predictor was used alone.
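The intervals in Table 1 can be screened mechanically: an improvement is statistically significant at the 5% level only if its 95% CI excludes 0. The minimal Python sketch below applies that rule to the values copied from the table; the dictionary layout and function name are hypothetical.

```python
# (estimate, lower 95% limit, upper 95% limit), copied from Table 1.
TABLE_1 = {
    ("RCHOP", "6-gene predictor", "C difference"): (0.013, -0.021, 0.047),
    ("RCHOP", "6-gene predictor", "IDI"): (0.001, -0.008, 0.049),
    ("RCHOP", "3-component signature", "C difference"): (0.031, -0.026, 0.089),
    ("RCHOP", "3-component signature", "IDI"): (0.076, 0.013, 0.167),
    ("CHOP", "6-gene predictor", "C difference"): (0.002, -0.021, 0.026),
    ("CHOP", "6-gene predictor", "IDI"): (0.022, -0.006, 0.080),
    ("CHOP", "2-gene model", "C difference"): (0.018, -0.016, 0.052),
    ("CHOP", "2-gene model", "IDI"): (0.037, -0.002, 0.098),
}


def excludes_zero(lower, upper):
    """True when a confidence interval lies entirely on one side of 0."""
    return lower > 0 or upper < 0


significant = [key for key, (_, lower, upper) in TABLE_1.items()
               if excludes_zero(lower, upper)]
```

Of the 8 intervals, only the IDI for the 3-component signature in the RCHOP dataset excludes 0.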

Survival risk scores derived from the multivariable Cox model were used to rank cases, which were then divided into quartile groups. Figure 1 shows the Kaplan-Meier curves of survival probabilities for the 4 groups of patients, using risk scores derived from the model with clinical factors alone and from the models with clinical factors + molecular predictors. Further investigation of subject-specific incremental value suggests that gene-based biomarkers improve risk prediction only for patients with intermediate risk, and not for patients with high or low risk.
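The ranking step can be sketched in a few lines of Python. This is a minimal illustration, assuming groups are formed by rank so that they are as equal in size as possible; how ties and uneven group sizes were handled in the analysis is not stated. The function name and the scores are hypothetical.

```python
def quartile_groups(scores):
    """Assign each case to group 1 (lowest risk scores) through 4 (highest) by rank."""
    n = len(scores)
    # Indices of cases ordered from lowest to highest risk score.
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, index in enumerate(order):
        groups[index] = rank * 4 // n + 1
    return groups


# Hypothetical Cox risk scores for 8 cases (illustration only).
risk_scores = [0.3, 2.1, -0.5, 1.4, 0.9, -1.2, 1.8, 0.1]
groups = quartile_groups(risk_scores)
```

Kaplan-Meier curves are then estimated separately within each of the 4 groups.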

Figure 1. Overall survival in RCHOP and CHOP datasets. Overall survival in the RCHOP dataset (Ai-iii) and in the CHOP dataset (Bi-iii) for 4 quartile groups using clinical factors only (Ai,Bi) and clinical factors + molecular predictors (Aii-iii,Bii-iii). Survival risk scores derived from the given multivariable Cox model were used to rank cases that were then divided into quartile groups.

Although gene-based predictors have good discrimination ability when used alone, the IPI remains the most powerful predictor of clinical outcome in patients with DLBCL. The improvement with the addition of gene-based predictors is not statistically significant in most cases evaluated by the C-statistic and IDI measures. Although patients with intermediate risk by IPI might benefit from additional testing, the clinical utility of the 3 predictors is questionable. P values from Cox models, although testing whether there is an association with outcome, do not measure the separation (discrimination) in predictor scores between patients with and without events. Improvement and refinement would be achieved with the use of appropriate methods such as concordance measures. Toward the goal of risk assessment, we do not intend to compare all approaches, but to highlight the need to move forward with more appropriate methods to derive and evaluate predictors, and to consider more than gene expression to develop substantially more effective predictors. However, the study of gene expression and clinical outcomes retains its importance in understanding disease mechanisms and developing new therapeutic strategies.

Presented in abstract form at the 53rd Annual Meeting of the American Society of Hematology, San Diego, CA, December 10, 2011.

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Acknowledgments

The authors thank Dr Donna Neuberg for constructive suggestions and review of the manuscript and 2 reviewers for insightful criticism.

This study was funded by the Department of Biostatistics and Computational Biology R.S. fund.

Authorship

Contribution: F.H. designed the study, analyzed data, and wrote the manuscript; B.S.K. and R.G. provided critical suggestions and evaluated and edited the manuscript; and all coauthors subsequently collaborated on completing the article.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Fangxin Hong, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, MA 02115; e-mail: fxhong@jimmy.harvard.edu

References

1. Rosenwald A, Wright G, Chan WC, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002;346(25):1937-1947.

2. Lenz G, Wright G, Dave SS, et al. Stromal gene signatures in large-B-cell lymphomas. N Engl J Med. 2008;359(22):2313-2323.

3. Lossos IS, Czerwinski DK, Alizadeh AA, et al. Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N Engl J Med. 2004;350(18):1828-1837.

4. Alizadeh AA, Gentles AJ, Alencar AJ, et al. Prediction of survival in diffuse large B-cell lymphoma based on the expression of 2 genes reflecting tumor and microenvironment. Blood. 2011;118(5):1350-1358.

5. Schumacher M, Binder H, Gerds T. Assessment of survival prediction models based on microarray data. Bioinformatics. 2007;23(14):1768-1774.

6. Segal MR. Microarray gene expression data with linked survival phenotypes: diffuse large-B-cell lymphoma revisited. Biostatistics. 2006;7(2):268-285.

7. Chambless LE, Cummiskey CP, Cui G. Several methods to assess improvement in risk prediction models: extension to survival analysis. Stat Med. 2011;30(1):22-38.

8. Uno H, Cai T, Pencina MJ, et al. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105-1117.

9. Uno H, Cai T, Tian L, et al. Graphical procedures for evaluating overall and subject-specific incremental values from new predictors with censored event time data. Biometrics. 2011;67(4):1389-1396.

10. Wang TJ, Gona P, Larson MG, et al. Multiple biomarkers for the prediction of first major cardiovascular events and death. N Engl J Med. 2006;355(25):2631-2639.

11. Meigs JB, Shrader P, Sullivan LM, et al. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N Engl J Med. 2008;359(21):2208-2219.

12. Alizadeh AA, Gentles AJ, Lossos IS, et al. Molecular outcome prediction in diffuse large-B-cell lymphoma. N Engl J Med. 2009;360(26):2794-2795.

13. Pencina MJ, D'Agostino RB, D'Agostino RB, et al. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157-172; discussion 207-212.

14. Uno H, Tian L, Cai T, et al. A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data [published online ahead of print October 5, 2012]. Stat Med. doi:10.1002/sim.5647

© 2013 by The American Society of Hematology
