Key Points
We evaluated a cohort of ARL samples for known and novel tumor viruses, revealing the oncogenic herpesvirus EBV as the sole detected infectious agent.
Heterogeneous viral gene expression suggests that variable host immunosurveillance of EBV latency may influence lymphomagenesis.
Abstract
Immunodeficiency dramatically increases susceptibility to cancer as a result of reduced immune surveillance and enhanced opportunities for virus-mediated oncogenesis. Although AIDS-related lymphomas (ARLs) are frequently associated with known oncogenic viruses, many cases contain no known transforming virus. To discover novel transforming viruses, we profiled a set of ARL samples using whole transcriptome sequencing. We determined that Epstein-Barr virus (EBV) was the only virus detected in the tumor samples of this cohort, suggesting that if unidentified pathogens exist in this disease, they are present in <10% of cases or undetectable by our methods. To evaluate the role of EBV in ARL pathogenesis, we analyzed viral gene expression and found highly heterogeneous patterns of viral transcription across samples. We also found significant heterogeneity of viral antigen expression across a large cohort, with many patient samples presenting with restricted type I viral latency, indicating that EBV latency proteins are under increased immunosurveillance in the post-combined antiretroviral therapies era. Furthermore, EBV infection of lymphoma cells in HIV-positive individuals was associated with a distinct host gene expression program. These findings provide insight into the joint host-virus regulatory network of primary ARL tumor samples and expand our understanding of virus-associated oncogenesis. Our findings may also have therapeutic implications, as treatment may be personalized to target specific viral and virus-associated host processes that are only present in a subset of patients.
Introduction
Evidence that viral infection could lead to the development of cancer came early in the 20th century, and many associations between viruses and hematologic malignancies have been identified in a variety of animal models. It has been estimated that close to 12% of all cancers in humans worldwide are caused by viral infection.1 However, only a few viruses have been shown to genuinely cause cancer in humans, and only 3 have been shown to directly cause lymphomas: Epstein-Barr virus (EBV; formally designated human herpesvirus 4), Kaposi sarcoma-associated herpesvirus (KSHV; human herpesvirus 8), and human T-cell lymphotropic virus 1.
Immunodeficient individuals have greatly increased cancer incidence, the pathogenesis of which is thought to be related to disrupted immune surveillance, chronic antigenic stimulation, genetic alterations, cytokine dysregulation, and viral infection.2-4 The viral contribution to these malignancies has been extensively studied, and 2 tumor viruses have been discovered after a targeted search based on the epidemiologic association of specific cancer types and AIDS: KSHV and Merkel cell polyomavirus.5,6 These previous findings have raised the likelihood of novel tumor viruses being discovered in the context of immunosuppression.
Improved combined antiretroviral therapies (CART) as HIV treatment options have resulted in decreased immunosuppression and concomitantly reduced incidence of the highly immunogenic virus-associated malignancies, such as Kaposi sarcoma.7-10 As a result, the proportion of non-Hodgkin lymphomas has increased as a fraction of the total number of AIDS-related cancers.11 Non-Hodgkin lymphomas constitute >50% of all AIDS-defining cancers in developed countries and are the most common cause of cancer-related death in HIV-infected individuals.12 AIDS-related lymphomas (ARLs) are phenotypically and histologically similar to lymphomas of the immunocompetent; however, ARL cases are more frequently associated with known virus mediators of oncogenesis.13-15 For example, although EBV is present in ∼30% of all ARLs, it is more common among diffuse large B-cell lymphomas (DLBCLs) with immunoblastic morphology (>80%) than those with centroblastic morphology (20-40%), and it is present in the vast majority of DLBCLs with presentation in the central nervous system.16 In addition, ∼5% of ARLs are associated with KSHV infection. However, the remaining ∼60% of ARLs have no known viral contribution to lymphomagenesis. Given that non-Hodgkin lymphoma is increased ∼15-fold in the context of AIDS,17 this implies that >55% of ARLs have an etiology directly related to the immunosuppressed host and cannot be ascribed to any known virus.
The permissive environment of the moderately immunosuppressed host and lack of known oncoviruses in a majority of cases motivated us to reexamine the viral and cellular transcriptomes of ARLs, in search of novel transforming viruses.
Methods
Patient samples
Seventy-two formalin-fixed paraffin-embedded (FFPE) ARL samples were collected from New York Presbyterian Hospital–Weill Cornell, the University of Siena, and the AIDS Malignancy Consortium. Cases were included if >80% tumor cells were present and the diagnosis of B-cell lymphoma was confirmed. All samples were obtained with the approval of the institutional review boards at the participating institutions. Research was conducted in accordance with the Declaration of Helsinki. Sixteen cases with frozen tissue were collected from New York Presbyterian Hospital–Weill Cornell and confirmed to contain >80% tumor cells. The 16 frozen and 9 of the FFPE samples were subjected to RNA extraction and transcriptome sequencing. Independent validation cases shown in Table 2 and Figure 2 were collected from the Hematopathology Laboratory of Weill Cornell Medical College Pathology/The New York Presbyterian Hospital (40 cases), the AIDS Malignancy Consortium (12 cases), The University of Siena (20 cases), and the AIDS Cancer Specimen Resource (140 cases). Tissue diagnosis was made with the use of criteria from the World Health Organization.18 In cases of DLBCL, germinal center B-cell (GCB) vs non-GCB subtype was determined by the Hans algorithm.19,20 Our study of 25 ARL samples is well powered to contain ≥1 sample with a virus if that virus occurs in >10% of total ARL cases (statistical power >0.9).
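The power statement for the 25 sequenced samples can be checked with a short calculation. The sketch below assumes a simple binomial model in which each sample independently carries the virus with probability equal to its prevalence; the text does not state how the power was computed, and the function name is illustrative only.

```python
def detection_power(n_samples: int, prevalence: float) -> float:
    """Probability that at least one of n_samples carries a virus that is
    present in a fraction `prevalence` of cases (independent sampling assumed)."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    # P(at least one positive) = 1 - P(all samples negative)
    return 1.0 - (1.0 - prevalence) ** n_samples


power = detection_power(25, 0.10)
print(f"{power:.3f}")  # 0.928
```

Under this model, a virus present in 10% of cases would appear in at least one of 25 samples with probability of about 0.93, consistent with the stated power of >0.9.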
Histology
EBV Probe ISH Kit (Leica Microsystems, Wetzlar, Germany; Vision BioSystems Novocastra, Newcastle-upon-Tyne, UK) was used for in situ hybridization (ISH) for EBV RNA (EBER). Cases were considered positive when >20% of neoplastic cells were immunoreactive, except for BCL-2, where cases were considered positive when >50% of tumor cells had moderate to strong positivity. Immunohistochemical (IHC) studies were performed using monoclonal antibodies to CD-10, BCL-2, BCL-6, MUM-1 (3H2E8; Santa Cruz Biotech, Santa Cruz, CA), and Ki-67 (MIB-1; DakoCytomation, Carpinteria, CA). Nuclear Ki-67 expression was assessed semiquantitatively as the percentage of positive tumor cells. For EBV latency assessment, IHC was done with antibodies to LMP1 (clones CS1-4, Abcam; and clone OT21C, kind gift from Jaap Middeldorp from the VU University Medical Center), EBNA2 (clone PE2; Dako), and LMP2A (15F9; Abcam).
Sequencing and analysis
Sequenced cDNA libraries were constructed from 25 cases using random hexamer primers, ligated to Illumina adaptors, and subjected to high-throughput Illumina sequencing to generate 17 to 150 million 76- to 100-bp paired-end sequence reads per sample. Reads were aligned to the EBV (human herpesvirus 4; GenBank accession no. NC_007605) and human (UCSC hg19) genomes using Bowtie2 with the default “sensitive” parameters for alignment.21 This approach allows multiple read alignment mismatches, which increases coverage of diverse EBV genotypes. We also aligned using Burrows-Wheeler Aligner22 (both default and increased sensitivity parameters) and obtained similar results (supplemental Table 1, available on the Blood Web site). Reads that mapped to the viral genome with low alignment score or to the human genome were considered of ambiguous origin and discarded, as previously described.23
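The rule for discarding reads of ambiguous origin can be illustrated with a minimal sketch. It assumes each read has already been summarized as a viral alignment score plus a flag for whether it also aligns to the human genome. The record layout and the score threshold are hypothetical; the threshold used in the study is not given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadAlignment:
    read_id: str
    viral_score: Optional[int]  # alignment score against the viral genome; None if unaligned
    maps_to_human: bool         # True if the read also aligns to the human genome


# Hypothetical threshold; in Bowtie2 end-to-end mode scores are <= 0, with 0 a perfect match.
MIN_VIRAL_SCORE = -20


def keep_viral_reads(reads: List[ReadAlignment],
                     min_score: int = MIN_VIRAL_SCORE) -> List[ReadAlignment]:
    """Keep reads that align to the virus with an adequate score and do not
    align to the human genome; everything else is treated as ambiguous."""
    return [
        r for r in reads
        if r.viral_score is not None
        and r.viral_score >= min_score
        and not r.maps_to_human
    ]


example = [
    ReadAlignment("r1", 0, False),     # confident viral read: kept
    ReadAlignment("r2", -45, False),   # low-scoring viral alignment: discarded
    ReadAlignment("r3", -5, True),     # also maps to human: discarded
    ReadAlignment("r4", None, False),  # no viral alignment: discarded
]
kept_ids = [r.read_id for r in keep_viral_reads(example)]
```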
To detect other known and novel viruses, the PathSeq pipeline was used, as previously described.24 Briefly, reads were aligned against all known viral genomes, as downloaded from the National Center for Biotechnology Information, using Bowtie2. To keep monoclonal reads from inflating counts, we collapsed all reads that aligned to a single position in the genome into a single read. This provides more reliable evidence of presence/absence of a given virus or gene transcript but may decrease the dynamic range of estimated gene expression for highly expressed transcripts. In search of novel pathogens, high-quality unalignable reads were assembled into contigs using Trinity.25
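Reducing reads that share an alignment position to a single representative can be sketched as follows. This is an illustration under stated assumptions, not the PathSeq implementation: alignments are represented as hypothetical (read_id, reference, position) tuples, and the first read seen at each position is the one retained.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (read_id, reference_name, 1-based leftmost alignment position)
Alignment = Tuple[str, str, int]


def collapse_by_position(alignments: Iterable[Alignment]) -> List[Alignment]:
    """Keep one read per (reference, position); later reads at the same
    position are dropped so that duplicates cannot inflate counts."""
    seen = set()
    kept: List[Alignment] = []
    for read_id, reference, position in alignments:
        key = (reference, position)
        if key not in seen:
            seen.add(key)
            kept.append((read_id, reference, position))
    return kept


reads = [
    ("r1", "NC_007605", 6629),
    ("r2", "NC_007605", 6629),  # same position as r1: collapsed
    ("r3", "NC_007605", 6956),
    ("r4", "NC_007605", 6629),  # same position as r1: collapsed
]
collapsed = collapse_by_position(reads)
```

As the text notes, this trades dynamic range for reliability: a highly expressed transcript can never contribute more reads than it has distinct start positions.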
Viral RNA gene expression values were estimated using Cufflinks as reads per kilobase per million reads aligned (RPKM) across all known transcriptional units, as previously described.23 Cufflinks estimates the abundance of overlapping transcripts from the read coverage of the unique portions of each transcript. Track plots show read overlap counts normalized to total reads aligning to the viral genome. Normalized heatmaps show RPKM divided by −log(f), where f is the fraction of reads uniquely aligning to the viral genome.
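The two quantities defined in this paragraph can be written out directly. The sketch below uses the standard RPKM definition and the heatmap normalization RPKM / −log(f). The base of the logarithm is not stated in the text, so the natural log is assumed here, and the function names and example numbers are illustrative.

```python
import math


def rpkm(read_count: int, transcript_length_bp: int, total_aligned_reads: int) -> float:
    """Reads per kilobase of transcript per million aligned reads."""
    return read_count * 1e9 / (transcript_length_bp * total_aligned_reads)


def heatmap_value(rpkm_value: float, viral_fraction: float) -> float:
    """RPKM divided by -log(f), where f is the fraction of reads uniquely
    aligning to the viral genome. Natural log is an assumption."""
    if not 0.0 < viral_fraction < 1.0:
        raise ValueError("viral_fraction must lie strictly between 0 and 1")
    return rpkm_value / -math.log(viral_fraction)


# Hypothetical transcript: 500 reads, 2000 bp long, 10 million aligned reads
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)  # 25.0
example_heat = heatmap_value(example_rpkm, 0.001)
```

Because −log(f) grows as f shrinks, the same RPKM yields a smaller heatmap value in a sample where viral reads are a smaller fraction of the library.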
Human RNA gene expression was estimated using Cufflinks.23 Human genes without significant expression (ie, those genes with RPKM <1) were removed from subsequent analyses, leaving ∼8000 genes for analysis. Principal component analysis clustering identified histological subtype (DLBCL and Burkitt lymphoma [BL]) and sample preparation method (fresh frozen or archival FFPE) as covariates (supplemental Figure 5A-B). We thus used Cuffdiff to analyze differential gene expression by subtype and by sample preparation method. Consistency of gene expression signatures was determined through overlap of significant differentially expressed genes (Figure 4C), scatter plot correlation (supplemental Figure 5C), and most differentially expressed rank overlap (supplemental Figure 5D). Overlap statistical enrichment was estimated using the hypergeometric distribution and implemented as a Fisher exact test. Human genome-wide gene expression was summarized by principal component analysis on genes with variance >0.01.
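The overlap enrichment test can be illustrated with the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher exact test on the corresponding 2×2 table. This is a minimal standard-library sketch; the function name and the example gene counts are hypothetical and are not taken from the study.

```python
from math import comb


def overlap_pvalue(n_universe: int, n_set_a: int, n_set_b: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_set_b genes are drawn without replacement from
    a universe of n_universe genes, of which n_set_a belong to signature A."""
    max_overlap = min(n_set_a, n_set_b)
    total = comb(n_universe, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 8000 expressed genes, two signatures of 300 and 250
# differentially expressed genes sharing 40 genes.
p_value = overlap_pvalue(8000, 300, 250, 40)
```

With these made-up numbers, about 9 shared genes would be expected by chance (300 × 250 / 8000), so an overlap of 40 gives a very small p-value.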
Results
AIDS-related lymphoma cohort description, histology, and sample selection
To understand the viral landscape of ARL and determine whether previously unidentified oncogenic pathogens are present in this disease, we established a tissue sample cohort (n = 88 distinct patients) that could be assayed for viral RNA. We selected 25 cases with tissue blocks available for accurate histological subclassification by World Health Organization criteria, which included 19 DLBCLs, 3 BLs, 2 follicular lymphomas, and 1 plasmablastic lymphoma (PBL) (Table 1). Six EBV-positive cases, as determined by EBER ISH, were sequenced and analyzed blindly as a way to assess our ability to detect this virus. Due to known double viral infection in ARL (eg, EBV and KSHV in primary effusion lymphomas), we also analyzed the EBV+ cases for novel pathogens. We excluded KSHV-positive cases from our cohort.
PathSeq identifies EBV as the only virus present in ARL samples
To identify the viral populations in ARL samples, we performed massively parallel whole transcriptome sequencing of total RNA, enabling detection of non–poly-adenylated and other RNA species in addition to coding transcripts. These data were then analyzed using the PathSeq pipeline, which subtracts all reads that could potentially align to the human genome, aligns remaining reads to all known viruses, and assembles unaligned reads into contigs24 (Figure 1A).
To assess specificity and sensitivity of our assay, we performed a double-blind control where we analyzed RNA-seq data of samples that were positive and negative for EBV. We determined a simple read cutoff statistic, which was able to identify EBV in all samples that tested positive by EBER RNA ISH while maintaining perfect specificity (Figure 1B). The samples that tested negative for EBER but had reads that aligned to EBER and other viral genes (supplemental Figure 1A) were found to contain tumor infiltrating EBV-infected lymphocytes (Figure 1C). This confirmed that PathSeq coupled with total RNA sequencing is a sensitive and specific assay for presence of viral RNA.
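Evaluating a read-count cutoff against EBER ISH status comes down to computing sensitivity and specificity. The sketch below uses invented counts and an invented cutoff; the study's actual statistic and threshold are shown in Figure 1B and are not reproduced here.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(read_counts: Sequence[int],
                            ish_positive: Sequence[bool],
                            cutoff: int) -> Tuple[float, float]:
    """Call a sample EBV-positive when its viral read count is >= cutoff,
    then compare the calls with EBER ISH status."""
    tp = fn = tn = fp = 0
    for count, truth in zip(read_counts, ish_positive):
        called_positive = count >= cutoff
        if truth and called_positive:
            tp += 1
        elif truth:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical viral read counts and EBER ISH results for six samples
counts = [12000, 8500, 40, 3, 0, 950]
eber_ish = [True, True, False, False, False, True]
sens, spec = sensitivity_specificity(counts, eber_ish, cutoff=100)
```

In this toy data the low but nonzero counts in ISH-negative samples (40 and 3 reads) stand in for signal from infiltrating EBV-infected lymphocytes, which a well-chosen cutoff leaves below the positive call.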
We next determined whether other known viruses were present in our cohort. We identified a subset of samples that contained reads aligning to genic regions of known human adenovirus C and polyomaviruses (supplemental Table 2). Based on previous experience and reports, we tested the sequence libraries for contamination.26,27 Subsequent polymerase chain reaction (PCR) analysis of cDNA sequencing libraries and DNA re-extracted from the original tumor samples revealed that neither adenovirus nor polyomavirus was present in our tumor samples. Namely, the viral-aligning reads were assembled into >100-nt contigs that could be efficiently amplified by PCR. Gel electrophoresis revealed that the sequencing libraries were positive for viral sequence, but the original and re-extracted cDNAs were negative (supplemental Figure 1B-E). Conversely, PCR detected multiple EBV transcripts in the re-extracted cDNA (data not shown). Thus, the sequences represent low-level contamination in the libraries and the samples were in fact devoid of known adeno- and polyomaviruses.
To detect novel viruses that may be contributing to AIDS-related lymphomas, we assembled all unalignable reads into contigs. Although many contigs were formed, all contigs aligned to known human, viral, or bacterial genomes (supplemental Table 3). To increase statistical power and sensitivity for detection, reads across all tumor samples were aggregated; however, this again resulted in contigs with negligible potential for encoding novel viral agents.
EBV gene expression in primary ARL samples is heterogeneous
To assess viral persistence and potential maintenance of lymphomagenesis, we characterized the viral RNA and protein antigen landscape through transcript expression quantification, ISH, and IHC. EBV-mapping reads aligned to known transcribed regions of the viral genome, with the most prominently expressed transcripts being the RNA Pol III transcribed EBER1 and EBER2 transcripts along with type I latent associated EBNA1 transcripts (Figure 2A). Additional type II and III viral latency gene transcripts were expressed, including immunogenic LMP1 and LMP2 (Figure 2B-C), and an unexpected proportion of samples contained lytic transcripts, most notably from the BHLF1 locus (Figure 2D; supplemental Figure 2A). Consistent with previous results,28 our findings suggested a diversity of latency-associated gene expression programs with a subset of cells exhibiting expression of genes corresponding to the lytic program.
It was unclear if the viral protein antigen landscape would reflect the heterogeneity of RNA gene expression, and how intratumoral heterogeneity would contribute to this phenomenon. To characterize viral antigens and intratumoral heterogeneity, we stained for the immunogenic LMP1, LMP2A, and EBNA2 proteins using IHC on ARL tissue microarrays. Many EBV+ cases were positive for ≥1 of these proteins (Table 2; Figure 2E; supplemental Figure 2B); however, there was significant diversity in the intensity of protein staining and the proportion of tumor cells that stained positive (supplemental Figure 2C). Thus, at both the RNA and protein level, EBV latent gene expression is heterogeneous in EBV-positive ARLs and does not necessarily conform to classical latency.
Given the heterogeneity of the viral transcriptional and protein landscape, we sought to confirm our findings across a larger cohort of ARL cases. We characterized viral latency type by LMP1 and EBNA2 immunohistochemistry in a validation cohort of 212 ARL cases (Table 2), 94 (44%) of which were positive for EBV as determined by EBER ISH. Although we expected a large proportion of DLBCLs to have a viral type II or III expression pattern, as suggested by previous studies,29,30 most cases were restricted to viral latency I programs, especially the majority of tumors with GC B-cell origin. Consistent with previous studies, most cases of BL had a latency I pattern.
It was unclear if the EBV transcriptional regulation patterns were unique to the immunosuppressed host environment of ARL. Thus, we performed a meta-analysis of the publicly available viral transcriptome data in HIV-negative sporadic BLs (sBLs),31 which are thought to express type I latency to limit immunogenicity. We found that EBV+ sBL samples presented with robust expression of EBNA1 and other viral genes compatible with type I latency, such as RPMS1 (Figure 3A; supplemental Figure 3A). Consistent with previous reports,28,32,33 we also found significant RNA expression of LMP2 in all EBV-positive sBL samples, and a significant subset also expressed LMP1, BHLF1, BHRF1, BMLF1, and BMRF2 (Figure 3B; supplemental Figure 3B).
We confirmed the active translation of the viral RNA diversity in a subset of tumors of immunocompetent individuals. IHC staining demonstrated presence of LMP1 after stringent positive and negative controls (Figure 3C). We next compared AIDS-related BL (AR BL) to HIV-negative sBL by IHC staining for LMP2A as a test for latency II or III viral gene expression. We assembled a cohort of 84 EBV+ ARL and 59 EBV+ sBL cases and found that 3 of 59 EBV+ sBL samples and 9 of 84 ARL cases showed robust LMP2A staining (Table 3). Taken together, these findings demonstrate a broad set of transcriptional latency programs in vivo in both the immunocompetent and the immunocompromised host.
EBV-positive ARL has a distinct host gene expression program
Because viral gene products influence host gene expression patterns that can drive oncogenesis and inhibit apoptosis, we characterized the host expression program associated with the presence of EBV. Total RNA-seq reads were mapped to the human genome, and expression values were estimated using Cufflinks as RPKM.34 We confirmed that host genes previously identified as enhanced by EBV were upregulated in EBV-positive cases. These genes included the EBNA3A response gene HSP70B (HSPA7)35 and CD30 (TNFRSF8), which acts as a biomarker for EBV titers in immunosuppressed lymphomas and infectious mononucleosis36 (Figure 4A; supplemental Figure 4A-C). Genome-wide, hundreds of host genes were quantitatively associated with EBV in ≥1 lymphoma histological subtype (DLBCL or BL) (Figure 4B; supplemental Figure 5A-B; supplemental Table 4; “Methods”). These findings were robust to leave-one-out validation, and host genes were associated with EBV in independent sets of samples (supplemental Figure 5C). Intriguingly, differentially regulated genes were significantly similar across the histological subtypes, supporting a consistent role for EBV across multiple ARL malignancies (Figure 4C; supplemental Figure 5D).
We next wanted to understand how the immunodeficient environment influenced the EBV-host regulatory network. We compared the EBV-associated host gene expression changes in ARL vs lymphomas of immunocompetent individuals by analyzing EBV-positive HIV-negative sBL cases.31 We first confirmed the existence of a robust EBV signature in sBL samples (supplemental Figure 6A). Host genes differentially expressed in the presence of EBV in sBL had highly statistically significant overlap with the EBV-associated gene expression program in BL of HIV+ subjects but not DLBCL (supplemental Figure 6B-C). To determine whether the EBV-associated host gene expression program was enriched for lymphomagenesis pathways, we performed a gene set enrichment analysis using the Molecular Signatures Database (MSigDB 3.0).37 EBV-positive samples had a statistically significant differential regulation of genes with altered expression in the context of plasma cell differentiation, maintenance of hematopoietic stem cells, and primary effusion lymphomas (Figure 4D; supplemental Figure 7). There also existed a significant correlation between EBV and host gene expression in lymphoblastoid cell lines (LCLs).23 When taken together, these 4 datasets capture the influence of immunosurveillance and malignant cell type morphology on the EBV-associated host gene expression signature. Accordingly, we found that sBL and AR BL cluster, as do the LCL and AIDS-related DLBCL (AR DLBCL) pathway signatures. Viral response pathways (eg, Hepatitis B virus response genes) were more prevalent in sBL than other samples, whereas EBV-associated enhancement of the plasma cell differentiation program was present only in DLBCL and LCLs.
Discussion
Using the sensitive technology of total RNA-seq and the PathSeq analysis pipeline, we evaluated the transcribed viral contribution to AIDS-related lymphoma. Although no novel viruses were detected, we discovered that EBV exploits the immuno-permissive environment of ARL to express viral antigens that are associated with an altered host gene expression program. Meta-analysis of sBLs in immunocompetent individuals revealed that EBV transcriptional regulation is broadly more promiscuous than previously appreciated; however, the immunocompetent immune system exerts a significant selection on viral antigen expression.
Although extensive literature has speculated on the contribution of viruses to increased cancer risk in the immunosuppressed,2,4,38 our observation that no additional pathogens were found in a cohort of 25 ARLs suggests that if any new pathogens exist in this disease, they are relatively rare or tightly latent, without significant RNA expression (“Methods”). Because we examined steady-state RNA sequences in established tumors, we cannot rule out the possibility that EBV or other viruses are transient enablers of cellular transformation, from which frank malignancy can expand and be subsequently maintained independent of initial viral contribution (ie, through so-called “hit-and-run” mechanisms).39 The potential for this mechanism is illustrated by EBV rescue of crippled GC centrocytes, allowing cells with transforming translocations to escape apoptosis,40-42 which can be followed by loss of viral episome after in vitro culture of EBV+ tumors.43-45 AIDS-associated malignancies may also be affected, in some cases, by increased activation of lymphocytes, resulting in greater frequency of DNA-damaged uninfected GC B cells.46,47 This indirect mechanism could be considered analogous to cancers associated with hepatitis C or Helicobacter pylori infections, which may trigger transformation through chronic inflammation, but pathogen sequences or proteins are not required for maintenance of tumorigenesis. Beyond hit-and-run pathogens, lack of immunosurveillance required for removal of sporadic pretumor dysplastic tissue may contribute to cancer susceptibility; however, attributing the increase of AIDS-defining cancers to decreased tumor surveillance raises the question of why there exists such an extensive risk for specific malignancies, whereas non–AIDS-defining cancers are increased only moderately in the HIV-positive or post-transplant immunosuppressed individual.17
Our findings point toward a dramatic increase of immunoselective pressure on viral antigen in the post-CART era, as evidenced by our cohort being predominantly type I latent EBV+ DLBCLs, whereas this histological subclass before CART was associated with a type III latency immunophenotype.30 Furthermore, distinct viral oncogenic mechanisms may be used in specific ARL histological subclasses. For instance, most EBV+ non-GC DLBCLs (63%) maintained type II or III latency, in contrast to only 24% of GCB DLBCLs, suggesting that more active EBV gene expression programs may be required to maintain the proliferative program of this tumor subclass.
Last, our results reveal a diversity of viral and host gene expression programs that may be therapeutic targets in only a subset of cases. Better integration of molecular diagnostics that include viral gene expression, as well as IHC, has the potential to identify patients that may benefit from in-development antilatent EBV protein therapies that interrupt the host-virus joint regulatory network.
This article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
This project was funded by the Starr Cancer Consortium and National Institutes of Health (NIH) National Cancer Institute grant 1RC2CA148317 (M.M. and E.C.). A.I.O. was supported by the Rebecca Ridley Kry Fellowship of the Damon Runyon Cancer Research Foundation. The AIDS Malignancy Clinical Trials Consortium (NIH National Cancer Institute grant U01CA121947) and the AIDS Cancer Specimen Resource (NIH National Cancer Institute grant 1UM1CA181255) contributed cases of ARL.
Authorship
Contribution: A.A., A.I.O., M.M., and E.C. designed research; A.A., A.I.O., and C.S.P. performed statistical and computational analyses; G.B. and E.B. analyzed primary samples and sequencing libraries for contamination and EBV; J.J. provided computational assistance; F.D. prepared samples for RNA-seq; L.L., W.T., G.D.F., and A.C. contributed ARL samples and confirmed the pathological classification; and A.A. and E.C. wrote the manuscript with input from A.I.O. and M.M.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
The current affiliation for A.A. is Gilead Sciences, Foster City, CA; for A.I.O. is Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL. The current affiliation for J.J. is Division of Oncology, Sanofi Aventis Group, Cambridge, MA. The current affiliation for G.D.F. is School of Biological and Chemical Sciences, Queen Mary University of London, London, UK. The current affiliation for A.C. is Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY.
Correspondence: Ethel Cesarman, Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, 1300 York Ave, New York, NY 10065; e-mail: ecesarm@med.cornell.edu.
References
Author notes
A.A. and A.I.O. contributed equally to this work.