Key Points
EBV genome alterations contribute to hematological malignancies in both disease-specific and general manner.
Frequent deletions suggest that BamHI A rightward transcript microRNA clusters and EBNA3B likely function as viral tumor suppressors.
Visual Abstract
Epstein-Barr virus (EBV) infects >90% of humans and is associated with both hematological and epithelial malignancies. Here, we analyzed 990 EBV genomes (319 newly sequenced and 671 from public databases) from patients with various diseases to comprehensively characterize genomic variations, including single nucleotide variations (SNVs) and structural variations (SVs). Although most SNVs were a result of conservative evolution and reflected the geographical origins of the viral genomes, we identified several convergent SNV hot spots within the central homology domain of EBNA3B, the transactivation domain of EBNA2, and the second transmembrane domain of LMP1. These convergent SNVs seem to fine-tune viral protein functionality and immunogenicity. SVs, particularly large deletions, were frequently observed in chronic active EBV disease (28%), EBV-positive diffuse large B-cell lymphoma (48%), extranodal natural killer/T-cell lymphoma (41%), and Burkitt lymphoma (25%), but were less common in infectious mononucleosis (11%), posttransplant lymphoproliferative disorder (7%), and epithelial malignancies (5%). In hematological malignancies, deletions often targeted viral microRNA clusters, potentially promoting viral reactivation and lymphomagenesis. Nondeletion SVs, such as inversions, were also prevalent, with several inversions disrupting the C promoter to suppress latent gene expression, thereby maintaining viral dormancy. Furthermore, recurrent EBNA3B deletions suggested that this viral transcription factor functions as a tumor suppressor. EBNA3B knockout experiments in vitro revealed downregulation of human tumor suppressors, including PTEN and RB1, which could explain the enhanced lymphomagenesis observed in EBNA3B-deficient lymphoblastoid cell line xenografts. Our findings highlight both disease-specific and general contributions of EBV genomic alterations to human cancers, particularly in hematological malignancies.
Introduction
Epstein-Barr virus (EBV), a member of the herpesvirus family, is an oncogenic virus associated with a broad range of malignancies originating from both hematopoietic and epithelial tissues.1 EBV can establish lifelong latency within human hosts, particularly in B lymphocytes, and undergoes periodic reactivation to produce infectious virions, facilitating viral spread to other susceptible cells or individuals. Owing to this highly efficient strategy, EBV infects >95% of the global population and is estimated to contribute to ∼1% of all human cancers worldwide, underscoring its significant role in the global cancer burden.2,3
The double-stranded DNA genome of EBV encodes a repertoire of genes that orchestrate both the virus’s life cycle and its interactions with host cells. The expression of latent genes is stably inherited in infected cells throughout the host’s life and is crucial for the immortalization and proliferation of those cells. However, recent studies have highlighted that viral genes associated with the lytic cycle may also contribute to malignant transformation, indicating a more intricate mechanism of gene regulation during oncogenesis.4 In addition, the viral episome itself can exert trans-acting enhancer activity, contributing to human oncogenesis.5
Although our understanding of EBV pathogenesis has advanced significantly, the impact of variations within the viral genome remains an area of active investigation. The substantial genetic variation observed among viral isolates adds further complexity to EBV pathogenesis. Polymorphisms within the EBV genome, which are often strain specific, may play a role in the geographical distribution of EBV-associated diseases worldwide.6 This genetic diversity may help to explain the tissue tropism of certain EBV-associated malignancies, such as endemic nasopharyngeal carcinoma (NPC) in Southern China and Burkitt lymphoma (BL) in Africa.7 Recently, strains of type 1 EBV, characterized by 3 single nucleotide variations (SNVs) in the BALF2 gene, have been found to be associated with endemic NPC in China and sporadic NPC in Japan.8,9 However, the functional consequences of most SNVs remain largely unexplored.
In addition to inherited SNVs, large structural variations (SVs) in the EBV genome play a crucial role in lymphomagenesis. Long deletions can affect viral microRNAs (miRNAs) or genes essential for virion production, potentially leading to viral reactivation and the subsequent expression of several viral oncogenes. This aberrant gene expression can trigger enhanced cell survival mechanisms and promotes genomic instability in host cells, thereby facilitating the initiation and progression of malignancies.10 The complex interplay between SNVs, SVs, and host factors underscores the multifaceted nature of EBV-associated oncogenesis and reveals the necessity for comprehensive genomic analyses to unpack these intricate relationships.
In this study, we conducted a thorough analysis of EBV genomes from a diverse array of EBV-associated diseases to further elucidate the role of EBV in oncogenesis.
Materials and methods
Study subjects
We analyzed a total of 990 EBV genomes from healthy volunteers, patients with various EBV-associated diseases, and cell lines (supplemental Tables 1 and 2, available on the Blood website). We sequenced 319 EBV genomes, of which 187 were included in our previous reports.10 In addition, we collected publicly available targeted sequencing data of EBV from the National Center for Biotechnology Information Sequence Read Archive (www.ncbi.nlm.nih.gov/sra) comprising 671 samples.
Target capture–based whole-genome sequencing
We extracted genomic DNA from peripheral blood mononuclear cells (PBMCs) or formalin-fixed, paraffin-embedded (FFPE) tissues using a QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) or GeneRead DNA FFPE Kit (Qiagen), respectively, in accordance with the manufacturer’s instructions. A target capture–based method was used to selectively sequence the EBV genome.10 Briefly, a custom capture bait targeting 7 major strains of EBV (including the Akata strain, which is prevalent in Japan) was designed using SureDesign (Agilent, Santa Clara, CA). Prep libraries were constructed using SureSelect XT Reagents (for PBMC DNA; Agilent) or KAPA HyperPrep Kit for Illumina (for FFPE DNA; Nippon Genetics, Tokyo, Japan) following the manufacturers’ instructions. Target capture and subsequent library preparation were performed using SureSelect XT Reagents. Libraries were quantified with a Qubit fluorometer (Life Technologies, Carlsbad, CA) and an Agilent 2100 TapeStation (Agilent). Sequencing was conducted on a HiSeq2500 next-generation sequencing platform (Illumina, San Diego, CA) with a 2 × 150 nt read option. De-multiplexing and base calling were executed on the BaseSpace platform (Illumina).
EBV recombination and LCL establishment
We generated EBNA3B-deficient and revertant EBV-Bac from wild-type EBV-Bac (B95-8 strain; originally provided by W. Hammerschmidt) as described previously.11 The oligonucleotides used for this series of recombination are as follows: EBNA3B-NeoSt-pF; 5′-GCAGGCTCCAGTGATCCAACTAGTCCACGCTGTCTATGATTCCATGTTGGTAAGAGGCACGGCCTGGTGATGATGGCGGGATC-3′, EBNA3B-NeoSt-pR; 5′-CTGTACGTTGTTGCATGCCGCGTGCCCTCATCACATCCTAGCGTAGCAGTCACAAGGACCTCAGAAGAACTCGTCAAGAAGG-3′, EBNA3B-stop-pF; 5′-GGCCCCTAGGCTGATAATTCCTTTAGCAAACCTGAACATCGAAG-3′, and EBNA3B-stop-pR; 5′-CTTCGATGTTCAGGTTTGCTAAAGGAATTATCAGCCTAGGGGCC-3′. Targeted recombination was confirmed by Sanger sequencing. Using each recombinant EBV, we established lymphoblastoid cell lines (LCLs) from PBMCs of a healthy Japanese volunteer.
Other detailed methods are described in the supplemental Methods.
The study was approved by the ethics committees and institutional review boards of Nagoya City University (institutional review board no. 70-23-0006).
Results
Summary of the EBV genomes analyzed in this study
This study conducted a comprehensive analysis of 990 EBV genomes from diverse hematological and epithelial malignancies, incorporating a patient cohort representative of various disease types and geographical origins (supplemental Tables 1 and 2). To ensure the reliability and comparability of viral genome sequencing data, we analyzed genomes from 8 distinct cell lines, each of which was independently sequenced by 2 or more institutions (Figure 1A; supplemental Figure 1). The results demonstrated a 99.99% concordance in SNV calls across institutions, confirming the robustness and comparability of the data.
Genomic landscape of EBV strains and SNVs. (A) Concordance of viral genome sequencing across institutions. Eight cell lines were sequenced at ≥2 institutions (Jijoye and Namalwa were sequenced at 3 institutions, whereas the remaining 6 were sequenced at 2). Blue bars represent discordant SNVs, whereas the red box indicates a deletion. The numbers denote the total genomic differences observed between institutions. (B) Hierarchical clustering of 990 EBV genomes based on nucleotide variations. Alongside the hierarchical clustering, 2 PCs, countries of origin, associated diseases, assigned cluster numbers, EBV types (1/2), BALF2 mutations, and the presence of SVs are displayed. (C) Frequency of type 2 EBV genomes across different disease categories. B-cell–associated diseases exhibit a higher frequency of type 2 EBV compared with T/NK and epithelial cell (epi) diseases. (D) LD mapping across the EBV genome, with genome coordinates referenced to NC_007605.1. The W repeat region is masked in gray. (E) LD trend as a function of genomic distance. (F) Geometric distribution of single nucleotide variant frequency, the fraction of synonymous variants, and nucleotide alteration patterns. ∗P < .05. AITL, angioimmunoblastic T-cell lymphoma; GC, gastric carcinoma; HLH, hemophagocytic lymphohistiocytosis; LD, linkage disequilibrium; PC, principal component; T/NK-cell, T cell/natural killer cell.
Genomic landscape of EBV strains and SNVs. (A) Concordance of viral genome sequencing across institutions. Eight cell lines were sequenced at ≥2 institutions (Jijoye and Namalwa were sequenced at 3 institutions, whereas the remaining 6 were sequenced at 2). Blue bars represent discordant SNVs, whereas the red box indicates a deletion. The numbers denote the total genomic differences observed between institutions. (B) Hierarchical clustering of 990 EBV genomes based on nucleotide variations. Alongside the hierarchical clustering, 2 PCs, countries of origin, associated diseases, assigned cluster numbers, EBV types (1/2), BALF2 mutations, and the presence of SVs are displayed. (C) Frequency of type 2 EBV genomes across different disease categories. B-cell–associated diseases exhibit a higher frequency of type 2 EBV compared with T/NK and epithelial cell (epi) diseases. (D) LD mapping across the EBV genome, with genome coordinates referenced to NC_007605.1. The W repeat region is masked in gray. (E) LD trend as a function of genomic distance. (F) Geometric distribution of single nucleotide variant frequency, the fraction of synonymous variants, and nucleotide alteration patterns. ∗P < .05. AITL, angioimmunoblastic T-cell lymphoma; GC, gastric carcinoma; HLH, hemophagocytic lymphohistiocytosis; LD, linkage disequilibrium; PC, principal component; T/NK-cell, T cell/natural killer cell.
SNVs identified in the EBV genome
A detailed examination of the 990 EBV genomes identified 22 319 SNVs, with individual genomes harboring between 46 and 2011 variants. Hierarchical clustering of the viral genomes using these SNVs revealed 6 distinct clusters, primarily driven by geographical origin (clusters 1-4 and 6) or the type 2 EBV (cluster 5; Figure 1B). Clusters 1, 2, 4, and 6 predominantly comprised genomes from Asia, whereas cluster 3 included genomes mainly from Africa, Europe, and the Americas. Notably, clusters 2 and 4 contained genomes from patients with endemic NPC and geographically matched healthy volunteers. These clusters featured characteristic NPC-related SNVs within the BALF2 gene (162215A→C, 162476T→C, and 163364C→T variants).8
Although type 2 EBV genomes were observed across all geographic regions analyzed, the frequency of their occurrence varied significantly depending on the disease. A clear association emerged between type 2 EBV and B-cell diseases such as BL, diffuse large B-cell lymphoma (DLBCL), posttransplant lymphoproliferative disorder (PTLD), and Hodgkin lymphoma (HL; Figure 1C). In contrast, type 2 EBV was less frequently detected in T- and natural killer (NK)-cell diseases and NPC. Notably, type 2 EBV genomes were absent in gastric carcinoma, extranodal NK/T-cell lymphoma (ENKL), infectious mononucleosis (IM), and hemophagocytic lymphohistiocytosis. This distribution suggests that specific EBV genotypes exhibit preference for different disease contexts.
The EBV genome displayed a moderate degree of linkage disequilibrium across its length, indicative of occasional recombination events (Figure 1D-E). This pattern suggests that genetic exchange between viral strains occurs intermittently and contributes to the genetic diversity observed within the EBV population. Interestingly, recombination appeared subdued in the 78 001 to 94 001 bp region, which includes EBNA3A/B/C, BZLF1, and BRLF1, suggesting potential haplotype compatibility among the EBNA3 proteins, which are known to interact in similar regulatory regions.12
Among the identified nucleotide alterations, most (97.5%; n = 21 759) were SNVs, whereas the remainder were insertions and deletions (2.5%; n = 560). Approximately 45% of the SNVs consisted of cytosine-to-thymine transitions (Figure 1F). Although the mutation frequency was generally uniform across the genome, 2 regions had notably higher mutation rates. The first was a region encoding the EBNA3 proteins (80-89 kb), partly reflecting differences between type 1 and type 2 EBV genomes. The second, near the terminal repeat (167-170 kb), included the LMP1 gene, which encodes a major EBV cell surface antigen. Unlike the EBNA3 region, mutations in this region displayed less pronounced bias toward cytosine-to-thymine transitions. Mutations were not restricted to coding regions of LMP1, as they extended into introns, the 5′ untranslated region, and upstream promoter elements, suggesting that factors beyond antigenicity may be driving the elevated mutation rates in this region.
Approximately 47% of the identified coding variants were silent, indicating that many of these variants likely arose through conservative evolutionary processes. Interestingly, the proportion of silent variants decreased in association with declining minor allele frequency, suggesting selective pressure may be eliminating nonsynonymous variants in the evolving viral population.
Mutations recurrently acquired in distinct evolutionary branches (convergent mutations)
Our analysis identified a subset of mutations that seemed to have been acquired independently across multiple branches of the EBV evolutionary tree. Although common SNPs are typically associated with more rare variants in a purely linear evolutionary model, certain SNVs demonstrated discordance with nearby common SNPs. Through a detailed examination of linkage patterns, we identified 156 SNVs exhibiting this discordant behavior within cluster 1, suggesting that these variants arose independently multiple times in the course of EBV evolution (Figure 2A; supplemental Figure 2; supplemental Table 3).
Convergent mutations and their hot spots. (A) Distribution of convergent mutations in cluster 1 (comprising samples primarily from Japan). Convergent SNVs are illustrated in red, whereas other SNVs are in gray. (B) Classification of convergent and other SNVs. Synonymous SNVs were less frequent among convergent SNVs compared with other SNVs. (C) Distribution of convergent SNVs within EBNA3B. (D) Amino acid substitutions resulting from convergent SNVs in the core central homology domain of EBNA3B. Identical amino acids among the 3 EBNA3 homologs are illustrated in red and green. Blue and yellow lines represent predicted antigenic peptides and critical residues for RBP/J binding, respectively. (E) Mutual exclusivity of convergent SNVs in the EBNA3B core central homology domain. Ten distinct SNVs (each affecting 2-35 samples) were distributed across 91 samples without any overlap. (F) Monte Carlo simulation of mutual exclusivity. Random distribution of 10 distinct SNVs across 366 samples (in cluster 1) resulted in the expectation that some samples might harbor 2 or more SNVs. Under 10 million simulation trials, the likelihood of no overlap was calculated as 4.92 × 10−5. (G) Convergent SNVs affecting LMP1. A relatively large number of convergent SNVs were found within the transmembrane (TM) 2 domain. (H) Amino acid substitutions from convergent SNVs in the TM 2 domain. Hydrophobic amino acid residues are indicated in red. (I) Convergent SNVs upstream of LMP1. Among the 3 known promoters, pED-L1 was most frequently affected by convergent SNVs. (J) Convergent SNVs in EBNA3A. The upper histogram illustrates convergent SNVs found in cluster 1 (mainly Japan), whereas the lower histogram illustrates those found in cluster 2 (mainly China). GC, gastric carcinoma.
Convergent mutations and their hot spots. (A) Distribution of convergent mutations in cluster 1 (comprising samples primarily from Japan). Convergent SNVs are illustrated in red, whereas other SNVs are in gray. (B) Classification of convergent and other SNVs. Synonymous SNVs were less frequent among convergent SNVs compared with other SNVs. (C) Distribution of convergent SNVs within EBNA3B. (D) Amino acid substitutions resulting from convergent SNVs in the core central homology domain of EBNA3B. Identical amino acids among the 3 EBNA3 homologs are illustrated in red and green. Blue and yellow lines represent predicted antigenic peptides and critical residues for RBP/J binding, respectively. (E) Mutual exclusivity of convergent SNVs in the EBNA3B core central homology domain. Ten distinct SNVs (each affecting 2-35 samples) were distributed across 91 samples without any overlap. (F) Monte Carlo simulation of mutual exclusivity. Random distribution of 10 distinct SNVs across 366 samples (in cluster 1) resulted in the expectation that some samples might harbor 2 or more SNVs. Under 10 million simulation trials, the likelihood of no overlap was calculated as 4.92 × 10−5. (G) Convergent SNVs affecting LMP1. A relatively large number of convergent SNVs were found within the transmembrane (TM) 2 domain. (H) Amino acid substitutions from convergent SNVs in the TM 2 domain. Hydrophobic amino acid residues are indicated in red. (I) Convergent SNVs upstream of LMP1. Among the 3 known promoters, pED-L1 was most frequently affected by convergent SNVs. (J) Convergent SNVs in EBNA3A. The upper histogram illustrates convergent SNVs found in cluster 1 (mainly Japan), whereas the lower histogram illustrates those found in cluster 2 (mainly China). GC, gastric carcinoma.
The 156 convergent SNVs contained fewer silent SNVs compared with those observed across the entire data set, implying that these convergences may have occurred under selection pressure (Figure 2B). Moreover, several convergent SNVs were clustered into distinct hot spots within the viral genome (Figure 2C-J). Of particular note, a significant hot spot was identified within a small region of EBNA3B, consistently observed across various disease contexts (Figure 2C). This cluster of mutations was localized to amino acid residues 195 to 203 within the core homology domain shared by EBNA3A, EBNA3B, and EBNA3C. This domain is essential for binding the host RBP/Jk transcription factor, which functions to suppress EBNA2 transcriptional activity.13 In addition, this domain is highly antigenic in the context of major histocompatibility complex (MHC) binding in the Japanese population (supplemental Table 4). Interestingly, these hot spot mutations did not affect highly conserved amino acids shared among the 3 EBNA3 proteins, including 4 critical residues for RBP/J binding, suggesting a nuanced immune selective pressure acting on this region (Figure 2D). Indeed, both the wild-type (WT) EBNA3B peptide (amino acids 198-206) and its variant carrying a convergent mutation were found to bind HLA-B, as demonstrated by an MHC stabilization assay using the T2 cell line transfected with HLA-B35:01 or HLA-B51:01 (supplemental Figure 3).
A total of 10 convergent mutations were identified within this domain, which were found to be mutually exclusively acquired in 91 samples (P = 4.9 × 10–5; Figure 2E-F).
LMP1 also exhibited 2 prominent hot spots of convergent mutations: one spanning amino acid residues 52 to 72 and the other encompassing residues 321 to 350 (Figure 2G). The former hot spot is located within the transmembrane 2 domain, where mutations typically involved substitutions between hydrophobic residues, possibly indicative of immune-driven selection while maintaining the hydrophobicity necessary for proper protein function (Figure 2H). In contrast, the biological significance of mutations identified within the second hot spot, located in the CTAR 3 domain, remains unclear. Furthermore, regions upstream of LMP1’s coding sequence, which include 2 LMP1 promoters and are known to be hypervariable, were also found to be a target of convergent mutations (Figure 2I).14 Notably, convergent mutations were frequently localized near the transcription start site of the pED-L1 promoter. In the case of EBNA3A, the specific locations of convergent mutation hot spots varied across different EBV clusters (Figure 2J), suggesting the influence of distinct immune pressures driven by variable HLA haplotype distributions across populations.
SVs within the EBV genome
The analysis of 990 EBV genomes revealed various SVs, including 163 intragenic deletions (1-3 deletions per sample, affecting 143 samples; Figure 3; supplemental Table 5), 18 inversions (affecting 12 samples), and 4 instances where integration into the human genome was observed (affecting 4 samples; Figure 4).
Intragenic deletions shape EBV genomes across disease landscapes. (A) Comprehensive overview of intragenic deletions (>50 bases) identified across diverse diseases. Each gray line represents the EBV genome of an individual patient. Filled bars denote complete deletions (with no remaining alleles), whereas open bars represent partial deletions (retaining some alleles). The affected viral components are indicated to the right of each EBV genome. A histogram summarizes the frequency of deletions in specific genomic regions, with blue and gray bars representing complete and partial deletions, respectively. The color codes for diseases (consistent with those used in Figure 1A) and EBV genome components are also indicated. (B) Distribution of deletion lengths across various disease categories, visualized with violin plots. Box plots within the violins illustrate interquartile ranges (the edges represent the 25th and 75th percentiles, whereas the inner bar indicates the median). ∗P < .01 (.00014 for CAEBV+DLBCL+ENKL+BL+HL vs IM+PTLD; .00197 for CAEBV+DLBCL+ENKL+BL+HL vs epithelial cell malignancies; .00181 for CAEBV+DLBCL+ENKL+BL+HL vs healthy donors). Other, other hematological diseases; epithelial, epithelial cell malignancies; healthy, healthy donors. (C) Disease-specific frequencies of deletions illustrating both overall deletions and those affecting specific viral components. (D-I) Zoomed-in views of specific EBV genomic regions highlighting deletion patterns. Numbers in these panels correspond to the individual miRNAs within the BARTmiR clusters. Color coding follows the same scheme as in panel A for consistency and clarity. BARTmiR, BART miRNAs; essential, essential genes for virion production; OriP, replication origin involved in latent infection; oriLyt, replication origin involved in lytic infection.
Intragenic deletions shape EBV genomes across disease landscapes. (A) Comprehensive overview of intragenic deletions (>50 bases) identified across diverse diseases. Each gray line represents the EBV genome of an individual patient. Filled bars denote complete deletions (with no remaining alleles), whereas open bars represent partial deletions (retaining some alleles). The affected viral components are indicated to the right of each EBV genome. A histogram summarizes the frequency of deletions in specific genomic regions, with blue and gray bars representing complete and partial deletions, respectively. The color codes for diseases (consistent with those used in Figure 1A) and EBV genome components are also indicated. (B) Distribution of deletion lengths across various disease categories, visualized with violin plots. Box plots within the violins illustrate interquartile ranges (the edges represent the 25th and 75th percentiles, whereas the inner bar indicates the median). ∗P < .01 (.00014 for CAEBV+DLBCL+ENKL+BL+HL vs IM+PTLD; .00197 for CAEBV+DLBCL+ENKL+BL+HL vs epithelial cell malignancies; .00181 for CAEBV+DLBCL+ENKL+BL+HL vs healthy donors). Other, other hematological diseases; epithelial, epithelial cell malignancies; healthy, healthy donors. (C) Disease-specific frequencies of deletions illustrating both overall deletions and those affecting specific viral components. (D-I) Zoomed-in views of specific EBV genomic regions highlighting deletion patterns. Numbers in these panels correspond to the individual miRNAs within the BARTmiR clusters. Color coding follows the same scheme as in panel A for consistency and clarity. BARTmiR, BART miRNAs; essential, essential genes for virion production; OriP, replication origin involved in latent infection; oriLyt, replication origin involved in lytic infection.
SVs in EBV genomes. (A-C) Representative examples of intragenic deletions. Integrative genomics viewer images display intragenic deletions within EBV genomes from 3 CAEBV patients. Histograms depict read coverage, with individual gray lines representing sequence reads. (A) In patient UPN257, a near-complete deletion is observed, with minimal evidence of retained alleles. (B) In patient UPN404, partial retention of reads within the deleted region suggests the presence of undeleted alleles. Mean read coverage within the deletion (150×) amounts to 3.5% of the genome-wide average (4264×). (C) Patient UPN417 exhibits a different pattern, with mean read coverage within the deletion reaching 16.8% of the genomic average. (D) Partial deletions affecting EBNA1. Five partial deletions targeting EBNA1 were identified, each involving at least half of the EBV genome. All deletions retained the essential replication origin, oriP. (E) Schematic of EBNA1 interaction between viral and host genomes. EBNA1 binds to both the oriP region within the viral genome and a specific site in the host genome. This binding is crucial for viral genome stability, ensuring that viral replication occurs alongside host DNA during cell division. Consequently, every EBV genome copy must possess oriP, and at least 1 functional EBNA1 copy per cell is required for viral persistence. (F) Coexistence of complete and partial genomes. In patients UPN920, UPN1213, and eBL-Tumor-0007, ∼50% of the remaining allele fraction suggests a balanced coexistence of complete and partial EBV genomes within each cell. (G) Predominance of partial genomes. In contrast, patients UPN1301 and UPN1306 exhibited a lower remaining allele fraction (∼10%), indicating a predominance of partial genomes over complete ones. (H) Genomic inversions. Horizontal lines represent individual EBV genomes, with arcs indicating inversions. In total, 12 EBV genomes harbored inversions. The locations of the C promoter and EBNA genes are also illustrated. In the top 6 genomes (marked with asterisks), inversions seem to disrupt the connection between the C promoter and the EBNA genes. (I) Integration into the human genome. Four EBV genomes, all derived from Burkitt lymphoma patients, were found integrated into the human genome. The GK_BL29 EBV genome integrated into intron 3 of the ANKMY1 gene, potentially forming a fusion between RPMS1 within the viral genome and ANKMY1 in the host genome. The GK_Farage EBV genome integrated within the immunoglobulin heavy chain (IGH) locus. The Namalwa EBV genome integrated into intron 4 of the PPIEL gene. Instances where repetitive sequences obscured the precise location of the SV are marked with an asterisk. Last, the eBL-Tumor-0031 EBV genome integrated into chromosome 12q21.31, a region devoid of annotated genes within a 100-kb radius. GC, gastric carcinoma.
SVs in EBV genomes. (A-C) Representative examples of intragenic deletions. Integrative genomics viewer images display intragenic deletions within EBV genomes from 3 CAEBV patients. Histograms depict read coverage, with individual gray lines representing sequence reads. (A) In patient UPN257, a near-complete deletion is observed, with minimal evidence of retained alleles. (B) In patient UPN404, partial retention of reads within the deleted region suggests the presence of undeleted alleles. Mean read coverage within the deletion (150×) amounts to 3.5% of the genome-wide average (4264×). (C) Patient UPN417 exhibits a different pattern, with mean read coverage within the deletion reaching 16.8% of the genomic average. (D) Partial deletions affecting EBNA1. Five partial deletions targeting EBNA1 were identified, each involving at least half of the EBV genome. All deletions retained the essential replication origin, oriP. (E) Schematic of EBNA1 interaction between viral and host genomes. EBNA1 binds to both the oriP region within the viral genome and a specific site in the host genome. This binding is crucial for viral genome stability, ensuring that viral replication occurs alongside host DNA during cell division. Consequently, every EBV genome copy must possess oriP, and at least 1 functional EBNA1 copy per cell is required for viral persistence. (F) Coexistence of complete and partial genomes. In patients UPN920, UPN1213, and eBL-Tumor-0007, ∼50% of the remaining allele fraction suggests a balanced coexistence of complete and partial EBV genomes within each cell. (G) Predominance of partial genomes. In contrast, patients UPN1301 and UPN1306 exhibited a lower remaining allele fraction (∼10%), indicating a predominance of partial genomes over complete ones. (H) Genomic inversions. Horizontal lines represent individual EBV genomes, with arcs indicating inversions. In total, 12 EBV genomes harbored inversions. The locations of the C promoter and EBNA genes are also illustrated. In the top 6 genomes (marked with asterisks), inversions seem to disrupt the connection between the C promoter and the EBNA genes. (I) Integration into the human genome. Four EBV genomes, all derived from Burkitt lymphoma patients, were found integrated into the human genome. The GK_BL29 EBV genome integrated into intron 3 of the ANKMY1 gene, potentially forming a fusion between RPMS1 within the viral genome and ANKMY1 in the host genome. The GK_Farage EBV genome integrated within the immunoglobulin heavy chain (IGH) locus. The Namalwa EBV genome integrated into intron 4 of the PPIEL gene. Instances where repetitive sequences obscured the precise location of the SV are marked with an asterisk. Last, the eBL-Tumor-0031 EBV genome integrated into chromosome 12q21.31, a region devoid of annotated genes within a 100-kb radius. GC, gastric carcinoma.
The prevalence of deletions revealed substantial variation across different disease contexts. EBV genomes from patients with chronic active Epstein-Barr virus disease (CAEBV) and hematological malignancies exhibited relatively high frequencies of deletions (20%-46%), whereas those from individuals with IM, PTLD, or epithelial cell malignancies (gastric carcinoma and NPC), or healthy volunteers displayed lower frequencies (4%-11%; Figure 3A).
During the SV analyses, we observed several samples containing mixtures of viral genomes with and without deletions, a phenomenon we termed partial deletions. A total of 41 partial deletions were identified, with 2.6% to 66% of undeleted alleles retained (Figure 4A-C). The remaining 122 deletions were classified as complete deletions. Partial deletions were more common in CAEBV, DLBCL, and HL but were absent in ENKL.
The distribution of deletion lengths also varied among diseases, with long deletions (>10 000 bp) being characteristic of hematological malignancies (Figure 3B).
Disease-specific patterns were also observed in the types of viral components affected by deletions (Figure 3C; supplemental Table 6). In CAEBV and DLBCL, deletions frequently targeted the BamHI A rightward transcript (BART) miRNA clusters, which are critical for maintaining viral latency through miRNA expression, and genes necessary for virion production. In ENKL, deletions primarily affected BART miRNA clusters while sparing essential genes. Conversely, BL displayed a lower frequency of deletions affecting both BART miRNA clusters and essential genes, though EBNA2 and LMP2A/B were frequently affected. In HL, deletions predominantly targeted essential genes. In contrast, intragenic deletions were less frequent in IM, PTLD, epithelial cell malignancies, and healthy donors, and rarely affected the aforementioned viral components.
Across all disease contexts, BART miRNA cluster 1 emerged as the most frequently affected viral component (Figure 3D). Notably, ebv-mir-BART6, a miRNA from this cluster known to downregulate DICER1 and suppress viral lytic reactivation, was deleted in 34 samples. BART miRNA cluster 2 was also frequently targeted (Figure 3E), with ebv-mir-BART18 (deleted in 27 samples) and ebv-mir-BART20 (deleted in 28 samples) identified as key players in the suppression of lytic reactivation.15
Monte Carlo simulations, which randomly reassigned deletions of similar lengths within the genome, revealed EBNA3B as a novel target of frequent deletion (Figure 3F). Previous studies have linked the deletion of EBNA3B to neoplastic development.16 In addition, several other regions, including the C promoter, exons 1 to 2 of EBNALP, EBNA2, and BALF genes, were frequently targeted by deletions (Figure 3G-I).
Conversely, certain viral components were rarely affected by deletions (supplemental Table 6). BPLF1, BNLF2a, and BNLF2b had the lowest deletion frequencies relative to their lengths. BPLF1, a tegument protein with deubiquitinase activity, and BNLF2a, an inhibitor of host cell peptide loading onto MHC molecules, displayed remarkable resistance to deletions. Although the function of BNLF2b remains hypothetical, its infrequent deletion suggests a potentially important role. Remarkably, EBNA1, 1 of the 2 essential components responsible for maintaining viral genome retention within host cells, was not affected by complete deletions. The replication origin was largely unaffected, except in 1 instance (GK_BL60), which carried a complex SV involving a deletion, 2 inversions, and 2 unresolved termini in its viral genome. Furthermore, several other critical components, including oriLyt (a replication origin during the lytic cycle), BHRF1 (encoding the viral BCL-2 homolog), and BZLF1/BRLF1 (transcription factors activating the lytic cycle), also escaped complete deletion.
Five viral genomes exhibited partial deletions exceeding 50% of the total viral length (Figure 4D). Intriguingly, all these deletions encompassed EBNA1, an essential gene for viral genome maintenance within host cells. Because partial viral genomes lacking EBNA1 depend on helper genomes in the same cell to supply EBNA1 protein, these findings suggest the coexistence of both partial and complete viral genomes within a single cell (Figure 4E-G).
In addition to deletions, 18 inversions were identified across 12 viral genomes (Figure 4H), ranging in size from 52 to 88 345 bp. Notably, half of these inversions disrupted the connection between the C promoter and EBNA genes. Translocations between the EBV genome and the human genome were rare, identified in just 4 instances in BL (Figure 4I).
These findings highlight that intragenic deletions, particularly extensive ones, are a distinctive feature of hematological malignancies and suggest that such deletions may play diverse roles in different types of hematological neoplasms.
EBNA3B as a tumor-suppressor gene
The statistically significant accumulation of deletions targeting EBNA3B strongly suggests that this gene may function as a tumor suppressor, together with several truncating mutations (Figure 5A-B). To further investigate this hypothesis, we established LCLs using WT, EBNA3B-deficient (dEBNA3B), or revertant (rev) EBV strains, in addition to analyzing publicly available chromatin immunoprecipitation sequencing (ChIP-seq) data sets.12,16 t-distributed stochastic neighbor embedding (t-SNE) analysis revealed distinct global gene expression patterns between WT and dEBNA3B LCLs, whereas WT and rev LCLs exhibited similar expression profiles (Figure 5C). A total of 3776 genes were found to be differentially expressed, with a threshold of q < 0.1. Two chemokines, CXCL9 and CXCL10, were downregulated in dEBNA3B LCLs, as reported previously (supplemental Figure 4).16 Among the significantly upregulated genes was TERT, a key regulator of telomere length maintenance.
Transcriptional analysis of EBNA3B. (A) Summary of deletions and amino acid changes identified in EBNA3B. Color coding for diseases is the same as in Figure 3A. (B) Summary of nonsense and frameshift mutations identified in EBNA3B. The c.2653dupG and c.2804_2807delCGAA mutations were identified in 2 samples. (C) t-SNE plot illustrating gene expression profiles of LCLs established with WT EBV (gray), EBNA3B-deficient EBV (dEBNA3B, blue), and revertant EBV (orange). (D) Volcano plot comparing gene expression between WT- and dEBNA3B-LCLs. Red circles represent genes with at least one EBNA3B ChIP-seq peak within 100 kb of their transcription start sites. (E) Frequency of EBNA3B ChIP-seq peaks in upregulated and downregulated genes. (F) Gene set enrichment analysis comparing WT- and dEBNA3B-LCLs, highlighting the enrichment of a gene set frequently deleted in glioblastoma (TCGA_GLIOBLASTOMA_COPY_NUMBER_DN). (G) Heat map and dot plots illustrating differential gene expression among the 3 LCLs. Gene symbols highlighted in yellow denote established tumor suppressor genes. (H) Enriched gene set (SOTIRIOU_BREAST_CANCER_GRADE_1_VS_3_UP) identified in WT- and dEBNA3B-LCLs. (I) Gene set enrichment analysis (PTEN_DN.V2_UP) suggesting possible downregulation of the PTEN pathway in dEBNA3B-LCLs. (J) Gene set enrichment analysis (RB_P107_DN.V1_UP) suggesting downregulation of the RB pathway in dEBNA3B-LCLs. FDR, false discovery rate; FWER, family-wise error rate; NES, normalized enrichment score.
Transcriptional analysis of EBNA3B. (A) Summary of deletions and amino acid changes identified in EBNA3B. Color coding for diseases is the same as in Figure 3A. (B) Summary of nonsense and frameshift mutations identified in EBNA3B. The c.2653dupG and c.2804_2807delCGAA mutations were identified in 2 samples. (C) t-SNE plot illustrating gene expression profiles of LCLs established with WT EBV (gray), EBNA3B-deficient EBV (dEBNA3B, blue), and revertant EBV (orange). (D) Volcano plot comparing gene expression between WT- and dEBNA3B-LCLs. Red circles represent genes with at least one EBNA3B ChIP-seq peak within 100 kb of their transcription start sites. (E) Frequency of EBNA3B ChIP-seq peaks in upregulated and downregulated genes. (F) Gene set enrichment analysis comparing WT- and dEBNA3B-LCLs, highlighting the enrichment of a gene set frequently deleted in glioblastoma (TCGA_GLIOBLASTOMA_COPY_NUMBER_DN). (G) Heat map and dot plots illustrating differential gene expression among the 3 LCLs. Gene symbols highlighted in yellow denote established tumor suppressor genes. (H) Enriched gene set (SOTIRIOU_BREAST_CANCER_GRADE_1_VS_3_UP) identified in WT- and dEBNA3B-LCLs. (I) Gene set enrichment analysis (PTEN_DN.V2_UP) suggesting possible downregulation of the PTEN pathway in dEBNA3B-LCLs. (J) Gene set enrichment analysis (RB_P107_DN.V1_UP) suggesting downregulation of the RB pathway in dEBNA3B-LCLs. FDR, false discovery rate; FWER, family-wise error rate; NES, normalized enrichment score.
Importantly, genes that displayed EBNA3B ChIP-seq peaks within 100 kb of their gene bodies demonstrated a significant bias toward upregulation in dEBNA3B LCLs (Figure 5D-E), consistent with the known role of EBNA3B as a transcriptional repressor.12 Gene set enrichment analysis further identified an enrichment of genes typically deleted in glioblastoma (TCGA_GLIOBLASTOMA_COPY_NUMBER_DN, false discovery rate = P = 9.7 × 10−5) or genes upregulated in grade 3 breast cancer (SOTIRIOU_BREAST_CANCER_GRADE_1_VS_3_UP, false discovery rate = 4.8 × 10−4) within the dEBNA3B LCLs (Figure 5F-G). Notably, key members of this gene set included PTEN (downregulated to 0.75-fold) and RB1 (downregulated to 0.40-fold), both of which are well-established tumor-suppressor genes.
Furthermore, we observed gene expression changes of target genes of PTEN (PTEN_DN.V2_UP) and RB1 (RB_P107_DN.V1_UP), in line with the downregulation of PTEN and RB1 themselves (Figure 5I-J). Remarkably, RB1 exhibited a strong EBNA3B ChIP-seq peak located 16 kb upstream of its transcriptional start site (supplemental Figure 5), reinforcing the view that EBNA3B might regulate RB1 and its downstream signaling pathways involved in cell cycle control and tumor suppression.
EBNA3B, carrying one of the identified convergent mutations (p.D195N), demonstrated a similar inhibitory effect as WT EBNA3B on the cell cycle and CXCL10 expression in dEBNA3B LCLs (supplemental Figure 6). Complementation of RB1 or PTEN in dEBNA3B LCLs also increased the proportion of cells in the G1 phase, with PTEN reducing downstream phospho-AKT levels (supplemental Figure 7).
Discussion
Our comprehensive analysis of 990 EBV genomes across diverse disease conditions provides several key insights. First, we identified distinct genomic variation patterns between hematological and epithelial malignancies, notably a significantly higher frequency of intragenic deletions in hematological malignancies (28%-48%) compared with epithelial cancers (5%). Second, although most SNVs exhibited conservative evolutionary patterns, we also identified convergent mutations in viral genes that seem essential for maintaining viral functions while potentially facilitating immune evasion. Finally, we confirmed that EBNA3B, frequently targeted by deletions across various EBV-associated diseases, likely acts as a tumor suppressor, thus highlighting its potential role in EBV-mediated oncogenesis.
Although most observed SNVs seem conservative, suggesting the maintenance of essential viral functions, the presence of convergent mutations under selective pressure indicates adaptive evolution, particularly in relation to immune evasion and host immune system variation.17 This aligns with the findings of Burrows et al,18 which demonstrated significant positive selection pressure on LMP1 sequences in Southeast Asian EBV isolates. The geographical variation of convergent mutations likely reflects adaptation to population-specific immune pressures, such as distinct HLA haplotype distributions or other immune-related genetic factors. This is supported by de Campos-Lima et al,19,20 who revealed the loss of HLA-A11 epitopes in EBV isolated from highly A11-positive populations. This local adaptation showcases the ability of EBV to fine-tune its antigenicity in response to the immunogenetic landscape of specific populations. Our comprehensive analysis of 990 EBV genomes further supports this adaptive evolution, revealing 6 distinct clusters: 5 driven by geographical origin and 1 by type 2 EBV. The clear geographic clustering, with 4 clusters predominantly comprising Asian genomes and 1 cluster representing Africa, Europe, and the Americas, demonstrates how EBV has evolved to adapt to different geographic populations. This geographic pattern is consistent with recent analyses of EBV genomes across multiple cancer types and geographical regions.21,22
Frequent homologous recombination events among EBV genomes, indicated by low linkage disequilibrium across the entire genome, suggest the occasional presence of ≥2 haplotypes of the EBV genome within a cell, allowing for recombination. This can be understood as a naturally acquired mechanism for the virus to increase genomic diversity, enabling persistent residence in the human population.23
We identified marked differences in deletion frequencies across various EBV-associated diseases, particularly between hematological and epithelial malignancies. Many deletions in hematological malignancies affected BART miRNA clusters and are thought to upregulate the reactivation program of the virus. We speculate that cancer cells of hematological origin are relatively restrictive to viral reactivation, whereas those of epithelial origin are relatively permissive and do not require the deletion of miRNAs that suppress the program. Previous studies have also documented these frequent focal EBV genome deletions, particularly affecting BART miRNA regions, as being more prevalent in NK/T-cell lymphoma.24
Notably, deletions rarely affected viral components which are to maintain the viral genome (EBNA1 and oriP), induce the lytic reactivation (BZLF1 and BRLF1), or evade immune attacks (BNLF2a and BPLF1). EBNA1 tethers the viral genome to the human genome to achieve synchronized genome replication that initiates at oriP; therefore, the 2 viral components are maintained in virtually all viral genomes analyzed. Compared with EBNA1 and oriP, the background of marked resistance of BZLF1 and BRLF1 against deletions is less obvious; however, they are essential initiators and promoters of the lytic cycle, and the resistance of these components is in line with frequent BART miRNA deletions suggesting tumor-promoting role of this viral cycle. BNLF2a promotes immune evasion by inhibiting transporter associated with antigen processing (TAP) and its deletion will be unfavorable for cancers.25 BPLF1, a viral deubiquitinase, suppresses Toll-like receptor signaling to evade from immunity.26 In addition, natural genetic variations within EBV latent proteins, such as EBNA1, have been functionally demonstrated to reduce antigenicity and alter immune recognition. For instance, polymorphisms in EBNA1 identified in patients with PTLD were found to significantly affect T-cell responses and immune recognition patterns, highlighting a direct link between viral genetic diversity and immune evasion.27 Furthermore, several other viral genes that may benefit neoplasms, for example BHRF1 (viral BCL-2 homolog) that favors cell survival, are also less frequently affected by deletions.28
These distributions of deletions demonstrate similarity with genomic deletions found in human cancers, in which tumor suppressor genes (such as TP53 and PTEN) are usually targeted in several cancer types and genes beneficial for cancers (oncogenes) are rarely targeted by deletions.
We suspect that the primary consequence of the frequent deletion in the viral genome is to upregulate the lytic cycle program. The BART miRNA clusters were the most frequently affected components across all diseases, with specific miRNAs demonstrating notable deletion frequencies such as ebv-mir-BART6, which is known to downregulate DICER1 (a key component of miRNA processing) and suppress viral lytic reactivation.15 The recurrent deletion of these specific BART miRNAs suggests their potential role as driver mutations in EBV-associated lymphomagenesis. Deletion of viral genes essential for virion production may also lead to the lytic cycle, as revealed by our previous publication.10 The resistance of BZLF1 and BRLF1 to deletions is in line with the previously mentioned hypotheses. EBNA3B deletions upregulate MYC, which may also upregulate the lytic cycle.29
We observed the mixture of a complete viral genome and a viral genome with a deletion. Such partial deletions may be a result of the presence of normal cells carrying a complete genome in a specimen, intratumoral heterogeneity (the mixture of cancer cells having a complete genome and having a genome with a deletion), or the presence of both complete genome and a genome with deletion within a single cell. We consider which of the 3 conditions is the most likely disease context dependent; for example, HL carries large number of noncancer B cells in tumor microenvironment and the normal B cells may give a complete allele. In the case of CAEBV, genomic DNA is extracted from T or NK cells and purified by fluorescence-activated cell sorting and B cell contamination will not be able to explain the complete genome. That means the latter 2 possibilities are likely. As CAEBV carry intratumoral heterogeneity in the context of somatic mutations in the human genome,10 viral genomes may also vary among tumor cells. In the case of partial deletion of EBNA1, we can strongly suspect the presence of 2 different viral genomes within a cell, otherwise the viral genome lacking EBNA1 will be lost. This observation suggests that cancer cells can maintain 2 different viral genomes at a time, which normal B or epithelial cells can occasionally do, as demonstrated by frequent homologous recombination of the viral genome.
Inversions frequently disrupted the connection between the C promoter plus EBNA-LP exons 1 to 2 and EBNA genes. This prevents the expression of EBNA genes from the C promoter, whereas these genes can be transcribed from other associated promoters (W, F, or Q promoters). Such compromised expression from the C promoter is also likely in viral genomes with C promoter deletions. Although the significance of C promoter deletions is still unclear,30 the presence and positioning of inversions also indicate an oncogenic role of the C promoter deletions. One possible role of the C promoter restriction is to limit the expression of EBNA genes; they are exogenous antigens and lower expression level is desirable. As demonstrated by partial EBNA1 deletions which reduce the gene’s copy number to 1 of 10, much lower EBNA1 expression can still hold the viral genome while maintaining low antigenicity.
We identified statistically significant large deletions within EBNA3B across a range of EBV-associated diseases. Our functional studies using lymphoblastoid cell lines provide compelling evidence for the role of EBNA3B as a tumor suppressor gene, possibly by suppressing RB1 and PTEN expression. Similar to other deletions, EBNA3B deletions are not transmissible, suggesting that EBNA3B is important for viral persistence in the human population.
In summary, our study reveals that EBV genomic variations, particularly deletions, reflect a sophisticated balance between viral reactivation, immune evasion, and cellular transformation. The selective preservation of immune evasion genes, coupled with targeted deletion of latency regulators and tumor suppressors, suggests a coordinated viral strategy in lymphomagenesis, providing new possibilities for therapeutic intervention.
Acknowledgments
The authors express their gratitude to all the clinicians, patients, and their families and caregivers who contributed to this study by providing samples. Special thanks are extended to Tomomi Atsumi, Mayumi Hojo, and Shuko Murakami for their invaluable assistance. The authors also acknowledge technical support from the Research Equipment Sharing Center/Core Laboratory at Nagoya City University. Computations were partially carried out on the National Institute of Genetics supercomputer at the Research Organization of Information and System National Institute of Genetics. The authors thank Yoshiyuki Suzuki (Graduate School of Science, Nagoya City University, Nagoya, Japan) and Kiyotaka Kuzushima (Chikusa Public Health Center, Nagoya, Japan) for their valuable comments and discussions.
The sincere appreciation of the authors goes to the Takeda Science Foundation, the Daiko Foundation, and the Nagoya Pediatric Cancer Fund for their financial support, which made this project possible. This work was further supported by the Japan Society for the Promotion of Science (Grants-in-Aid for Scientific Research) (grant number JP22H02878) and the Japan Agency for Medical Research and Development (grant number JP19ck0106517).
Authorship
Contribution: Y.O. designed and collected the samples, performed the research, analyzed the data, led the project, and wrote the paper; H.T.K. designed and analyzed the data, performed the research, and wrote the paper; Y.S., M.H., M.U., S.S., Y.K., T.W., M.N., A. Satou, H. Kataoka, Y.I., A. Sawada, S.K., J.-i.K., T.M., and H. Kimura provided the samples, analyzed the data, and wrote the paper; and A.I., H.A., A.N., K.G., K.O., and Y.A. analyzed the data and wrote the paper.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Yusuke Okuno, Department of Virology, Nagoya City University Graduate School of Medical Sciences, 1 Kawasumi, Mizuho-cho, Mizuho-ku, Nagoya 467-8601, Japan; email: yusukeo@med.nagoya-cu.ac.jp; and Hiroshi Kimura, Department of Virology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya 466-8550, Japan; email: hkimura@med.nagoya-u.ac.jp.
References
Author notes
The sequencing data from this study have been deposited in the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra; National Center for Biotechnology Information BioProject accession number PRJDB18873, which contains runs DRR598780-DRR599074).
The online version of this article contains a data supplement.
There is a Blood Commentary on this article in this issue.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
This feature is available to Subscribers Only
Sign In or Create an Account Close Modal