Figure 3. Methylation analysis of gene bodies and surrounding regions. (A, top) Meta-gene analysis illustrating gene body methylation as a function of expression. The average percentage of methylated CpG dinucleotides of aggregated signal from all genes with indicated coordinates (in kb) relative to the TSS or TES is shown for unexpressed genes (Q1) and for expressed genes in tertiles of expression (Q2-Q4). Gene bodies of expressed genes were highly methylated in all cells. Gene bodies of silent genes were highly methylated in cells composed mostly of HMDs (SPCs) but were partially methylated in cells rich in PMDs (differentiated and transformed cells). The level of gene-body methylation of silent genes was cell type–specific and similar to the average level of methylation of the PMDs. Methylation levels dropped sharply near the promoters, reflecting the presence of UMRs and LMRs in these regions. (A, bottom) Box plots depicting average gene body methylation (starting at 30% of the gene length after the TSS [to eliminate the effect of the presence of UMRs/LMRs] and ending at the TES) of the genes from the 4 expression quartiles. Statistical significance assessed by Student t test is indicated (***P < .001). (B) Scatter plots illustrating the correlation (Pearson r2) between the difference of maternal and paternal gene expression in individuals FNY01_3_2 and FNY01_3_3, which is present in the haplo-identical but not in the nonidentical fraction of their genomes. The red dots represent the highly expressed genes (>500 RNA-seq reads); black dots, genes with >20 RNA-seq reads. Only autosomal protein-coding genes were considered. The x-axis and y-axis represent allele-specific expression defined as (βmaternal − βpaternal) with βmaternal = (# of maternal reads)/(# of total reads) and βpaternal = (# of paternal reads)/(# of total reads). 
The high correlation in the haplo-identical genes (r2 = 0.754 for the genes >500 reads and r2 = 0.361 for the genes >20 reads) suggests that most of the allele-specific variation in gene expression detected at the cell population level is genetic in origin. (C) Allele-specific meta-gene body methylation analysis illustrating that the most highly expressed allele is more highly methylated than the least expressed allele. The top and bottom graphs represent, respectively, gene body methylation for the genes expressed in an allele-specific manner or in a non–allele-specific manner in FNY01_3_2 and FNY01_3_3. The difference between the 2 curves in the top graph was statistically significant (paired Student t test between the average gene body methylation of the most expressed allele vs the least expressed allele: P < .0006 for both individuals separately, and P < 1.2 × 10⁻⁹ combined). Autosomal protein-coding genes exhibiting at least a twofold difference in allele expression were analyzed. Genes with <20 RNA-seq reads and <100 methylation counts were filtered out, yielding a list of 26 and 34 genes for FNY01_3_2 and FNY01_3_3, respectively. The dotted line represents the raw data (the averaged methylation per window, joined by lines); the continuous line represents the loess smooth of the same data. The meta-gene analysis was performed as described in panel A. (D) Scatter plot illustrating the relationship between allele-specific expression and methylation in FNY01_3_2 and FNY01_3_3. Data processing and definition of allele-specific expression are as in panels B and C. Allele-specific methylation is defined as (βmaternal − βpaternal) with β = (# of methylated reads)/(# of total reads). (E) Meta-plots illustrating the progressive decay of DNA methylation in the sequences flanking active gene bodies (at position 0). The methylation fraction for 1-kb windows of 5′ and 3′ flanking sequences was averaged to generate the plots (see “Methods”). 
The x-axis represents the distance in kilobases of each window to either the TSS (negative numbers) or the TES (positive numbers). The black dots represent the raw data; the red curve represents the kernel regression smooth of the same data. The inverted spikes in the middle are caused by the UMRs that are present near the promoters of each gene. For both the 3′ and 5′ flanking sequences, the methylation fraction was maximal in the gene body and progressively decreased in the flanking sequences until it reached the average methylation level that is typical of the PMDs for each cell type.
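
The allele-specific quantities defined in panels B and D can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the read counts are hypothetical, and it assumes that "total reads" for expression means the reads assigned to either parental allele, and that the methylation β is computed separately for each allele from that allele's own reads.

```python
def allele_specific_expression(maternal_reads: int, paternal_reads: int) -> float:
    """Return (beta_maternal - beta_paternal) for one gene.

    Assumption: total reads = maternal reads + paternal reads,
    i.e. only allele-assigned RNA-seq reads are counted.
    """
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("gene has no allele-assigned reads")
    beta_maternal = maternal_reads / total
    beta_paternal = paternal_reads / total
    return beta_maternal - beta_paternal


def methylation_beta(methylated_reads: int, total_reads: int) -> float:
    """Return beta = (# of methylated reads) / (# of total reads) for one allele."""
    if total_reads == 0:
        raise ValueError("allele has no methylation counts")
    return methylated_reads / total_reads


def allele_specific_methylation(mat_methylated: int, mat_total: int,
                                pat_methylated: int, pat_total: int) -> float:
    """Return (beta_maternal - beta_paternal) for methylation.

    Assumption: each allele's beta uses that allele's own read counts.
    """
    return (methylation_beta(mat_methylated, mat_total)
            - methylation_beta(pat_methylated, pat_total))


if __name__ == "__main__":
    # Hypothetical counts for a single gene, for illustration only.
    print(allele_specific_expression(maternal_reads=75, paternal_reads=25))
    print(allele_specific_methylation(80, 100, 60, 100))
```

Under these assumptions both quantities range from −1 to 1, with positive values indicating higher expression or methylation of the maternal allele and values near 0 indicating balanced alleles.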
