Figure 1
Retroviral integration clusters in the human genome. (A) Clustered distribution of 32 631 MLV and 28 382 HIV integration sites (ISs) in the genome of human CD34+ HPCs. The minimal requirement for cluster definition was 3 integrations within 12 587 bp for MLV and within 14 460 bp for HIV. These thresholds correspond to a false discovery rate of 0.01 in a control population of random sites of equal size (see supplemental Figure 3 for definition). (n) indicates the total number of clusters identified with this threshold. Clusters in the upper 5% of the distribution, containing ≥ 15 integrations for both MLV and HIV, were termed "hyperclusters." (B) Density distribution plot of MLV and HIV clusters. Cluster density is defined as the average distance between integrations within a cluster, calculated by dividing the cluster size (in bp) by the number of integration sites it contains. (C) Distribution of MLV, HIV, and random integration sites with respect to Known Genes (UCSC definition). (D) Intragenic distribution of 16 342 MLV and 21 647 HIV integrations along target transcripts, from the transcription start site (TSS) to the last nucleotide (end), on a normalized scale arbitrarily divided into 50 bins. The black line indicates the distribution of 15 263 control random sites.
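The cluster, density, and intragenic-binning definitions in this caption can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' pipeline. The exact cluster-calling procedure is given in supplemental Figure 3 and is not reproduced here. The merging rule used below is a hypothetical stand-in: any 3 consecutive sites spanning no more than the window seed a cluster, and overlapping seeds are merged. The function names `find_clusters`, `cluster_density`, and `intragenic_bin` are likewise hypothetical. Cluster size is assumed to be the distance between the first and last site. Transcript coordinates are assumed to follow the UCSC convention (txStart < txEnd on both strands, with the TSS at txEnd for minus-strand genes).

```python
from typing import List, Tuple


def find_clusters(sites: List[int], min_sites: int = 3,
                  window: int = 12587) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_sites) for clusters on one chromosome.

    Hypothetical rule: any `min_sites` consecutive sites spanning
    <= `window` bp seed a cluster; seeds sharing sites are merged.
    """
    pos = sorted(sites)
    runs: List[Tuple[int, int]] = []  # (first_index, last_index) per cluster
    for i in range(len(pos) - min_sites + 1):
        j = i + min_sites - 1
        if pos[j] - pos[i] <= window:
            if runs and i <= runs[-1][1]:
                # Seed overlaps the previous cluster: extend it to index j.
                runs[-1] = (runs[-1][0], j)
            else:
                runs.append((i, j))
    return [(pos[a], pos[b], b - a + 1) for a, b in runs]


def cluster_density(start: int, end: int, n_sites: int) -> float:
    """Average bp per integration: cluster size divided by site count."""
    return (end - start) / n_sites


def intragenic_bin(pos: int, tx_start: int, tx_end: int, strand: str,
                   n_bins: int = 50) -> int:
    """Bin index (0..n_bins-1) of a site along a transcript; bin 0 is the TSS."""
    length = tx_end - tx_start
    if strand == "+":
        frac = (pos - tx_start) / length
    else:
        # Minus strand: the TSS is the higher coordinate (tx_end).
        frac = (tx_end - pos) / length
    # A site at the last nucleotide would give n_bins; clamp it into the last bin.
    return min(int(frac * n_bins), n_bins - 1)


if __name__ == "__main__":
    sites = [100, 5000, 9000, 50000, 200000, 201000, 202000, 203000]
    for start, end, n in find_clusters(sites):
        print(start, end, n, cluster_density(start, end, n))
```

With the example positions, the first three sites form one cluster and the last four merge into a second; the isolated site at 50 000 bp is left unclustered. In practice the window would be set per vector (12 587 bp for MLV, 14 460 bp for HIV) and applied chromosome by chromosome.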
