Background/Case Studies

Hybrid RH alleles are common structural variants (SVs) that arose from gene conversion between the highly homologous and closely located RHD and RHCE genes. In the hybrids RHD*03N.01/RHD*DIIIa-CEVS(4-7)-D and RHD*01N.07/RHD*D-CE(4-7)-D, RHD exons 4-7 and the surrounding intronic regions are replaced by the corresponding RHCE sequence. Because these RHD hybrid alleles encode partial C, and RHD*DIIIa-CEVS(4-7)-D in particular is frequently found in patients with sickle cell disease (SCD), an accurate and cost-effective detection assay could facilitate red cell matching and reduce transfusion-associated alloimmunization. New DNA sequencing technologies generate long reads that allow direct detection of SVs. We investigated the ability of Pacific Biosciences (PacBio) long-read sequencing to define the breakpoints of RHD hybrids and to identify single nucleotide polymorphisms (SNPs) that can be used to detect the hybrid alleles.

Study Design/Methods

We performed PacBio whole genome sequencing (WGS) to identify the breakpoints of RHD*DIIIa-CEVS(4-7)-D in one patient (P1). To verify the breakpoints, we performed targeted PacBio sequencing of long-range PCR products spanning RH introns 3 and 7, which contain the breakpoints of RHD*DIIIa-CEVS(4-7)-D, in P1, in a second patient (P2) with the hybrid allele, and in two control patients without identified structural variations. We also investigated a patient (P3) with the rare allele RHD*D-CE(4-7)-D using the same targeted PacBio approach. PacBio sequencing reads were aligned to the RHD and RHCE reference sequences (hg38) with Minimap2 for haplotype reconstruction. Haplotypes were then iteratively grouped by similarity using the kmodes function of the klaR package to generate cluster consensus sequences (contigs). The contigs were aligned to the reference RHD and RHCE sequences by MAFFT multiple sequence alignment, and breakpoints were identified by binary segmentation of the match/mismatch profile against the RH references using ruptures (arXiv:1801.00826). Sanger sequencing of the breakpoint regions of RHD*DIIIa-CEVS(4-7)-D in P1, P2, and five additional patients with the hybrid allele identified shared intronic SNPs. To determine whether the shared SNPs were unique to RHD*DIIIa-CEVS(4-7)-D, we analyzed short-read WGS data from 912 SCD patients with RH genotypes predicted by RHtyper (Ti-Cheng Chang et al., 2020).
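The breakpoint-calling step can be illustrated with a minimal sketch. This is not the authors' pipeline and does not call the ruptures library; it is a plain-Python version of binary segmentation with a least-squares cost. It assumes a hypothetical per-site profile in which 1 means the contig matches the RHD reference and 0 means it matches RHCE at a site where the two references differ. The function names and the toy profile are illustrative assumptions.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def _sse(prefix: List[float], prefix_sq: List[float], start: int, end: int) -> float:
    """Sum of squared deviations from the mean over signal[start:end]."""
    n = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    return sq - s * s / n


def _best_split(prefix, prefix_sq, start: int, end: int,
                min_size: int) -> Tuple[float, Optional[int]]:
    """Return (cost reduction, index) of the best single split of a segment."""
    base = _sse(prefix, prefix_sq, start, end)
    best_gain, best_idx = 0.0, None
    for k in range(start + min_size, end - min_size + 1):
        gain = base - _sse(prefix, prefix_sq, start, k) - _sse(prefix, prefix_sq, k, end)
        if gain > best_gain:
            best_gain, best_idx = gain, k
    return best_gain, best_idx


def binary_segmentation(signal: List[int], n_bkps: int, min_size: int = 1) -> List[int]:
    """Greedily add up to n_bkps change points, each time splitting the
    segment whose best split reduces the least-squares cost the most."""
    prefix = [0.0] + list(accumulate(float(x) for x in signal))
    prefix_sq = [0.0] + list(accumulate(float(x) * float(x) for x in signal))
    segments = [(0, len(signal))]
    breakpoints: List[int] = []
    for _ in range(n_bkps):
        candidates = []
        for seg in segments:
            gain, idx = _best_split(prefix, prefix_sq, seg[0], seg[1], min_size)
            if idx is not None:
                candidates.append((gain, idx, seg))
        if not candidates:
            break  # no split lowers the cost any further
        _, idx, seg = max(candidates, key=lambda c: c[0])
        segments.remove(seg)
        segments.extend([(seg[0], idx), (idx, seg[1])])
        breakpoints.append(idx)
    return sorted(breakpoints)


# Hypothetical profile: RHD-like, then an RHCE-like converted tract, then RHD-like.
toy_signal = [1] * 30 + [0] * 50 + [1] * 20
toy_breakpoints = binary_segmentation(toy_signal, n_bkps=2)
print(toy_breakpoints)
```

In this toy profile the two change points bracket the converted tract. On real data the indices refer to informative sites, so each breakpoint would be reported as the interval between two flanking sites, which may be why breakpoints are usually given as coordinate ranges rather than single positions.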

Results/Findings

PacBio WGS in P1 showed that the 5' breakpoint of RHD*DIIIa-CEVS(4-7)-D was located 3.16 kb downstream of exon 3 (g.25293960-25293980) and the 3' breakpoint 3.05 kb downstream of exon 7 (g.25309776-25309828). Targeted PacBio sequencing in P1 and P2, both carrying the hybrid allele, showed that the 5' breakpoints were 3.04 kb (g.25293838-25293892) or 3.10 kb (g.25293892-25293908) downstream of exon 3; the 3' breakpoints were identical in the two patients and matched the one found by PacBio WGS. As expected, targeted PacBio sequencing did not identify an exon 4-7 conversion in the two control patients without structural variations. Notably, the breakpoints were similar to those reported by Zhang et al. (5' breakpoint, g.25293908-25293979; 3' breakpoint, g.25310218-25310250), who fully assembled the RH locus using a combination of targeted DNA capture, long-read PacBio sequencing, and a customized bioinformatics pipeline. We used the same targeted PacBio sequencing approach to analyze P3, who carries the rare RHD hybrid allele RHD*D-CE(4-7)-D. The 5' breakpoint was 3.84 kb downstream of exon 3 (g.25294634-25294895), and the 3' breakpoint was identical to that of RHD*DIIIa-CEVS(4-7)-D. Sanger sequencing in seven patients identified six RHD intronic SNPs shared among RHD*DIIIa-CEVS(4-7)-D alleles. Among 912 patients with SCD and RH genotypes predicted by RHtyper (including 55 patients with RHD*DIIIa-CEVS(4-7)-D), three of the six SNPs (g.25293603G>A, 25293891A>G, and 25310934T>C) showed high sensitivities (94.74%, 78.95%, and 96.49%, respectively) and specificities (99.77%, 99.88%, and 99.77%, respectively) for detecting RHD*DIIIa-CEVS(4-7)-D. Of interest, one of the SNPs, 25293891A>G, was previously reported by Silvy et al. (IVS3+3100 or c.486+3100A>G) to be unique to RHD*DIIIa-CEVS(4-7)-D.
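For readers less familiar with the metrics, the sketch below shows how sensitivity and specificity of a marker SNP against a reference genotype call can be computed. It is a minimal illustration under stated assumptions: the per-sample boolean vectors are hypothetical toy values, not study data, and the names `ConfusionCounts`, `tally`, `sensitivity`, and `specificity` are introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    true_pos: int   # SNP present, hybrid allele present
    false_neg: int  # SNP absent, hybrid allele present
    true_neg: int   # SNP absent, hybrid allele absent
    false_pos: int  # SNP present, hybrid allele absent


def tally(has_snp: Sequence[bool], has_allele: Sequence[bool]) -> ConfusionCounts:
    """Cross-tabulate marker status against the reference genotype call."""
    if len(has_snp) != len(has_allele):
        raise ValueError("both vectors must describe the same samples")
    pairs = list(zip(has_snp, has_allele))
    return ConfusionCounts(
        true_pos=sum(1 for s, a in pairs if s and a),
        false_neg=sum(1 for s, a in pairs if not s and a),
        true_neg=sum(1 for s, a in pairs if not s and not a),
        false_pos=sum(1 for s, a in pairs if s and not a),
    )


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of allele carriers that the SNP flags."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-carriers that the SNP correctly leaves unflagged."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical toy cohort of eight samples (illustrative values only).
toy_has_snp = [True, True, True, False, False, False, False, True]
toy_has_allele = [True, True, True, True, False, False, False, False]
toy_counts = tally(toy_has_snp, toy_has_allele)
print(f"sensitivity={sensitivity(toy_counts):.2%}, specificity={specificity(toy_counts):.2%}")
```

The sketch assumes the cohort contains at least one carrier and one non-carrier; otherwise the corresponding ratio is undefined.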

Conclusions

PacBio long-read sequencing defined precise breakpoints of RHD*DIIIa-CEVS(4-7)-D and RHD*D-CE(4-7)-D. Notably, we identified three intronic RHD SNPs that can be used to predict RHD*DIIIa-CEVS(4-7)-D, providing a low-cost method for its detection.

Disclosures

Weiss: Cellarity Inc., Novartis, and Forma Therapeutics: Membership on an entity's Board of Directors or advisory committees.
