Background:

The 1000 Genomes Project provides a database of over 80 million genomic variants found across 2504 individuals from 26 populations. A current priority of the genomics field is to design information systems that translate this knowledge into clinical significance and patient care. The applications and advantages of red blood cell (RBC) antigen prediction through genotyping are widely accepted in transfusion medicine. Current technologies address a limited number of single nucleotide polymorphisms (SNPs) in 12 blood group genes, and our background knowledge of RBC phenotype distribution is often limited to a few populations. We analyzed the 1000 Genomes database with 4 objectives: 1) determine allele distributions of 46 blood group-related genes across the 5 genotyped superpopulations: Africa, East Asia, Europe, South Asia and the Americas; 2) identify possible new blood group alleles and their geographic association; 3) determine the feasibility of blood group genotyping by next-generation sequencing (NGS); and 4) establish a scaffold of chromosomal coordinates to interpret NGS output files into a predicted RBC phenotype.

Results:

From the initial list of 46 blood group-related genes, we excluded the five genes with known rearrangements and focused only on regions that met the strict criteria for accessibility through short, paired-end NGS reads (77% of 80.4 kb). We mapped over 800 known alleles in coding and non-coding regions and documented the 80 variants that were both present in the 1000 Genomes database and met the strict accessibility criteria. Sixty-four of these 80 variants are not addressed by current RBC genotyping technology. All 80 variants, including the ACKR1 promoter silencing mutation, are located within exon pull-down boundaries. The average low-coverage sequencing depth was 18,424x, with exome-sequencing confirmation at 65.7x depth. Twenty-three alleles had at least one novel population distribution, such as documentation of the Kpa allele for the first time in Africa and South Asia. Of a total of 30 novel blood group continental frequencies, 14 correspond to a newfound presence in South Asia.
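As an illustration of how a novel continental distribution might be flagged, the following minimal Python sketch reads the per-superpopulation allele-frequency tags (AFR_AF, AMR_AF, EAS_AF, EUR_AF, SAS_AF) from the INFO column of a 1000 Genomes phase 3 style VCF record and lists the superpopulations in which the allele is present but was not previously reported. It is not the method used in this study. The function names, the example INFO string, its frequency values, and the set of previously reported populations are all hypothetical and chosen only for illustration.

```python
# Superpopulation codes used in the per-population INFO tags.
SUPERPOPS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def superpop_frequencies(info):
    """Extract <POP>_AF values from a VCF INFO string into a dict."""
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    freqs = {}
    for pop in SUPERPOPS:
        value = fields.get(pop + "_AF")
        if value is not None:
            # Multi-allelic records hold comma-separated values;
            # this sketch only looks at the first alternate allele.
            freqs[pop] = float(value.split(",")[0])
    return freqs


def novel_presence(freqs, previously_reported):
    """Return superpopulations where the allele is observed (AF > 0)
    but that are absent from the set of previously reported populations."""
    return sorted(
        pop for pop, af in freqs.items()
        if af > 0 and pop not in previously_reported
    )


# Hypothetical INFO string; the numbers are invented for illustration.
example_info = (
    "AC=12;AF=0.0024;AFR_AF=0.0015;AMR_AF=0.0;"
    "EAS_AF=0.0;EUR_AF=0.0089;SAS_AF=0.001"
)
freqs = superpop_frequencies(example_info)
print(novel_presence(freqs, previously_reported={"EUR"}))
```

With these invented values, the allele would be flagged as newly observed in AFR and SAS. A real analysis would also need to account for sample size and sequencing error before calling a low-frequency observation a true presence.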

1000 Genomes identified a total of 926 missense mutations in blood group genes that met strict NGS mapping criteria, as well as multiple deletions. Two novel missense mutations in ERMAP and SLC14A1 are classified as likely antigenic, since they target the same amino acids responsible for the SCER- and Cr(a-) alleles. Six novel deletions involving the Lewis, H, Cromer, Indian and OK systems are also classified as likely deleterious after careful analysis. For example, a novel in-frame 24 bp deletion in SLC14A1 eliminates part of the intracytoplasmic tail, which is required for membrane localization and includes the 28G residue that defines JK*01W.03. Thus, this novel deletion is predicted to alter Kidd protein expression. The 8 novel alleles are distributed throughout the five superpopulations but are most frequently found in Africa. Four standard bioinformatics programs (SIFT, PolyPhen-2, Mutation Taster, and Mutation Assessor) failed to detect half of the known blood group alleles used as controls and are thus not adequate for the analysis of novel blood group variants in the transfusion medicine context.

Conclusions:

NGS can allow comprehensive, fast, and high-throughput RBC antigen prediction. All queried blood group alleles are amenable to targeted exome sequencing, and 77% of blood group coding sequences can be addressed with a short, paired-end NGS strategy. Based on 1000 Genomes, we created a database of the worldwide distribution of 80 known and 8 novel blood group variants, along with their chromosomal coordinates in the hg19 and GRCh38 assemblies. This database is the scaffold for the creation of a new transfusion medicine bioinformatics pipeline that will translate NGS .vcf output files into a predicted RBC phenotype. New algorithms that focus on exposed peptides and antigenicity are required for the analysis of novel variants identified by NGS in the immunohematology context.
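The translation step described above, from a .vcf file to a predicted RBC phenotype, can be sketched as a lookup of each called variant against a coordinate scaffold. The Python sketch below is only an assumption about how such a pipeline might be organized, not the pipeline itself. The scaffold entries, coordinates, system names, and allele labels are placeholders rather than real blood group coordinates, and the sketch handles only single-sample, biallelic records.

```python
# Hypothetical scaffold: (chrom, pos, ref, alt) -> (blood group system, allele label).
# These coordinates and labels are PLACEHOLDERS, not real blood group variants.
SCAFFOLD = {
    ("1", 1000, "G", "A"): ("SystemA", "variant allele A2"),
    ("7", 2000, "C", "T"): ("SystemB", "variant allele B2"),
}


def alt_allele_copies(gt):
    """Count copies of the first alternate allele in a GT string such as 0|1 or 1/1."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele == "1")


def predict_phenotype(vcf_lines, scaffold):
    """Return (system, allele label, zygosity) for each scaffold variant carried by the sample."""
    calls = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = cols[:5]
        key = (chrom, int(pos), ref, alt)
        if key not in scaffold:
            continue  # variant is not a known blood group allele
        fmt = cols[8].split(":")
        sample = cols[9].split(":")
        copies = alt_allele_copies(sample[fmt.index("GT")])
        if copies == 0:
            continue  # sample is homozygous reference at this site
        system, allele = scaffold[key]
        zygosity = "homozygous" if copies == 2 else "heterozygous"
        calls.append((system, allele, zygosity))
    return calls


# Hypothetical single-sample VCF body lines (tab-separated).
example_vcf = [
    "##fileformat=VCFv4.2",
    "\t".join(["1", "1000", ".", "G", "A", ".", "PASS", ".", "GT", "0|1"]),
    "\t".join(["7", "2000", ".", "C", "T", ".", "PASS", ".", "GT:DP", "1/1:30"]),
    "\t".join(["9", "3000", ".", "A", "G", ".", "PASS", ".", "GT", "1|1"]),
]
print(predict_phenotype(example_vcf, SCAFFOLD))
```

Keying the scaffold on chromosome, position, reference, and alternate allele keeps the lookup independent of rsIDs, but it also means the scaffold must match the assembly (hg19 or GRCh38) against which the .vcf file was called.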

Disclosures

No relevant conflicts of interest to declare.
