INTRODUCTION

Mass spectrometry (MS) is a potentially useful tool for studying the hemostatic system and the imbalances that lead to bleeding and thrombotic disorders. By harnessing the high throughput and broad scope of MS, investigators and clinicians may gain access to large volumes of data that help predict or manage hemostatic events. Although using MS to evaluate coagulation proteins appears promising, amino acid (AA) substitutions resulting from genetic variation may yield a spectrum of mass-to-charge ratios (m/z) that can impede accurate protein identification. The goals of this study were to describe 1) the proteins present in a blood sample that might be involved in or otherwise affect coagulation, 2) the range of variations that might occur with a single nucleotide substitution (SNS) in the reference coding sequences of these proteins, and 3) the variation in peptide fragments of these proteins when only one nucleotide is a variant.

METHODS

We obtained protein lists from the NCBI BioSystems database for the terms: Blood Clotting Cascade, Complement Cascade, Formation of Fibrin Clot, Hemostasis, Platelet Activation, Platelet Aggregation Plug Formation, Platelet Degranulation, Platelet Homeostasis, and Thrombin Signaling (e.g., http://www.ncbi.nlm.nih.gov/biosystems/198840). We linked each Symbol (gene) to its CCDS ID (consensus coding sequence, http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi ). For each nucleotide, we enumerated the effect of an SNS to each of the other three nucleotides. We then generated every peptide of length 5-20, determined the change in mass based on the average, as opposed to the monoisotopic, mass of the substituted AA, and assessed whether the peptide was unique among those in the system. Finally, we determined whether possible N-linked glycosylation sites were preserved, destroyed, or created by an SNS. We considered a putative site to be N[^P][ST], that is, N in position 1, any residue other than P in position 2, and either S or T in position 3.
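The enumeration step can be sketched in Python as follows. This is a minimal illustration and not the authors' code: it assumes the standard genetic code (NCBI translation table 1) and an uppercase, in-frame coding sequence, and the function name enumerate_sns and the effect labels ("synonymous", "missense", "nonsense", "stop-lost") are chosen here for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def enumerate_sns(cds):
    """Yield (nt_index, ref_nt, alt_nt, ref_aa, alt_aa, effect) for every
    single nucleotide substitution in an in-frame coding sequence.

    nt_index is 0-based. The effect labels are illustrative assumptions.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for offset in range(3):
            for alt in BASES:
                if alt == codon[offset]:
                    continue  # only the other three nucleotides
                variant = codon[:offset] + alt + codon[offset + 1:]
                alt_aa = CODON_TABLE[variant]
                if alt_aa == ref_aa:
                    effect = "synonymous"
                elif alt_aa == "*":
                    effect = "nonsense"   # premature stop codon
                elif ref_aa == "*":
                    effect = "stop-lost"
                else:
                    effect = "missense"   # AA substitution
                yield (start + offset, codon[offset], alt, ref_aa, alt_aa, effect)


if __name__ == "__main__":
    for row in enumerate_sns("ATGTGG"):
        print(row)
```

Each codon yields nine variants (three positions, three alternative nucleotides each), so a coding sequence of n complete codons produces 9n rows to classify.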

RESULTS

The proteins in the biosystems that were also in the CCDS database comprised 517 distinct Symbols and 951 distinct CCDS IDs, comprising 2,180,352 codons. The duplicate Symbols include transcript variants (for instance, the Symbol F8 linked to CCDS IDs 35457.1 and 44026.1), which diminishes the uniqueness of the peptides because transcript variants share some, if not most, of their reading frames. All of the codons were susceptible to an AA substitution; at least one variant nucleotide substitution in position 1 or 2 of the codon always resulted in an AA substitution. SNS caused premature termination signals (stop codons) in 240,100 of these codons. Table 1 details the variations. A map of the N-glycosylation sites is available for each protein, although this may not affect MS directly. Of the 83,510 potential N-linked glycosylation sites, an SNS disrupted the putative AA sequence in 53,067 (64%). An SNS created a novel potential N-linked glycosylation site at 52,787 loci.
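The classification of putative N-linked glycosylation sites as preserved, destroyed, or created can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: it writes the motif as the regular expression N[^P][ST], uses a lookahead so that overlapping sites are all reported, and assumes the wild-type and variant protein sequences have the same length (a single AA substitution). The names sequon_starts and classify_sites are hypothetical.

```python
import re

# N in position 1, anything but P in position 2, S or T in position 3.
# The zero-width lookahead lets overlapping motifs (e.g. "NNST") all match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_starts(protein):
    """Return the set of 0-based start positions of putative sites."""
    return {m.start() for m in SEQUON.finditer(protein)}


def classify_sites(wild_type, variant):
    """Compare site positions between a wild-type and a variant sequence.

    Assumes both sequences have equal length, so positions are comparable.
    """
    wt, var = sequon_starts(wild_type), sequon_starts(variant)
    return {
        "preserved": sorted(wt & var),
        "destroyed": sorted(wt - var),
        "created": sorted(var - wt),
    }


if __name__ == "__main__":
    wild_type = "ANATNPS"
    print(classify_sites(wild_type, "ANAANPS"))  # T -> A at index 3
    print(classify_sites(wild_type, "ANATNAS"))  # P -> A at index 5
```

In this toy sequence, replacing the T of NAT removes the only wild-type site, while replacing the P of NPS produces a new site at position 4 and leaves the original one intact.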

Table 1.
                  Wild-type Only                   Wild-type and Variants
Peptide Length    Peptides    Distinct Peptides    Peptides      Distinct Peptides    Relative Change in Mass
5                 722,174     292,485              24,470,567    2,464,873            0.052250
10                717,564     350,570              47,911,535    20,982,641           0.025968
15                712,954     355,446              71,052,562    31,777,153           0.017273
20                708,344     357,177              93,893,423    42,465,377           0.012939
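The peptide counts and mass changes tabulated above can be approximated with a sketch like the following. It is an illustration under stated assumptions, not the authors' method: peptides are taken as every contiguous window of a given length, distinct peptides are counted with a set, and "relative change in mass" is assumed here to mean the absolute difference in average mass divided by the wild-type average mass (the abstract does not give the formula). The average residue masses are standard rounded values in Da, and the function names are hypothetical.

```python
# Average (not monoisotopic) residue masses in Da, rounded to 4 decimals.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # average mass of H2O added for the free termini


def peptides(protein, length):
    """Return every contiguous peptide (sliding window) of the given length."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def average_mass(peptide):
    """Average mass of a peptide; raises KeyError on non-standard symbols."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in peptide) + WATER


def relative_mass_change(wild_type, variant):
    """Assumed definition: |variant mass - wild-type mass| / wild-type mass."""
    wt_mass = average_mass(wild_type)
    return abs(average_mass(variant) - wt_mass) / wt_mass


if __name__ == "__main__":
    windows = peptides("MKTAYIAKQR", 5)
    print(len(windows), len(set(windows)))          # total and distinct counts
    print(relative_mass_change("GGGGG", "GGAGG"))   # G -> A in a 5-mer
    print(relative_mass_change("GGKGG", "GGQGG"))   # K -> Q, a near-isobaric pair
```

Under this definition, the same substitution produces a smaller relative change in a longer peptide, which is consistent with the downward trend in the last column of Table 1. Near-isobaric substitutions such as K to Q differ by only a few hundredths of a dalton in average mass.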

CONCLUSIONS

Variant peptides due to a single SNS per peptide greatly outnumber wild-type peptides. The ability to identify a protein based on the uniqueness of one of its peptides increases as the peptide size increases, but AA variations in those peptides that arise from one SNS will require 1) increased mass resolution and 2) both a search algorithm and a database that account for the possible variations. Patients with hemostatic or thrombotic disorders may be more likely to have a variant, and these results highlight the need to know the genetic sequence associated with proteins being analyzed by MS if this technology is to be adopted for research and clinical purposes. Work to include currently identified SNPs and the effect of INDELs that preserve the reading frame is ongoing.

Disclosures

No relevant conflicts of interest to declare.
