Abstract
Plasma factor VIII coagulant activity (FVIII:C) level is a highly heritable quantitative trait that is strongly correlated with thrombosis risk. Polymorphisms within only 1 gene, the ABO blood-group locus, have been unequivocally demonstrated to contribute to the broad population variability observed for this trait. Because less than 2.5% of the structural FVIII gene (F8) has been examined previously, we resequenced all known functional regions in 222 potentially distinct alleles from 137 unrelated nonhemophilic individuals representing 7 racial groups. Eighteen of the 47 variants identified, including 17 single-nucleotide polymorphisms (SNPs), were previously unknown. As the degree of linkage disequilibrium across F8 was weak overall, we used measured-genotype association analysis to evaluate the influence of each polymorphism on the FVIII:C levels in 398 subjects from 21 pedigrees known as the Genetic Analysis of Idiopathic Thrombophilia project (GAIT). Our results suggested that 92714C>G, a nonsynonymous SNP encoding the B-domain substitution D1241E, was significantly associated with FVIII:C level. After accounting for important covariates, including age and ABO genotype, the association persisted with each C-allele additively increasing the FVIII:C level by 14.3 IU dL−1 (P = .016). Nevertheless, because the alleles of 56010G>A, a SNP within the 3′ splice junction of intron 7, are strongly associated with 92714C>G in GAIT, additional studies are required to determine whether D1241E is itself a functional variant.
Introduction
Factor VIII (FVIII) circulates, bound to von Willebrand factor (VWF), as an inactive heterodimer with domain configuration A1-A2-B (heavy chain) and A3-C1-C2 (light chain). Activated FVIII (FVIIIa), which is released from VWF as a heterotrimer with A1, A2, and A3-C1-C2 subunits, binds the protease FIXa to form intrinsic Xase whose sole function is to catalyze cleavage activation of FX. Intrinsic Xase is regulated by several processes including FVIIIa inactivation through A2-subunit dissociation and cleavage by activated protein C (aPC). Although FVIII coagulant activity (FVIII:C) is deficient in hemophilia-A patients, individuals with elevated levels have an increased risk for both venous1–3 and arterial thrombosis.2,4
FVIII levels in nonhemophilic populations are broadly variable, spanning a greater than 5-fold range.5,6 Age,7–12 sex,7,9 oral contraception (OC),10,13 smoking,7,14,15 body-mass index (BMI),7,12,16,17 diabetes mellitus (DM),7,16 ABO blood type,17–22 and plasma levels of total cholesterol (TC),16,17 low-density lipoprotein (LDL),16,17,23 triglyceride (TG),16,17 VWF,24–29 and FIX coagulant activity (FIX:C)10,24,30 all have ample evidence supporting their role as correlates of FVIII:C. Family studies, which have yielded heritability estimates ranging between 40% and 57% and 20% and 61%, respectively, for the levels of FVIII:C21,31 and FVIII antigen (FVIII:Ag),32–34 also demonstrate a substantial genetic contribution to FVIII variance. Furthermore, the finding by Souto et al35 of a strong genetic correlation between FVIII:C level and thrombosis (ρg = 0.689; P < .001) suggests that a subset of polymorphisms underlying FVIII variability pleiotropically influence thrombosis risk. Supporting this, Soria et al36 discovered a region on chromosome 18 (chr18) that contributes to the variance in both FVIII:C and aPC resistance (aPC-R), an established pathophysiologically related thrombosis risk factor. However, this putative FVIII quantitative-trait locus (QTL) and 2 other potential determinants localized to regions on chr5 and chr11 in the first reported genome-wide screen for which FVIII:C was the primary phenotype37 have yet to be confirmed. In a family-based study of the ABO structural gene, the only well-established FVIII QTL,17–22 Souto et al22 applied combined tests for linkage and association to demonstrate that the polymorphisms encoding allelic glycosyltransferases responsible for ABO blood-group antigens also directly influence FVIII:C levels and represent the only known FVIII quantitative-trait nucleotides (QTNs).
While approximately 1000 loss-of-function FVIII gene (F8) alleles underlie the heterogeneity observed for FVIII:C in hemophilia A,38 only 1 F8-based QTN contributing to the FVIII variability in nonhemophilic individuals has been identified39 and confirmed40 despite numerous studies.25,36,37,39,41–45 The X-linked F846 is thought to be less variable than other genes,6 in part because no polymorphisms have been found in prior investigations of this candidate FVIII determinant.25,41–43,45 Mansvelt et al41 identified no promoter variants in subjects with high FVIII levels. Regulatory regions required for F8 expression in vivo remain poorly characterized despite findings from in vitro assays demonstrating maximal transcription with less than 500 bp of the 5′ genomic sequence.47,48 Morange et al25 also found no variants in F8 regions encoding residues implicated in FVIII clearance by the lipoprotein receptor–related protein (LRP). Furthermore, studies of idiopathic thrombophilia subjects, with or without aPC-R, have not identified polymorphisms within F8 regions encoding aPC-cleavage sites,42,43,45 in contrast to the aPC-R variants frequently found within homologous F5 gene regions in thrombosis patients.49,50 Finally, while a subset of thrombophilic individuals with high prothrombin levels carry the prothrombin gene 3′ untranslated region (UTR) mutation 20210G>A,51,52 Mansvelt et al41 found no polymorphisms within this region of F8 in thrombosis patients with elevated FVIII levels. However, these studies scanned only small segments of F8 and often used screening methods less sensitive than DNA sequencing.41
Since loss-of-function hemophilic variants have been identified within every exon and junctional intronic sequence, multiple F8 regions are likely essential for FVIII expression and/or function in vivo.38 Future attempts to identify FVIII QTNs in F8 should examine every functional region by direct sequencing in a large sample of unrelated healthy subjects. While 98 predominantly unknown F8 polymorphisms were recently identified in a scan that met most of these criteria, subsequent analyses for associations with FVIII:C were not possible, as no phenotypic data were available for these subjects.53 To determine whether F8 is a determinant of FVIII variability, we resequenced all known functional regions within a collection of 222 X chromosomes from 137 unrelated subjects. Next we genotyped the polymorphisms identified in the 398 subjects of the Genetic Analysis of Idiopathic Thrombophilia project (GAIT).31 Finally, we used association analysis to determine if GAIT subjects have genotype-specific differences in mean FVIII:C levels after adjusting for relevant covariates.
Patients, materials, and methods
Reagents and instruments
HotStarTaq-Master Mix was from QIAGEN (Valencia, CA); AmpPure and CleanSEQ were from Agencourt Biosciences (Beverly, MA); polymerase chain reaction (PCR) and DNA-sequencing oligonucleotides (oligos) were from Invitrogen (Carlsbad, CA); and BigDye-Terminator (v3.1 or v1.1) Cycle-Sequencing Kits and ABI Prism-3100 and -3700 automated sequencers were from ABI (Foster City, CA). SeqMan multiple-sequence alignment program was from DNASTAR (Madison, WI); PHRED (v020425.c)54,55 was a gift from the University of Washington; and FINCH DNA-Sequencing System (v.2.09) was from Geospiza (Seattle, WA). Reaction conditions and thermal-cycling parameters for all PCR and DNA sequencing are available upon request.
Subjects and phenotypes
(1) GAIT study subjects. All procedures were approved by the institutional review board (IRB) of the Hospital de la Santa Creu i Sant Pau (Barcelona, Spain). Adult subjects gave informed consent for themselves and their minor children. Coded archived DNA samples from these subjects as well as previously obtained data for measured variables, including FVIII:C levels, were used. (2) Emory study subjects. All studies performed on these DNA specimens were approved by the IRB of Emory University (Atlanta, GA). Archived coded DNA samples from these subjects, which were collected for an unrelated study, were the only biologic specimens used. (3) Coriell study subjects. These subjects represent a collection from the National Institute of General Medical Sciences (NIGMS) of commercially available genomic-DNA samples from unrelated healthy subjects of different ethnic backgrounds that we obtained through the Coriell Cell Repository.
Studies on samples from the 3 groups of subjects referred to as GAIT, Coriell, and Emory were approved by the Emory University IRB. GAIT is composed of 398 Spanish white (SW) individuals from 21 extended pedigrees.31 The recruitment, sampling, and phenotypes measured in GAIT, including FVIII:C levels and additional measured variables, have been extensively described.31,35 Coriell designates a collection of genomic DNAs from NIGMS human cell panels representing 45 unrelated healthy subjects, including 10 female African American (AA), 3 female and 7 male Chinese (Ch), 5 female and 2 male Japanese (J), 2 female and 3 male Mexican Indian (MI), 3 female and 2 male South American Andean (SAA), and 5 female and 5 male non-J/non-Ch Southeast Asian (SEA) individuals, which were obtained from Coriell (Camden, NJ); catalog numbers available upon request. Emory designates genomic DNAs from 24 unrelated nonthrombotic male subjects, a subset of an archived collection originally obtained for an unrelated study,56 of which 18 are white American (WA) and 6 AA. The variation discovery group (VDG), which contained 137 unrelated subjects (85 female, 52 male) from 7 racial groups, included all Coriell and Emory individuals and a subset of 68 GAIT founders (58 female, 10 male) with FVIII:C levels at the extremes. In the VDG, “white” designates both SW and WA subjects.
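Because F8 is X-linked, the number of potentially distinct alleles in the VDG follows from the sex composition of the group. The short Python sketch below reproduces that arithmetic and shows how a minor-allele frequency would be expressed as a percentage of X chromosomes; the function names are illustrative and are not taken from the study's own software.

```python
def x_chromosome_count(n_female: int, n_male: int) -> int:
    """Females carry two X chromosomes and males carry one."""
    return 2 * n_female + n_male


def minor_allele_frequency_pct(minor_count: int, n_chromosomes: int) -> float:
    """Minor-allele frequency as a percentage, rounded to one decimal place."""
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    return round(100.0 * minor_count / n_chromosomes, 1)


# VDG composition stated in the text: 85 female and 52 male subjects.
vdg_chromosomes = x_chromosome_count(n_female=85, n_male=52)
print(vdg_chromosomes)  # 222

# A variant seen on a single X chromosome among all 222.
print(minor_allele_frequency_pct(1, vdg_chromosomes))  # 0.5
```

In practice the denominator for a given variant can be smaller than 222 when genotypes are missing, so the chromosome count should be taken from the subjects actually typed at that site.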
F8 reference sequence
We used the May 2004 human genome (hg) assembly to download sequence data for a 286-kb stretch of DNA from Xq28.1 that contained the entire F8 (∼186 kb) plus 50-kb segments of contiguous DNA flanking both the 5′ and 3′ ends.57 Referred to as hg17, this was the first version (National Center for Biotechnology Information [NCBI] Build-35) containing all 26 exons of F8; specifically, it included exons 21 and 22, which were missing from earlier releases. This sequence is contained in GenBank accession NG_005114 and can be used to cross-reference oligos (Table 1) and polymorphisms (Table 2). To define polymorphisms with respect to the F8 transcription unit (TU), we used the reverse complement of these sequences and assigned 1 to the start site mapped by Mansvelt et al41 and −1 to the base immediately 5′ to it (Figure 1). In this report, we use “hg17” to indicate that the nucleotide (nt) numbering follows this convention.
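The numbering convention described above has no position 0: the start site is nt 1 and the base immediately 5′ to it is nt −1. The following Python sketch illustrates one way to reverse-complement a sequence and to map a 0-based string index onto this numbering. The helper names and the idea of tracking the start site as a string index are assumptions made for illustration, not part of the published method.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def tu_position(index: int, start_index: int) -> int:
    """Map a 0-based index in the TU-oriented sequence to TU numbering.

    start_index is the 0-based index of the transcription start site.
    The start site is numbered 1 and the preceding base -1 (no position 0).
    """
    offset = index - start_index
    return offset + 1 if offset >= 0 else offset


def tu_index(position: int, start_index: int) -> int:
    """Inverse of tu_position: TU numbering back to a 0-based index."""
    if position == 0:
        raise ValueError("the TU numbering has no position 0")
    return start_index + (position - 1 if position > 0 else position)


print(reverse_complement("ATGC"))           # GCAT
print(tu_position(1000, start_index=1000))  # 1
print(tu_position(999, start_index=1000))   # -1
```

Keeping the two conversions as a pair makes it easy to check that any coordinate round-trips, which guards against off-by-one errors around the start site.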
Amplicon* . | Forward primer . | Reverse primer . | Nucleotides† . |
---|---|---|---|
01 | 5′ -TATCAAAGGGGCTTCTTGC | 5′ -CATGCCCTTTCTCCTGACC | −1214 to −573 |
02 | 5′ -AGCAAGTGTTGAGGTCCAGG | 5′ -TGAAGTAGCAAAAGGGAGGC | −683 to −56 |
03 | 5′ -CTTCTCCATCCCTCTCCTCC | 5′ -CAGAAATGTTTCTTTGGGGC | −168 to 441 |
04 | 5′ -CTTCAAATTTGCCTCCTTGC | 5′ -AGACCAAGCAGAGGAAGACG | 23012 to 23436 |
05 | 5′ -AATCTTGCCTCAGAGCAACC | 5′ -GAAAAGCAATTCCTAGGGGG | 25450 to 26066 |
06 | 5′ -GGGCAACAGAGTGAGACTCC | 5′ -TTCTGGAACTCAGCTCCTCC | 29413 to 30016 |
07 | 5′ -GGAGACCTGACATCAAAGCC | 5′ -AACCCCATCTCCTTCATTCC | 35266 to 35608 |
08 | 5′ -TAAGGTGTGAGCACACTGGG | 5′ -CGATGAGTTCTGTTCTGAGCC | 37821 to 38404 |
09 | 5′ -ATGGTGATTGGTGACCTTGG | 5′ -GGAAACTAGGGGATCTTGGC | 52986 to 53544 |
10 | 5′ -GTCTTGCTCCTGCTTTCACC | 5′ -TACCCTTGCCATTTGATTCC | 55802 to 56428 |
11 | 5′ -CTGCTGAAGAGGAGGACTGG | 5′ -ATGTCCATTGGAGACAAGGC | 56244 to 56855 |
12 | 5′ -GATTGTGGTATCTGCAGGGG | 5′ -CAACAGCTGGAGAAAGGACC | 61345 to 61753 |
13 | 5′ -TGACACTTTCACAGTCAACCG | 5′ -CAGCAGGCACGTTTACTACG | 65333 to 65920 |
14 | 5′ -CAGTCACCCTCTTGTCCTGG | 5′ -GGGAATTAAAAGGGAGAGGG | 68456 to 69067 |
15 | 5′ -CCTGGGAATAAGATAATGGGC | 5′ -AAATGCTGGTGAGGATGTGG | 74699 to 75338 |
16 | 5′ -ACAGCAGCAATGCAAAAACC | 5′ -TCTATTGCTCCAGGTGATGG | 90867 to 91468 |
17 | 5′ -ATGCTCTTGCGACAGAGTCC | 5′ -AACAAAGCAGGTCCATGAGC | 91365 to 91942 |
18 | 5′ -TTGGCAAAAAGTCATCTCCC | 5′ -CTAATTGCTTTGGACTGGGG | 91756 to 92379 |
19 | 5′ -CCACCAGATGCACAAAATCC | 5′ -TTTGCTTGGTTTGATTTCCC | 92247 to 92850 |
20 | 5′ -GAAGGTTCATATGAGGGGGC | 5′ -ATGACTGCTTTCTTGGACCC | 92700 to 93290 |
21 | 5′ -TCTGACCAGGGTCCTATTCC | 5′ -CATGATTGCTTTCACAAGCG | 93200 to 93816 |
22 | 5′ -ATTGGATCCTCTTGCTTGGG | 5′ -TGTCCCTGATTCCTCTACCC | 93674 to 94323 |
23 | 5′ -ATGCAAAATGCTTCTCAGGC | 5′ -AAAAGCTTGTTCAAAATAAATGG | 116050 to 116647 |
24 | 5′ -TCTGTACCACTTCTTCCAGGG | 5′ -TTTATGCCAGTCCAACCTGC | 117559 to 118125 |
25 | 5′ -TATTTTTGGAAGGTGGGAGG | 5′ -CGAATCCTTTGATCCTGAGC | 118087 to 118699 |
26 | 5′ -TTGATGAGACCAAAAGCTGG | 5′ -AGAGCATGGAGCTTGTCTGC | 118318 to 118933 |
27 | 5′ -AAGCACTTTGCATTTGAGGG | 5′ -TGGAGATCTTCGAGCTTTACC | 120414 to 120947 |
28 | 5′ -GGACCCCAGTTTCTTCAGC | 5′ -AGTGGGAAGTGGAGAGGAGG | 121066 to 121510 |
29B | 5′ -GAATTTAATCTCTGATTTCTCTAC | 5′ -GAGTGAATGTGATACATTTCCC | 122740 to 122902 |
30B | 5′ -TAAAAATAGGTTAAAATAAAGTG | 5′ -TTTAAATGACTAATTACATACCA | 126453 to 126668 |
29A | 5′ -TCAGGGTTGGTTACTGGAGC | 5′ -ACACTACCATGGTCTTGGGG | 158245 to 158735 |
30A | 5′ -AGTCAGTGGGCCTGTTATGG | 5′ -GTCCCTAGCTCTTGTTCCCC | 158526 to 159105 |
31 | 5′ -TGGGCAGATAGGGATAGTGG | 5′ -TTTGTGCGTTTCTCAACAGC | 158833 to 159409 |
32 | 5′ -TTCCCACTTCTTCTTGGTGC | 5′ -TGGGCATTTAGGTTGACTCC | 159304 to 159934 |
33 | 5′ -TCATGCCACTACACTCCAGC | 5′ -CTGCCCATAACCAAACTTCC | 160633 to 161150 |
34 | 5′ -GGGTGACAGAGCAAGACTCC | 5′ -AAAAGGCTTGGGAATCAAGG | 161982 to 162549 |
35 | 5′ -AGATGTCCCAGATGCGTAGG | 5′ -GCTTTCATGCAGGTTTCTCC | 184818 to 185411 |
36 | 5′ -TATTTTCTGCAGCTGCTCCC | 5′ -CTTTCAACAATTGCATCCTCC | 185329 to 185941 |
37 | 5′ -GAGGGGCACATTCTTATCTCC | 5′ -TCATAGTGAAGGGGTCAGGC | 185733 to 186382 |
38 | 5′ -CACCACACAATAGGATCCCC | 5′ -GTCAATGGGAAAAGAATGCC | 186289 to 186832 |
39 | 5′ -CAATCCACAAATGATGCAGG | 5′ -AGTGCCAGGATTACAGGCAT | 186639 to 187259 |
*To include all known functional F8 regions in the variation scan, 41 distinct amplicons of the structural locus were PCR amplified from genomic-DNA samples and resequenced directly. The 11 amplicons indicated in italics were directly sequenced to genotype the entire GAIT cohort for the 12 F8 variations that were polymorphic in white individuals.
†Numbering for nucleotides corresponds to the hg17 reference sequence for F8.
Region, TU* . | Protein* . | GenBank* . | T m-AF, %† . | W m-AF, %† . | AA m-AF, %† . | Ch m-AF, %† . | SEA m-AF, %† . | J m-AF, %† . | MI m-AF, %† . | SAA m-AF, %† . | HAMSTeRS38,58‡ . | VDR53‡ . | dbSNP59‡ . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Promoter | |||||||||||||
−825G>A | — | 49785830 | 0.9 | NP | 7.7 | NP | NP | NP | NP | NP | NF | F8-002289 | NF |
−824G>A | — | 49785831 | 0.9 | NP | 7.7 | NP | NP | NP | NP | NP | NF | F8-002290 | NF |
−493G>A | — | 49785832 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | F8-002621 | rs4898404 |
−385 A4>A5 | — | 49785833 | 0.9 | NP | NP | NP | NP | NP | 25.0 | NP | NF | NF | NF |
−287T>C | — | 49785834 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | NF | NF |
Intron 2 | |||||||||||||
25610G>A | — | 49785835 | 0.5 | NP | 4.5 | NP | NP | NP | NP | NP | NF | F8-028722 | NF |
Intron 3 | |||||||||||||
25865G>A | — | 49785836 | 0.5 | NP | 4.2 | NP | NP | NP | NP | NP | NF | F8-028977 | NF |
25885G>C | — | 49785837 | 0.5 | NP | 4.2 | NP | NP | NP | NP | NP | NF | F8-028997 | NF |
29567C>T | — | 49785838 | 0.5 | NP | NP | NP | NP | NP | NP | 12.5 | NF | NF | NF |
Intron 4 | |||||||||||||
29854T>C | — | 49785839 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | NF | NF |
Intron 5 | |||||||||||||
35518C>G | — | 49785840 | 0.9 | NP | 8.0 | NP | NP | NP | NP | NP | NF | NF | NF |
Intron 6 | |||||||||||||
53034A>G§ | — | 49785841 | 0.5 | 0.7 | NP | NP | NP | NP | NP | NP | NF | NF | NF |
Exon 7 | |||||||||||||
53206G>T§ | W255C | 49785842 | 0.5 | NP | NP | NP | 6.7 | NP | NP | NP | NA | NF | NF |
Intron 7 | |||||||||||||
55938C>A§ | — | 49785843 | 0.5 | NP | NP | NP | 6.7 | NP | NP | NP | NF | NF | NF |
56010G>A§ | — | 49785844 | 9.6 | 6.4 | 26.9 | 7.7 | NP | 12.5 | 12.5 | 25.0 | + | F8-059122 | rs7058826 |
Exon 8 | |||||||||||||
56113G>A§ | A343A | 49785845 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | + | F8-059225 | rs1800289 |
Intron 9 | |||||||||||||
61534T>C | — | 49785846 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | F8-064646 | rs5986899 |
Exon 10 | |||||||||||||
61620G>A | R484H | 49785847 | 0.9 | NP | 8.3 | NP | NP | NP | NP | NP | + | NF | NF |
Intron 13 | |||||||||||||
75215G>A | — | | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | F8-077273 | rs5987069 |
90948A>G | — | 49785848 | 0.5 | NP | NP | NP | NP | NP | NP | 12.5 | NF | NF | NF |
Exon 14 | |||||||||||||
91317A>G | R776G | 49785849 | 0.5 | NP | NP | 7.7 | NP | NP | NP | NP | NF | NF | rs2228152 |
92555C>T§ | H1188H | 49785850 | 0.5 | NP | 4.3 | NP | NP | NP | NP | NP | NF | F8-095667 | NF |
92714C>G§ | D1241E | 49785851 | 14.7 | 7.7 | 72.7 | 7.7 | NP | 12.5 | 12.5 | 25.0 | + | F8-095826 | rs1800291 |
92798A>C§ | S1269S | 49785852 | 7.7 | 6.3 | 3.8 | 7.7 | 20.0 | NP | 12.5 | 25.0 | + | F8-095910 | rs1800292 |
92927G>A | K1312K | 49785853 | 0.5 | NP | NP | NP | 6.7 | NP | NP | NP | NF | NF | NF |
93401C>G | V1470V | 49785854 | 0.5 | NP | 4.5 | NP | NP | NP | NP | NP | NF | NF | NF |
93434G>A | P1481P | 49785855 | 0.9 | NP | 9.1 | NP | NP | NP | NP | NP | NF | NF | NF |
Intron 15 | |||||||||||||
116434C>T | — | 49785856 | 0.5 | NP | NP | 7.7 | NP | NP | NP | NP | NF | NF | NF |
Intron 18 | |||||||||||||
118909T>A§ | — | 49785857 | 20.3 | 14.6 | 84.6 | 7.7 | 14.3 | NP | 42.9 | 37.5 | + | F8-122021 | rs4898352 |
Intron 19 | |||||||||||||
120776T>C§ | — | 49785858 | 24.5 | 17.3 | 75.0 | 7.7 | 16.7 | NP | 37.5 | 50.0 | + | F8-123888 | rs4074307 |
Intron 22 | |||||||||||||
158352C>T | — | 49785859 | 0.5 | 0.7 | NP | NP | NP | NP | NP | NP | NF | NF | NF |
158368T>C | — | 49785860 | 0.5 | 0.7 | NP | NP | NP | NP | NP | NP | NF | F8-161500 | NF |
158635C>T | — | 49785861 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | F8-161767 | rs5987054 |
158777A>G | — | 49785862 | 0.5 | 0.7 | NP | NP | NP | NP | NP | NP | NF | NF | NF |
158820C>T | — | 49785863 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | F8-161952 | rs5987053 |
159087G>A | — | 49785864 | 0.5 | 0.7 | NP | NP | NP | NP | NP | NP | NF | NF | NF |
Intron 23 | |||||||||||||
159874G>A§ | — | 49785865 | 1.4 | 1.4 | 3.8 | NP | NP | NP | NP | NP | NF | F8-163006 | NF |
Intron 24 | |||||||||||||
162013G>T§ | — | 49785866 | 3.3 | 5.0 | NP | NP | NP | NP | NP | NP | NF | F8-165145 | NF |
Exon 25 | |||||||||||||
162161A>G§ | M2238V | 49785867 | 1.8 | NP | 15.4 | NP | NP | NP | NP | NP | + | F8-165293 | rs17051967 |
Intron 25 | |||||||||||||
162475T>C§ | — | 49785868 | 4.1 | 5.6 | NP | NP | NP | NP | NP | 12.5 | NF | F8-165607 | NF |
3′ UTR | |||||||||||||
185156C>T§ | — | 49785869 | 0.5 | NP | 3.8 | NP | NP | NP | NP | NP | NF | F8-188288 | rs5986887 |
186341G>A§ | — | 49785870 | 0.5 | NP | NP | NP | NP | 12.5 | NP | NP | NF | NF | NF |
186506A>G§ | — | 49785871 | 0.5 | 0.7 | NP | NP | NP | NP | NP | NP | NF | NF | NF |
186602C>T§ | — | 49785872 | 0.5 | 0.7 | NP | NP | NP | NP | NP | NP | NF | NF | NF |
186799G>A§ | — | 49785873 | 24.8 | 17.4 | 76.9 | NP | 13.3 | NP | 50.0 | 50.0 | + | F8-189931 | rs1050705 |
3′ DNA | |||||||||||||
186987T>G§ | — | 49785874 | 0.9 | NP | 9.1 | NP | NP | NP | NP | NP | NF | F8-190119 | NF |
187064T>C§ | — | 49785875 | 1.8 | NP | 15.4 | NP | NP | NP | NP | NP | NF | F8-190196 | NF |
Except for the mild hemophilic missense mutation W255C, the coding region variants in italics represent ns-SNPs that encode the amino-acid substitutions listed.
T indicates total; W, the 68 SW and 18 WA subjects, which were considered together; HAMSTeRS, Hemophilia A Mutation, Structure, Test, and Resource Site; VDR, Variation Discovery Resource (UW-FHCRC); dbSNP (NCBI); —, polymorphisms located outside of the F8 coding sequence; NP, not polymorphic; NF, not found; NA, not applicable; and +, a previously known polymorphism.
*Polymorphisms are designated by genic region; nt alleles and position in the TU (start site and adjacent 5′ base are designated as nt 1 and −1, respectively); amino acid alleles and position in the mature plasma protein; and GenBank number for dbSNP submission.
†Estimated from subjects in either the total VDG or each of its racial groups separately. Genotypic data were not complete for all variants and resulted in denominators that varied from the maximum number of distinct X-chromosomes (see “Patients, materials, and methods”).
‡Public databases with F8 polymorphisms.
§Twenty-one SNPs contained in the 9 amplicons generated to genotype GAIT subjects for the subset of 12 that were located in a functional-gene region and variable in at least white individuals.
F8 variation scan
To identify potential FVIII determinants, we scanned the known functional regions of F8 (including all exons, 50 to 100 bp of each junctional intronic region, approximately 1.2 kb of contiguous promoter sequence, and approximately 300 bp of flanking 3′ genomic DNA) by directly sequencing 500- to 600-bp amplicons that, where necessary to cover extended regions, were overlapping (Table 1). Table S1 (available on the Blood website; see the Supplemental Materials link at the top of the online article) lists the lengths and hg17 nt boundaries of these amplicons and amino acids encoded by the exonic regions examined. Based on the July 2003 hg assembly, we initially generated 39 amplicons. Due to the presence of gaps, amplicons 29 and 30 (designated 29A and 30A) lacked exons 21 and 22 (Figure 1). Although amplicons 29B and 30B, containing exons 21 and 22, respectively, were absent from this scan, we included them in the genotyping phase described below under “Genotyping.” All amplicons, which were generated using genomic DNAs from VDG subjects, and their hg17 nt positions are listed in Table 1.
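As an illustration of how overlapping amplicons tile an extended region, the Python sketch below computes amplicon lengths and the overlap between consecutive amplicons from their hg17 boundaries. The coordinates are copied from Table 1 for amplicons 16 through 22; the function names are hypothetical, and the length formula assumes both boundaries lie on the same side of the start site because the numbering has no position 0.

```python
from typing import List, Tuple

Amplicon = Tuple[str, int, int]  # (name, first nt, last nt), hg17 TU numbering

# Consecutive amplicons 16-22, boundaries as listed in Table 1.
CONSECUTIVE_AMPLICONS: List[Amplicon] = [
    ("16", 90867, 91468),
    ("17", 91365, 91942),
    ("18", 91756, 92379),
    ("19", 92247, 92850),
    ("20", 92700, 93290),
    ("21", 93200, 93816),
    ("22", 93674, 94323),
]


def amplicon_length(first: int, last: int) -> int:
    """Inclusive length; valid only when first and last have the same sign."""
    if (first < 0) != (last < 0):
        raise ValueError("interval spans the start site; adjust for missing nt 0")
    return last - first + 1


def consecutive_overlaps(amplicons: List[Amplicon]) -> List[Tuple[str, str, int]]:
    """Overlap in bp between each amplicon and the next (<= 0 means a gap)."""
    result = []
    for (name_a, _, last_a), (name_b, first_b, _) in zip(amplicons, amplicons[1:]):
        result.append((name_a, name_b, last_a - first_b + 1))
    return result


for name, first, last in CONSECUTIVE_AMPLICONS:
    print(name, amplicon_length(first, last))

for name_a, name_b, overlap in consecutive_overlaps(CONSECUTIVE_AMPLICONS):
    print(f"{name_a}/{name_b}: {overlap} bp overlap")
```

A non-positive overlap from `consecutive_overlaps` would flag a gap in coverage between two neighboring amplicons.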
Agencourt Biosciences performed all initial amplification and cycle-sequencing reactions for the variation scan, in which each amplicon was examined on both strands. Water-negative controls for each amplicon were included and evaluated identically. Agencourt used ABI-3700 sequencers and the PHRAP programs PHRED, CONSED, and POLY-PHRED to identify variants.54,55 The quality of Agencourt's sequence chromatograms, which we used to both detect and genotype polymorphisms, was assessed in-house by (i) uploading chromatograms, including those for a blind genomic-DNA replicate from an Emory subject, into the Finch server to determine average PHRED quality (Q) scores; (ii) manually reviewing SeqMan alignments to validate base calls for minor alleles and strand consistency; (iii) performing agarose-gel electrophoresis and multiple-sequence alignments to validate negative controls for amplification and sequencing reactions, respectively; and (iv) manually reviewing multiple-sequence alignments of an approximately 5% random subset of all chromatograms. Sequences that either were of poor quality (average Q < 30)54,55 or yielded inconsistent base calls were repeated in-house. Previously unknown polymorphisms were designated as naturally occurring if minor alleles were present in 2 or more subjects. Polymorphisms with minor alleles found in only 1 subject were considered naturally occurring if confirmed upon sequencing a second amplicon derived from an independent PCR.
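The quality threshold and the confirmation rule described above can be summarized in a few lines of Python. This is a minimal sketch, not the pipeline used in the study: the function names are hypothetical, and the only external fact relied on is the standard PHRED definition, Q = −10 log10(P), where P is the estimated probability that a base call is wrong.

```python
from typing import Sequence


def phred_to_error_probability(q: float) -> float:
    """Standard PHRED relation: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_quality(q_scores: Sequence[float]) -> float:
    """Average PHRED Q score across the bases of one read."""
    if not q_scores:
        raise ValueError("no quality scores supplied")
    return sum(q_scores) / len(q_scores)


def needs_repeat(q_scores: Sequence[float], threshold: float = 30.0) -> bool:
    """Flag a read for repeat sequencing when its average Q is below threshold."""
    return mean_quality(q_scores) < threshold


def is_naturally_occurring(n_carriers: int, confirmed_by_independent_pcr: bool) -> bool:
    """Rule from the text: minor allele in >= 2 subjects, or in 1 subject
    and confirmed by sequencing an amplicon from an independent PCR."""
    return n_carriers >= 2 or (n_carriers == 1 and confirmed_by_independent_pcr)


print(needs_repeat([20, 30, 35]))        # True (mean is below 30)
print(is_naturally_occurring(1, False))  # False
print(is_naturally_occurring(1, True))   # True
```

An average Q of 30 corresponds to an expected error rate of about 1 miscalled base per 1000.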
LD
To evaluate pairwise allelic associations across F8, we calculated r2 and D′, 2 commonly used measures of linkage disequilibrium (LD), for the subset of 12 functional-region SNPs that were variable among the unrelated white individuals in the VDG and those GAIT subjects whose parents were not enrolled for study (Table 2). The results were plotted with SOLAR (http://www.sfbr.org/solar/), a software package for genetic variance-components analysis (Figure 2).60 D′ and r2 were calculated for all SNP pairs.61
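As an illustration of the two LD measures, the sketch below computes D, D′, and r2 for one pair of biallelic sites from phased haplotype counts. Phased counts are an assumption made for simplicity (they are directly observable in hemizygous males); the estimator actually used in the study is the one cited above, and the function name is hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r2) for two biallelic sites from four haplotype counts.

    n_AB .. n_ab are counts of the haplotypes A-B, A-b, a-B, and a-b.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at site 2

    d = p_AB - p_A * p_B             # raw disequilibrium coefficient

    # D' rescales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2 is the squared correlation between the allelic states at the two sites.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical counts: one haplotype class is absent, so D' is 1 while r2 is below 1.
    print(ld_stats(40, 10, 0, 50))
```

The example shows why both measures are reported: D′ reaches 1 whenever one of the four haplotypes is absent, whereas r2 reaches 1 only when the two sites carry the same information.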
Genotyping
We genotyped remaining GAIT subjects (158 female, 172 male) for the subset of functional-region polymorphisms that were variable in white individuals of the VDG (Table 2) by generating and directly sequencing 1 strand of the 9 amplicons indicated in Table 1. Agencourt performed the initial PCR and sequencing, for which we performed quality control in-house. To ensure that any polymorphisms identified in exons 21 and 22 could be evaluated by association analysis, 2 additional amplicons, 29B and 30B (Table 1), were generated from every GAIT subject and sequenced on both strands. We made all final genotype calls using PHRED output and a custom-written SAS_8.2 program (http://www.SAS.com) that generates a separate chromatogram for each polymorphism (custom-written program available upon request). By being compact and centering the variant between 2 10-bp flanks, these chromatograms standardized our manual genotype review. We paired forward- and reverse-strand sequences from the same individual to perform manual reviews but did not reveal the identity further than providing the file names, which contained logistic data such as plate, well address, amplicon, and direction. To resolve missing genotypes, we sequenced amplicons generated in-house on an ABI-3100. Genotypic data were analyzed with INFER, a component program implemented in the PESYS suite (http://www.sfbr.org/pages/genetics_projects.php?p=42), for violations of Mendelian inheritance; all disparities were reviewed and either corrected or excluded from further analysis.
Covariate selection and genotype association analyses
Measured variables available in GAIT include age, sex, smoking status, ABO genotype, OC status, BMI, DM status, and plasma levels of TC, high-density lipoprotein (HDL), LDL, very-low-density lipoprotein (VLDL), TG, lipoprotein-a, fibrinogen, VWF antigen (VWF:Ag), and FIX:C.31,35 We used indicator variables to represent ABO genotypes, with OO as the reference level, and modeled separate mean FVIII:C levels for each.35 In addition, smoking and OC status were represented with dichotomous variables that indicated any use versus the reference level of no use. We selected as covariates all measured variables associated with FVIII:C levels, since these factors could confound the relationship of interest if by chance their data were distributed differentially among the genotypes under evaluation. Since GAIT is a family-based study, we used likelihood-based variance-components analysis, as implemented in SOLAR, to account for the nonindependence between study subjects.60 Further, we performed bivariate analyses with the endogenous factors and FVIII:C levels to investigate potential common genetic sources of variance.62
Since all F8 polymorphisms were biallelic, containing a major (M) and minor (m) allele, with 3 genotypes in females (M/M, M/m, m/m) and 2 in males (M/Y, m/Y) (Table 2), we combined GAIT subjects by genotype into 3 groups (M/M and M/Y; M/m; m/m and m/Y) and used a dosage-compensation model in which hemizygous males had a gene effect equivalent to females homozygous for the same allele. We represented each group with the value of 1, 0, or −1, respectively, using an additive model in which the mean for heterozygous females is halfway between the means for females homozygous for the M and m alleles, and the means for male hemizygotes and females homozygous for the corresponding allele are equivalent. To evaluate the relationship between F8 and FVIII:C level, we performed marginal measured-genotype association analysis, as implemented in SOLAR.63 In the first analysis, we tested each polymorphism separately for genotype-specific differences in mean FVIII:C levels using an initial complex model that incorporated as covariates age, age2, sex, age × sex, age2 × sex, ABO genotype, and smoking and OC status. We reanalyzed all polymorphisms found to be suggestively associated with FVIII:C level (P ≤ .10) using a more complex model that included as covariates all potentially confounding measured variables. Finally, we performed additional marginal measured-genotype analyses in which paired endogenous factors that shared common genetic influences with FVIII:C level were excluded singly and jointly. This strategy allows for potential individual actions without enforcing the same mechanistic effect on every polymorphism. For these analyses, we used linear splines with knots at 15 and 50 years (ie, approximately corresponding to puberty and menopause) to represent age,64–66 allowing a more flexible control of confounding than either a linear or a quadratic representation.
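The genotype grouping described above can be written as a small lookup, shown here as a minimal sketch. The function names and the effect size in the example are hypothetical; the analysis itself was run in SOLAR, and this code only illustrates the 1, 0, −1 coding with dosage compensation.

```python
# Additive score for each X-linked genotype under the dosage-compensation model:
# hemizygous males are scored like females homozygous for the same allele.
ADDITIVE_CODE = {
    "M/M": 1, "M/Y": 1,    # homozygous female, hemizygous male (major allele)
    "M/m": 0,              # heterozygous female
    "m/m": -1, "m/Y": -1,  # homozygous female, hemizygous male (minor allele)
}


def additive_code(genotype):
    """Return the additive score (1, 0, or -1) for a genotype label."""
    return ADDITIVE_CODE[genotype]


def expected_shift(genotype, beta):
    """Expected shift in the trait mean relative to heterozygous females.

    beta is the regression coefficient on the additive score, that is, the
    difference in mean between adjacent genotype groups.
    """
    return beta * additive_code(genotype)


if __name__ == "__main__":
    beta = 10.0  # hypothetical effect size, in trait units per unit of score
    for g in ("M/M", "M/Y", "M/m", "m/m", "m/Y"):
        print(g, expected_shift(g, beta))
```

With this coding, the two extreme groups differ by 2 × beta, and heterozygous females sit exactly halfway between them.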
Results
Multiple unknown F8 polymorphisms identified
To determine whether F8 is a determinant of FVIII variability, we first identified candidate functional polymorphisms in 222 potentially distinct alleles contributed by the 137 unrelated VDG subjects. We examined 41 amplicons containing 1195 bp of promoter, 7054 bp of the 7056 bp of total coding sequence, all 1953 bp of untranslated exonic sequence, 4551 bp out of the approximately 5000 bp of total junctional-intronic sequence, approximately 2.3% of all deep intronic sequences (172 921 bp total), and 309 bp of 3′-flanking genomic DNA (Figure 1).46,57 These amplicons contained 19 157 bp and included every known functional region except 2 bp of exon 22, the 5′ splice junction (SJ) of intron 22, and 78% to 90% of the SJ sequences from the 3′ ends of introns 20 and 21 and the 5′ end of intron 21 (Table 1). By analyzing the approximately 188 kb of genomic DNA between the forward and reverse primers of the first and last amplicons, respectively, with RepeatMasker (http://repeatmasker.org), we found that approximately 31% of F8 is composed of nonrepetitive sequence, of which approximately 31% was covered in our scan.
We identified 47 variable sites (Figure 1) including 45 SNPs and a 1-bp insertion/deletion polymorphism (INDEL). The m allele of 53206G>T, which was carried by a female SEA subject and found in only 1 of the 222 X-chromosomes examined, encodes W255C, a missense mutation previously identified in Ch patients with mild hemophilia A.38 These variants, of which 18 were unknown (SeattleSNPs53 ; The Single Nucleotide Polymorphism Database of Nucleotide Sequence Variation59 ; The Haemophilia A Mutation, Structure, Test and Resource Site58 ; Table 2), were located in all functional genic regions. Four coding sequence polymorphisms were nonsynonymous SNPs (ns-SNPs) encoding the amino-acid substitutions R484H, R776G, D1241E, and M2238V (Figure 1). Although when estimated in the combined VDG (Table 2) m-allele frequencies (m-AFs) ranged from 0.5% to 24.8%, most F8 polymorphisms were singletons with infrequent and/or racially restricted minor alleles.
m-AFs vary with race
As the first step to investigate whether heritable FVIII determinants may vary across populations, a small number of nonwhite subjects from 6 racial groups were also studied. Specifically, we resequenced the same F8 regions in 51 individuals (31 female, 20 male) including 16 AA, 10 Ch, 10 SEA, 5 J, 5 MI, and 5 SAA subjects. While the m-AFs of a few polymorphisms (eg, 92798A>C) were similar across different races, most varied substantially. Indeed the m-alleles of 38 polymorphisms were found in only 1 racial group. Only 16 of the 46 polymorphisms were variable in either WA or SW subjects, or both, despite examining 144 white X-chromosomes, approximately 6 times more than the number from African Americans, the second most abundantly sampled racial group studied. Moreover the m-alleles for 30 of the 38 potentially racially restricted polymorphisms were found in a nonwhite racial group. Twenty-two of these nonwhite racially restricted polymorphisms were variable only in individuals of African descent; of these, the m-AF was greater than 5% for 8 and greater than 10% for 2 (162161A>G, 187064T>C). Furthermore, 2 of the 4 ns-SNPs, 61620G>A (R484H) and 162161A>G (M2238V), were AA restricted. Because the m-allele of 91317A>G (R776G) was found in a single Ch individual, 92714C>G (D1241E) is the only white ns-SNP. Although 8 SNPs were found in more than 1 racial group, their m-AFs varied widely. For example, the G-allele of 92714C>G was less frequent in the combined VDG and among individuals from each race separately, except those of African descent; in AA subjects it was actually the M-allele with a frequency of approximately 73% (Table 2). A similar pattern of differences in m-AF was observed for 3 additional SNPs (118909T>A, 120776T>C, and 186799A>G).
Low LD across F8
To survey LD across F8 we evaluated the 12 functional-region SNPs that were variable in (at least) the white subset of VDG subjects and/or the GAIT members whose parents were not enrolled in GAIT because the nonwhite populations were insufficiently sampled (n ≤ 16) to accurately estimate race-specific AF. We found a low overall degree of LD, whether using r2 (Figure 2) or D′ (not shown), with 6 sites being in linkage equilibrium with all other variants. Nevertheless, 120776T>C and 186799G>A exhibited high LD. The alleles of 92714C>G, which encodes the only white ns-SNP (D1241E), revealed moderate LD (0.4 ≤ r2 < 0.5) with 118909A>T, 120776C>T, and 186799A>G and high LD (r2 = 0.83) with 56010G>A (Figure 2).
FVIII:C level is broadly variable in GAIT
Genomic DNA from 394 GAIT subjects (213 female, 181 male) with duplicate FVIII:C measures was available for this study. Data from 2 females, with FVIII:C levels greater than 4 standard deviations (SDs) beyond the mean, were excluded from further analysis. FVIII:C levels in the remaining 392 subjects exhibited a mean (± SD) of 150.7 IU dL−1 (± 52.2 IU dL−1) and a greater than 6-fold concentration range (47-338 IU dL−1).
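The exclusion rule can be sketched as follows. This is an illustration only: it assumes the mean and SD are computed on the full sample before exclusion, which the text does not specify, and the function name and data are hypothetical.

```python
import statistics


def exclude_outliers(values, n_sd=4.0):
    """Drop values lying more than n_sd sample SDs from the sample mean.

    Returns (kept_values, mean, sd), where mean and sd are computed on the
    full input (an assumption; the source does not state the reference sample).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    kept = [v for v in values if abs(v - mean) <= n_sd * sd]
    return kept, mean, sd


if __name__ == "__main__":
    # Hypothetical FVIII:C-like values with one extreme observation.
    levels = [140, 150, 160] * 20 + [900]
    kept, mean, sd = exclude_outliers(levels)
    print(len(levels), len(kept))
```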
F8 is a modest determinant of FVIII variance
Before investigating the influence of F8 on FVIII, we attempted to identify variables for which it is necessary to control to avoid a potential bias. Sex, age, smoking status, ABO genotype, DM status, BMI, OC status, and plasma levels of fibrinogen, VWF:Ag, FIX:C, TC, LDL, VLDL, and TG were the measured variables available for this analysis. Although plasma C-reactive protein (CRP) concentration, the most commonly used measure of inflammation, was not assayed in GAIT, fibrinogen level may be a suitable surrogate marker.67–69 As GAIT subjects are SW, we investigated the relationship between FVIII:C levels and the subset of white F8 polymorphisms. Although the m-alleles for 16 of the 45 SNPs were found in at least 1 white VDG subject (Table 2), 4 (including 158352C>T, 158368T>C, 158777A>G, and 159087G>A) were located more than 500-bp upstream from the 3′ SJ of intron 22, a region not known to be essential for F8 function in vivo (Figure 1). Because LD across F8 was irregular and low overall (Figure 2), we genotyped all 12 functional-region white SNPs in GAIT and evaluated their influence on FVIII:C levels separately using marginal measured-genotype association analysis. The results from our initial analyses, in which we modeled as covariates only age, age2, sex, age × sex, age2 × sex, ABO genotype, and smoking and OC status, demonstrated significantly different genotype-specific mean FVIII:C levels (P < .05) for both 56010G>A, a noncoding SNP located 27 nucleotides upstream from the 3′ SJ of intron 7, and 92714C>G, the only white ns-SNP, which is located in exon 14 and substitutes glutamate for aspartate at residue 1241 in the B domain (Figure 1). Although both SNPs represent potential FVIII QTNs, we are unable to decipher which is the true functional variant from this study because their alleles were strongly associated within GAIT founders (Figure 2).
To explore further the possibility that either 92714C>G or 56010G>A differentially influences FVIII:C level, we performed additional marginal measured-genotype association analyses using more complex models. We chose to present only the results for 92714C>G because in addition to being in high LD with 56010G>A, it encodes the only nonhemophilic amino-acid substitution (D1241E) found in white individuals and had more complete data available (ie, 6 subjects had missing genotypes for 56010G>A). In the final analysis of D1241E (92714C>G) we incorporated as covariates all measured variables associated with FVIII:C level to control for confounding. In a separate bivariate analysis of FVIII:C with both FIX:C and VWF:Ag, we found potential overlap in the sets of (unknown) genes that may affect their trait levels. Thus we present the 4 analyses in Table 3. We excluded FIX:C (model 3), VWF:Ag (model 2), and the pair jointly (model 1), such that they were nested within model 4. Results from model 1 demonstrated a significant association between D1241E and FVIII in which each D-allele additively increased the mean FVIII:C level by 14.3 IU dL−1 (P = .016). Since data were not available for FIX:C and VWF:Ag levels from all GAIT subjects, including them as covariates resulted in smaller sample sizes for the analyses using models 2-4. However, the results of these analyses, when restricted to the 306 subjects with complete data, showed similar trends, suggesting that the effects of adding FIX:C and/or VWF:Ag were not limited to the loss of subjects excluded due to missing data. For instance, when the 306 subjects in model 4 were reanalyzed using model 1, each D-allele was estimated to additively increase the mean FVIII:C level by 11.8 IU dL−1 (P = .042).
Covariate* | Model 1 β† | Model 1 P‡ | Model 2 β† | Model 2 P‡ | Model 3 β† | Model 3 P‡ | Model 4 β† | Model 4 P‡ |
---|---|---|---|---|---|---|---|---|
Age, y | −4.12 | .089 | −3.43 | .178 | −3.75 | .065 | −3.33 | .090 |
Sex | 4.43 | .364 | 6.45 | .187 | 13.72 | .001 | 13.41 | .001 |
Age × sex | −1.33 | .729 | −4.93 | .250 | −3.37 | .322 | −3.73 | .257 |
A15§ | 3.62 | .152 | 3.00 | .258 | 3.44 | .102 | 3.03 | .136 |
A15 × sex | 2.12 | .596 | 5.62 | .206 | 3.88 | .272 | 4.14 | .225 |
A50§ | 3.48 | < .001 | 3.13 | < .001 | 1.44 | .013 | 1.58 | .005 |
A50 × sex | −2.78 | .004 | −2.30 | .015 | −1.48 | .058 | −1.42 | .059 |
OC status‖ | −23.82 | .056 | −23.80 | .080 | −11.22 | .298 | −15.10 | .148 |
Smoking status‖ | −11.45 | .021 | −13.48 | .008 | −5.84 | .153 | −8.27 | .038 |
AA¶ | 25.85 | .002 | 27.61 | .001 | 10.36 | .135 | 11.47 | .084 |
AB¶ | 26.03 | .109 | 35.41 | .047 | 11.23 | .444 | 9.91 | .482 |
AO¶ | 21.80 | < .001 | 15.78 | .006 | 3.75 | .447 | 2.71 | .568 |
BO¶ | 26.91 | .002 | 29.53 | .001 | 15.56 | .032 | 16.97 | .015 |
DM status‖ | −6.11 | .618 | −25.36 | .101 | −4.01 | .746 | −9.45 | .470 |
Fibrinogen level, IU dL−1 | 13.56 | < .001 | 9.17 | .012 | 2.55 | .400 | 0.37 | .901 |
BMI, kg m−2 | 0.22 | .718 | −0.46 | .471 | 0.85 | .085 | −0.05 | .926 |
VLDL level, mM | 45.58 | < .001 | 22.49 | .082 | 21.12 | .039 | 8.60 | .398 |
FIX:C level, IU dL−1 | N/A | N/A | 0.56 | < .001 | N/A | N/A | 0.45 | < .001 |
VWF:Ag level, IU dL−1 | N/A | N/A | N/A | N/A | 0.73 | < .001 | 0.66 | < .001 |
D1241E# | 14.33 | .016 | 11.99 | .049 | 7.26 | .142 | 6.10 | .199 |
For model 1, n = 361; for model 2, n = 313; for model 3, n = 307; for model 4, n = 306.
*Endogenous, environmental, and genetic variables included in the marginal measured-genotype association analysis of the relationship between D1241E and FVIII:C levels in GAIT.
†The estimated effect (β) of a given covariate or each D-allele of the FVIII D1241E polymorphism (F8 92714C>G) on FVIII:C levels.
‡The P value for each β.
§The coefficient for splines with knots at 15 and 50 years.
‖Status (yes/no) indicates either any use (smoking and oral contraception) or the presence of diagnostic criteria for DM.
¶The 4 ABO genotypes, other than OO (reference genotype), found in GAIT subjects.
#D1241E genotypes were grouped and represented with indicator variables as follows: 1 (D/D and D/Y); 0 (D/d); and −1 (d/d and d/Y).
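The age terms in Table 3 (Age, A15, and A50) correspond to a linear spline with knots at 15 and 50 years. A minimal sketch of one common way to build such terms is given below; it assumes a truncated-line basis, max(age − knot, 0), which is an assumption rather than the documented parameterization, and the function names and coefficients are hypothetical.

```python
def age_spline_terms(age, knots=(15.0, 50.0)):
    """Return [age, max(age - 15, 0), max(age - 50, 0)] for a linear spline.

    Each knot term is zero below its knot and grows linearly above it, so its
    coefficient is the change in slope at that knot.
    """
    return [float(age)] + [max(age - k, 0.0) for k in knots]


def segment_slopes(betas):
    """Slopes per year in the three age segments implied by the coefficients.

    betas = (b_age, b_knot15, b_knot50); slopes accumulate across knots.
    """
    b_age, b_15, b_50 = betas
    return b_age, b_age + b_15, b_age + b_15 + b_50


def age_contribution(age, betas):
    """Contribution of age to the fitted mean under the assumed basis."""
    return sum(b * t for b, t in zip(betas, age_spline_terms(age)))


if __name__ == "__main__":
    betas = (-2.0, 1.5, 1.0)  # hypothetical coefficients
    print(age_spline_terms(60))
    print(segment_slopes(betas))
    print(age_contribution(60, betas))
```

Unlike a quadratic term, this form lets the slope in each age segment be estimated largely from subjects in that segment, which is the flexibility the Methods describe.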
As expected, TG and TC levels were nearly colinear with VLDL and LDL levels, respectively, in GAIT (not shown). Furthermore, the Pearson correlation coefficients for these paired physiologically related lipid parameters were high whether we analyzed all GAIT subjects (and did not account for nonindependence between observations) or only the (independent) founders (ρ ≥ 0.95; P < .001). Finally, the “effect” of D1241E on FVIII:C level was found to be similar whether we substituted TG levels for VLDL levels or TC levels for LDL levels as covariates in our analyses (not shown). Since removing both LDL levels and TC levels had a negligible impact upon the estimated effect of D1241E on FVIII (14.0 IU dL−1; P = .019), these variables do not appear to confound the relationship under study70 and therefore were not included in our final analysis (Table 3). In contrast, removal of both VLDL and TG levels resulted in a noticeable change (not shown). While the choice between using TG or VLDL levels was arbitrary and the results were comparable, we present the findings obtained with VLDL level as a covariate (Table 3), since fewer subjects had missing data for this measurement.
Discussion
We performed this study to identify genetic determinants contributing to the broad normal range for FVIII:C level,5,6 a quantitative trait that influences thrombosis risk,35 both venous1–3 and arterial.2,4 Despite involving a substantial heritable component,31 functional allelic variants affecting the interindividual variability in this trait have been identified in only 1 gene.22 While F8 represents an obvious candidate, the negative findings from previous studies using linkage analysis36,37,39,71 have led some investigators to conclude that the encoding structural locus is not a FVIII QTL.71 Linkage studies are in general, however, less powerful than investigations based on association analysis, which usually directly examine potential functional variants, or polymorphisms in high LD with nongenotyped functional sites, especially for X-linked loci like F8 that presently can be evaluated only by single-marker linkage analysis. Although F8 has been the focus of numerous candidate gene studies, no polymorphisms were found within the functional regions examined in these investigations.25,41–43,45 Nevertheless, F8 should not be excluded as a possible FVIII QTL because less than 25% of all known functional regions, and therefore less than 2.5% of the entire structural locus, were scanned for variants in these studies. Despite the recent discovery of approximately 100 F8 polymorphisms by SeattleSNPs53 in a high-throughput resequencing study that examined 71 X-chromosomes from 47 unrelated individuals representing 2 racial groups,53 these potential FVIII determinants could not be evaluated since no phenotypic data were available.
As our first step to investigate F8, we resequenced all known functional regions in a collection of 222 potentially distinct alleles from 137 unrelated nonhemophilic individuals representing 7 racial groups. We identified 47 variants distributed throughout F8, including 45 SNPs, a 1-bp INDEL polymorphism, and the mild hemophilic missense mutation W255C, found in a single asymptomatic heterozygous female (Table 2). Despite their location in 1 of the most extensively investigated human loci, 18 variants were previously unknown. Because our study, together with the SeattleSNPs scan,53 identified 119 different polymorphisms (Table 2), F8 may be as variable as other human genes. Furthermore, because the 4 SNPs 61620G>A, 91317A>G, 92714C>G, and 162161A>G encode the nonsynonymous substitutions R484H, R776G, D1241E, and M2238V, respectively (Figure 1), wild-type FVIII is a variable protein in nonhemophilic populations and not monomorphic as long thought.6 Although SeattleSNPs identified 98 polymorphisms,53 only 27 were located in the regions examined in this study; the larger number of total variants found is likely due to the fact that approximately 40 kb of deep intronic sequence was investigated.
The distribution of FVIII:C levels in GAIT displayed the typical broad variability observed for this trait.5,6,41 Because the degree of LD across F8 was found to be weak overall (Figure 2), we separately evaluated the 12 functional-region SNPs variable in white individuals (Table 2) for potential contributions to FVIII variability by measured-genotype association analysis. Preliminary analyses (not shown), in which we accounted for the potential confounders age, sex, ABO genotype, and smoking and OC status as covariates, demonstrated that FVIII:C level was significantly associated (P = .007) with only 92714C>G (D1241E).39 Scanavini et al40 confirmed this association in a case-control study of women with idiopathic thrombophilia. Specifically, they reported a higher mean FVIII:C level in female subjects with the C/C genotype (D/D phenotype) compared with those with either C/G or G/G genotypes (D/E or E/E phenotypes). Because no other F8 polymorphisms were (i) found in our preliminary analyses to be significantly associated with FVIII levels or the alleles of D1241E,40 or (ii) investigated by Scanavini et al,40 the structurally distinct proteins encoded by this ns-SNP are likely to be functionally distinct and contribute to FVIII:C variability. However, when reanalyzed only among GAIT founders (Figure 2), 92714C>G demonstrated a high LD (r2 = 0.84) with 56010G>A, an SNP that had incomplete genotypic data in the preliminary analysis.39 Since 56010G>A is located within a potentially functional segment of the 3′ SJ of intron 7 (ie, position −27) and was found to have a significant association (P < .05) with FVIII:C level when reanalyzed using complete data (not shown), we cannot, based on data from this study alone, unequivocally establish which of these variants, if either (see below), is the true QTN.
To evaluate further the relationship of these 2 SNPs with FVIII, we controlled for all potentially confounding variables available in GAIT. Support for an association of age with FVIII:C level is among the strongest.7–10,12,72 Results from our analyses of FVIII:C level versus age are consistent with this whether assessed alone (not shown) or with covariates (Table 3). In a large cross-sectional study with a narrower and younger age range than GAIT subjects, Green et al17 did not observe an association between age and FVIII levels. Because Miller et al73 also found no association in a study of women of similar age, we used splines at 15 and 50 years to allow a less rigorous imposition on the data across the broad age range in GAIT (2-87 years). This approach allows flexibility in the effects of age on FVIII:C levels in the young and the old instead of modeling, for instance, a quadratic effect across both, in which the age data for older subjects influences the estimated response for the young and vice versa. The relationship between sex and FVIII:C level remains uncertain, with some studies demonstrating significant associations7,9 and others not8,18 ; we did not find evidence of a genotype-by-sex interaction. The impact of OC status on FVIII:C level has not been established.8,10,74,75 Although we found an apparently large effect for OC, comparable to that observed for ABO genotype (Table 3), oral contraceptives were used by only a small number of female GAIT subjects. Two large cross-sectional studies found modestly lower FVIII:C levels in smokers,7,15 whereas other investigations observed no association with this variable.8,17 We found that smokers have a lower mean FVIII:C level but could not characterize this relationship further because detailed smoking information is lacking in GAIT. 
Since FVIII is an acute-phase reactant, its plasma concentration may be transiently affected by various conditions, including inflammation.69 Although CRP level was not measured, we incorporated the surrogate marker fibrinogen level as a covariate in our analyses (Table 3), similarly to Kreuz et al.68
There is strong support for an association of ABO blood type with both FVIII:C level17–21 and VWF:Ag level.24–26,28,29 Using linkage analysis, Souto et al22 demonstrated that the functional ABO polymorphisms responsible for the antigens of this blood group are likely to directly influence the levels of both hemostasis traits. Until recently,39,40 these polymorphisms were the only known QTNs for FVIII. Although not completely understood, the effect of ABO genotype or phenotype on these 2 proteins may involve the same mechanism. Indeed, in a bivariate analysis of FVIII:C and VWF:Ag levels, we found potential overlap in the sets of genes associated with variability in their plasma levels. While less studied, associations between FIX and FVIII have also been reported.24 Since neither FIX nor FVIII functions outside of intrinsic Xase, regulatory mechanisms might exist that coordinate their levels to prevent a potential cellular resource misallocation. Consistent with this, our findings from a bivariate analysis demonstrated a potential overlap in the set of genes that affect FIX and FVIII levels. Including factors that share pleiotropic influences with the trait of interest as covariates decreases the power to detect gene variants that influence both phenotypes jointly but may increase power to detect those with unique influences on the focal trait. If F8 variants are in part responsible for the observed pleiotropy between FVIII and FIX or VWF, then model 1, excluding variables that are genetically correlated with FVIII, provides the most power. In contrast, if unidentified loci contribute to their shared heritability, then model 4 is preferable because it incorporates the known correlates of FVIII that may confound analysis. 
Without knowing the genetic architecture underlying pleiotropy in these traits, we focus on the results from model 1—the most conservative from the perspective of identifying functional gene variants, the goal of the present study—which demonstrated an additive increase in mean FVIII:C level of 14.3 IU dL−1 per D-allele (Table 3). Scanavini et al40 recently reported similar findings in that subjects with the C/C genotype (D/D phenotype) had a 19 IU dL−1 higher unadjusted mean FVIII:C level than the group of subjects with either a C/G or G/G genotype (D/E or E/E phenotype).
Despite strong allelic association between 56010G>A and 92714C>G, we predict that 92714C>G represents the true QTN because it encodes structurally distinct wild-type FVIII molecules (D1241E) that represent the only 2 forms of the protein expressed by white individuals, the population sampled in GAIT. Furthermore, while 56010G>A is near the 3′ end of intron 7, it is not located in essential cis-elements of the 3′ SJ. However, genetic studies in other populations with different patterns of allelic associations and/or in vitro functional assays are necessary to exclude the possibility that 56010G>A, or a nongenotyped functional polymorphism also in high LD with 92714C>G, is the FVIII QTN. Since the B domain undergoes numerous posttranslational modifications that limit FVIII expression,38,76,77 and yet represents a portion of the protein known to be dispensable for its procoagulant activity once in the circulation,78–81 we hypothesize that 92714C>G (D1241E) modulates FVIII:C levels by differentially affecting the secretion of this molecule.82,83 Because D1241E could also influence FVIII clearance and/or activation, elucidating the molecular basis for its effect on FVIII:C level will require in vitro investigations since genetic studies alone cannot differentiate between these biologically plausible alternative mechanisms.
Regardless of whether 92714C>G is the actual QTN or in strong LD with it, we estimate that F8 accounts for approximately 10% of the total FVIII variability in GAIT (not shown). This is consistent with the lack of evidence for linkage of FVIII:C levels to X-linked microsatellites within or near F8, which were genotyped in prior genome-wide screens,36,37 including 1 performed in GAIT,39 as we would not expect to detect QTLs with effect sizes in this range. As both 92714C>G and 56010G>A were polymorphic in all racial groups studied except SEA (Table 2), this F8-based determinant may influence FVIII variability in these populations. Because GAIT is composed entirely of SW subjects, however, appropriately designed studies in other racial groups are necessary to determine whether our findings apply to nonwhite populations. Such studies are also required to define further the genetic architecture underlying human FVIII variability since 30 of the functional-region F8 polymorphisms identified, including the 3 additional ns-SNPs R484H, R776G, and M2238V, were not evaluated as potential FVIII determinants because they were not variable in white individuals (Table 2). In summary, it is important to emphasize that we resequenced less than 15% of F8 and not all GAIT founders were enrolled for study. Therefore, we did not examine entirely every potentially distinct F8 allele segregated in GAIT. Thus, the possibility that this locus contains additional FVIII determinants, either as common polymorphisms84 or rare variants,85 since both have been shown to contribute substantially to complex trait variability at the population level, cannot be excluded.
Authorship
Contribution: J.C.S., J.B., J.F., J.M.S., L.A., and T.E.H. contributed the study concept and design; J.C.S., J.F., and S.T.W. enrolled the study subjects; K.R.V., D.K.M., J.C.S., J.F., J.M.S., and T.E.H. acquired the data; K.R.V., D.K.M., D.M.W., A.B., K.F., J.P., T.S., J.B., S.P., W.D.F., L.A., and T.E.H. provided data analysis and interpretation; K.R.V., J.M.S., W.D.F., L.A., and T.E.H. drafted the manuscript; and K.R.V., D.K.M., M.K., W.D.F., L.A., and T.E.H. provided critical manuscript revision for important intellectual content.
Conflict-of-interest disclosure: We certify that we have no affiliation with or financial involvement in any organization or entity with a direct financial interest in the subject matter or materials discussed in the manuscript. All financial support for this research project is identified in “Acknowledgments.”
K.R.V. and D.K.M. contributed equally to this work.
Correspondence: Tom E. Howard, Southwest Foundation for Biomedical Research, Department of Genetics, Rm 12.306, Slick-Urschel Bldg, San Antonio, TX 78245-0549; e-mail: thoward@sfbrgenetics.org.
A portion of this work was presented initially at the 45th annual meeting of the American Society of Hematology, San Diego, CA, December 8, 2003.39
The online version of this article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
We are grateful to all individuals who participated in this study and the physicians who enrolled them. We would also like to thank Cynthia Channell, Ming Shen, and Thuy Tran for excellent technical assistance; Art Thompson, Earl Davie, and Bruce Evatt for helpful discussions and/or reviewing the manuscript; and both Alexander (Sandy) Duncan and Pete Lollar for early support and mentoring.
This work was supported in part by the following grants: NIH-MH-59490 (J.B.); NIH-HL-70751 (L.A.); NIH-HL-71130 and NIH-HL-68016 (T.E.H.), Fondo Investigación Sanitaria (FIS)–02/0375, SAF2002-03449 (FEDER, Spanish Ministry of Science and Technology), FIS-RECAVA C03/01 (Fundació “La Caixa” and Fundació d'Investigació Sant Pau). J.M.S. and A.B. were supported by FIS-99/3048 and FIS-01/A046 (Fondo Investigación Sanitaria; Spanish Ministry of Health).