Key Points
We identified 38 significant loci associated with VTE through GWAS meta-analysis and prioritized causal genes through an integrative method.
Functional confirmatory studies of GWAS hits via genome editing in zebrafish showed a novel role for genes TC2N and RASIP1 in thrombosis.
Abstract
Venous thromboembolisms (VTEs) are a leading cause of morbidity and mortality. Although many genetic risk factors have been identified, a substantial portion of the heritability remains unexplained. In this study, we employed a genome-wide association study (GWAS) for VTE across 9 international cohorts of the Global Biobank Meta-Analysis Initiative to address this question, along with in vivo functional validation. In this multipopulation GWAS (VTE cases, 27 987; controls, 1 035 290), 38 genome-wide significant loci were identified, 4 of which were potentially novel. For each autosomal locus, we performed gene prioritization using 7 independent, yet converging, lines of evidence. Through prioritization, we identified genes associated with VTE through GWAS and/or functional studies (eg, F5, F11, VWF, STAB2, PLCG2, TC2N), functionally validated those that did not have evidence other than GWAS (TC2N, TSPAN15), and discovered 1 not previously associated with coagulation (RASIP1). We evaluated the function of 6 prioritized genes with strong genetic evidence, including F7 as a positive control, using laser-mediated endothelial injury to induce thrombosis in zebrafish after CRISPR/Cas9 knockdown. From this assay, we have supportive evidence for the role of RASIP1 and TC2N in the modification of human VTE and suggestive evidence for STAB2 and TSPAN15. This study expands on the currently identified genomic architecture of VTE through biobank-based, multipopulation GWASs, in silico candidate gene predictions, and in vivo functional follow-up of candidate genes.
Introduction
Deep vein thrombosis and pulmonary embolism, collectively referred to as venous thromboembolism (VTE), are disorders characterized by the pathologic formation of thrombi in deep veins that risk embolization to the pulmonary circulation. VTE is a common cause of morbidity and mortality and affects >900 000 individuals per year in the United States.1-4 As a complex trait, VTE risk is influenced by an array of well-described environmental factors and genetics. Heritability studies have suggested that between 30% and 40% of VTE risk is a consequence of genetic factors.5 Before genotyping technology allowed a genome-wide assessment of common variants, a few polymorphisms were implicated in VTE risk, but only 2 were validated by initial European (EUR) population GWASs. Those were the ABO blood group6 and a common variant in the F5 gene (factor V Leiden).7,8 Previous large GWASs8-24 have identified common and rare variants at dozens of loci that are associated with the risk for VTE (supplemental Table 1). The interaction between common variation and rare pathogenic variants in genes such as TUBB1, PROC, and PROS1 has also been described.25
The strongest signals observed in GWASs for VTE are those associated with loci with known roles in the hemostatic system, such as gain of function variants in procoagulant genes (F2, F5, F11, FGG20) or missense variants p.Ser219Gly and p.Arg113Cys in the anticoagulant gene PROCR.26,27 As the list of VTE risk variants continues to grow with the inclusion of loci with no previously described genes involved in thrombosis, so does the need for functional analyses of these variants. Zebrafish serve as a vertebrate model with genes that are highly conserved in the human genome,28 including the coagulation cascade.29 Their high fecundity, optical transparency, and external development make them amenable to the functional analysis of top selected GWAS signals in an in vivo model.30-33 We have previously shown that we can evaluate normal and pathologic hemostasis, as well as thrombosis, in zebrafish embryos and larvae using genome editing.34-36
In this study, we expanded the list of loci associated with VTE through a multipopulation meta-analysis of GWASs from biobanks of the Global Biobank Meta-Analysis Initiative (GBMI). Using genetic associations that emerged from this meta-analysis at known and potentially novel loci, we performed integrative bioinformatics-driven gene prioritization and subsequent functional analyses of 6 candidate genes in a zebrafish model of thrombosis.
Methods
GWAS meta-analysis
VTE is one of the pilot phenotypes of the GBMI. Each biobank conducted genotyping, imputation, and quality control and completed the GWAS in accordance with the GBMI analysis plan.37 In brief, a logistic mixed model in Scalable and Accurate Implementation of a Generalized Mixed Model (SAIGE) or REGENIE was used, and covariates included age, age², sex, age × sex, the first 20 principal components from genetic data, and any biobank-specific covariates. VTE was defined as in previous GWASs17 and refined to include cases with the International Classification of Diseases, 10th Revision codes I80.1, I80.2, I82.2, I26.0, and I26.9; the International Classification of Diseases, 9th Revision codes 451.11, 453.40, 453.2, 453.77, 453.87, and 415.1; and the Office of Population Censuses and Surveys-4 Procedure Codes L791 or L90.2 (supplemental Table 2). After phenotype harmonization, GWAS summary statistics across 9 international cohorts (BioMe, Vanderbilt University Biobank [BioVU], China Kadoorie Biobank, Estonian Biobank, FinnGen, Genes and Health Study, Michigan Genomics Initiative, University of California, Los Angeles, and United Kingdom Biobank) with representation across 5 super populations (American, African [AFR], East Asian, EUR [including Finnish and non-Finnish EUR ancestries], and South Asian) were combined using an inverse variance–weighted meta-analysis with 27 987 cases, 1 035 290 controls, and an effective sample size of 107 409 (supplemental Tables 3-5; supplemental Figure 1). We defined genome-wide significant loci as described in Zhou et al37 by iteratively spanning the ±500 kb region around the most significant variant and merging overlapping regions until no genome-wide significant variants (P < 5 × 10⁻⁸) were detected within ±500 kb. The most significant variant in each locus was selected as the lead single nucleotide polymorphism (SNP).
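As an illustration of the inverse variance–weighted combination described above, the following Python sketch pools per-cohort effect estimates for a single variant. It is a minimal fixed-effect example under stated assumptions, not the GBMI pipeline: the function names (ivw_meta, effective_n) and all input values are hypothetical, and the effective sample size formula shown, 4 / (1/cases + 1/controls), is a commonly used definition that is assumed here because the text does not state one.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse variance-weighted meta-analysis for one variant.

    betas: per-cohort log odds ratios; ses: their standard errors.
    Returns (pooled beta, pooled SE, z score, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Two-sided normal P value; erfc stays accurate for very small P values.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


def effective_n(n_cases, n_controls):
    """Assumed definition of effective sample size for one case-control cohort."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical per-cohort estimates for a single variant (not study data).
beta, se, z, p = ivw_meta(betas=[0.10, 0.14, 0.08], ses=[0.02, 0.04, 0.03])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest biobanks.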
The genomic control factor (lambda) used to measure the inflation of test statistics was 1.042 (supplemental Figure 2), and therefore P values were not adjusted for inflation.
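For readers unfamiliar with the genomic control factor, the sketch below shows one standard way to compute lambda from a set of association P values: the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null. The function name and the uniform test input are hypothetical, and the sketch is not taken from the study's code.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(p_values):
    """Lambda = median observed 1-df chi-square statistic / expected median."""
    norm = NormalDist()
    # Convert each two-sided P value back to a 1-df chi-square statistic (z^2).
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic a null (uniform) distribution of test statistics.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_control_lambda(null_p), 3))
```

A value close to 1, such as the 1.042 reported here, indicates little inflation of the test statistics.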
We looked up the novel loci in the 2 most recently published studies, namely the 2022 International Venous Thrombosis Network (INVENT) meta-analysis with the Million Veteran Program38 (supplemental Table 6) and a 2023 meta-analysis of individuals of EUR genetic ancestry.39 We do not consider this a formal replication because there is some sample overlap between these 2 studies and our meta-analysis.
Statistical analysis
Credible sets
To fine-map the loci identified in the VTE meta-analysis, we generated credible sets of causal variants using the sum of single effects (SuSiE) model,46 an iterative Bayesian stepwise selection approach based on sparse multiple regression, to determine credible sets with a 95% posterior probability of containing potential causal variants. A linkage disequilibrium (LD) reference panel from 2504 unique individuals from all ancestral cohorts of the 1000 Genomes Project was used. To create credible sets, we considered regions ±500 kb from the index variant.
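The notion of a 95% credible set can be illustrated with a short Python sketch. This is a deliberate simplification and not the sum of single effects implementation: it assumes a single causal signal in the region and takes per-variant posterior probabilities as given, showing only the final step of collecting the highest-probability variants until the requested coverage is reached. The variant identifiers and probabilities are hypothetical.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    pips: dict mapping variant ID -> posterior probability of being causal,
    assumed to sum to about 1 across the region for a single causal signal.
    Returns (list of variant IDs, cumulative probability covered).
    """
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen, total


# Hypothetical posterior probabilities for 4 variants in one region.
example = {"var_a": 0.70, "var_b": 0.20, "var_c": 0.06, "var_d": 0.04}
variants, covered = credible_set(example)
print(variants, round(covered, 2))
```

A small credible set indicates that fine-mapping narrowed the signal to a few candidate variants, whereas a large set reflects extensive LD or limited power.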
Gene prioritization
For each autosomal locus, we performed gene prioritization using 7 independent, yet converging, lines of evidence. We used Data-driven Expression Prioritized Integration for Complex Traits (DEPICT) and polygenic priority score (PoPS) for gene prioritization for all 14 endpoints in the GBMI pilot study.37 Using the variants with a P value of <1 × 10⁻⁵ in the multipopulation meta-analysis, any gene with a false discovery rate of <0.05 with DEPICT47,48 was considered prioritized. Similarly, a gene in the top 10% of genes as ranked by PoPS49 was considered prioritized. For both analyses, individuals of EUR ancestry from the 1000 Genomes Project phase 3 were used as the LD reference panel,50 because 86% of the individuals included in the GWAS were of primarily EUR ancestry.
We compared the performance of DEPICT and PoPS using a gold standard set of coagulation and platelet genes (N = 41; supplemental Table 7), determined before the GWAS by a medical and molecular genetics expert in VTE and coagulation (J.A.S.) and based on a high-throughput sequencing panel containing a gold standard list of coagulation and platelet genes by the ThromboGenomics group.51 We used a false discovery rate threshold of <0.05 to define prioritized genes from the DEPICT result. Of the 54 genes prioritized by DEPICT, 11 were in the VTE gold standard gene list (area under the curve, 0.75; supplemental Figure 3). For the PoPS gene prioritization result, we selected the top 10% of genes (N = 1839) with the highest PoPS score as the prioritized genes, and 32 of the PoPS-defined prioritized genes were reported in the gold standard gene list (area under the curve, 0.84; supplemental Figure 3). We further evaluated the performance of DEPICT and PoPS in predicting functional genes using the DeLong test, which showed no significant difference between the accuracy of the 2 methods (P = .30). Therefore, we concluded that both methods can be used in the integrative prioritization.
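The area under the curve used to benchmark a prioritization method against a gold standard list can be computed directly from gene scores, as in the following Python sketch. It uses the rank-based definition of the area under the receiver operating characteristic curve (the probability that a gold standard gene outscores a gene not on the list) and does not implement the DeLong comparison. The gene names and scores are hypothetical placeholders, not study data.

```python
def roc_auc(scores, gold):
    """AUC = P(score of a gold standard gene > score of another gene).

    scores: dict mapping gene -> prioritization score (higher = stronger).
    gold: set of gold standard genes. Tied scores count as 0.5.
    """
    pos = [s for g, s in scores.items() if g in gold]
    neg = [s for g, s in scores.items() if g not in gold]
    if not pos or not neg:
        raise ValueError("need at least one gold and one non-gold gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for 5 genes, 2 of which are on the gold standard list.
scores = {"gene_1": 0.9, "gene_2": 0.8, "gene_3": 0.7, "gene_4": 0.2, "gene_5": 0.1}
gold = {"gene_1", "gene_3"}
print(round(roc_auc(scores, gold), 3))
```

An area of 0.5 corresponds to random ranking, and 1.0 means every gold standard gene is ranked above every other gene.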
For the proteome-wide Mendelian randomization52 and colocalization analysis, candidate SNPs from the non-Finnish EUR meta-analysis with a P value of <1 × 10⁻⁵ were selected for the genetic association with VTE. For the approximate colocalization, we tested whether the leading protein quantitative trait locus was in LD (r² ≥ 0.8) with a candidate SNP. Genes with proteome-wide Mendelian randomization and colocalization evidence were used for prioritization. We also looked up the lead SNPs in relevant Genotype-Tissue Expression tissues,53 including whole blood, atrial appendage, left ventricle, aortic artery, coronary artery, tibial artery, and Epstein-Barr virus-transformed lymphocytes, and reported significant expression quantitative trait loci (eQTLs; q value of <0.05). We also considered deleterious mutations for gene prioritization. We identified genes with a pathogenic variant in ClinVar54 as of 7 May 2021 or genes with a nonsynonymous variant in the 95% credible set (calculated using the sum of single effects). For both the eQTL and ClinVar annotations, we also considered genome-wide significant variants within 50 kb upstream and downstream of the lead SNP, but this expanded variant set did not significantly impact the main prioritized gene (supplemental Table 8). Finally, we considered the nearest gene, as annotated with ANNOtate VARiation. For each lead SNP, a simple sum across these 7 lines of evidence was used to identify a potentially causal gene with the most evidence. We used Enrichr55-57 for the 38 prioritized candidates to identify Gene Ontology (GO) molecular function and biologic processes and Kyoto Encyclopedia of Genes and Genomes human pathways.
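The simple sum across lines of evidence can be sketched in a few lines of Python. This is an illustrative reconstruction, not the study's code: the evidence labels, the function name, and the example genes are hypothetical, and genes that tie for the highest count are all returned rather than broken arbitrarily.

```python
from collections import Counter

# Labels for the 7 lines of evidence (names chosen for this sketch only).
EVIDENCE_LINES = (
    "missense_in_credible_set", "nearest_gene", "depict", "pops",
    "eqtl", "clinvar", "pwmr_coloc",
)


def prioritize(evidence):
    """Pick the gene(s) supported by the most lines of evidence at one locus.

    evidence: dict mapping an evidence label -> set of genes it supports.
    Returns (sorted list of top genes, number of supporting lines).
    """
    counts = Counter()
    for line in EVIDENCE_LINES:
        for gene in evidence.get(line, ()):
            counts[gene] += 1  # each line contributes at most 1 per gene
    if not counts:
        return [], 0
    best = max(counts.values())
    return sorted(g for g, c in counts.items() if c == best), best


# Hypothetical locus: GENE_X has 3 supporting lines and GENE_Y has 2.
locus = {
    "nearest_gene": {"GENE_X"},
    "pops": {"GENE_X", "GENE_Y"},
    "eqtl": {"GENE_X"},
    "depict": {"GENE_Y"},
}
print(prioritize(locus))
```

Because every line of evidence carries equal weight, this heuristic favors genes with convergent support over genes with a single strong annotation.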
Targeted knockdown in zebrafish using genome engineering
We used the CHOPCHOP server to identify highly efficient single-guide RNAs (sgRNAs) for CRISPR/Cas9–mediated genome editing.58 A total of 2 to 4 guides were selected for each gene of interest (supplemental Table 9). sgRNAs and Cas9 nuclease were ordered from Synthego. sgRNAs for each gene were pooled, mixed with Cas9, and injected into single-cell embryos produced from ABxTL hybrids. To confirm that editing occurred, we validated each sgRNA by lysing injected embryos, performing a polymerase chain reaction assay across the target site, and running the product on a sensitive electrophoresis system (QIAxcel, Qiagen). Successful editing was indicated by a smear rather than a single band. Injection and subsequent laser-induced endothelial injury were performed at least twice for each knockdown, and the results were pooled.
Laser-induced endothelial injury in zebrafish
Zebrafish were maintained according to protocols approved by the University of Michigan Animal Care and Use Committee. Three days after fertilization, larvae were anesthetized using tricaine and mounted in 1.6% low-melting agarose. Laser injury was performed using an Andor Micropoint pulsed-dye laser focusing system to target the posterior cardinal vein (PCV), 5 somites caudal to the anal pore,34,35 by an observer blinded to sample identity. Following injury, clotting was initiated in the PCV, and larvae were observed until the developing thrombus completely occluded the vessel and blocked blood flow. The time to occlusion (TTO) of the PCV was recorded up to 120 seconds. The larvae were lysed and subjected to polymerase chain reaction assays using primers that flanked each target site to confirm successful editing.
Statistical analysis was performed using the ggpubr package in R, v4.1.1. Pairwise Wilcoxon signed rank tests were used for each gene to compare the TTO of uninjected controls and sgRNA-injected embryos. Injection with no sgRNA was used as a negative control. A sensitivity analysis was performed by pooling uninjected control measurements and comparing this distribution with that of each sgRNA-injected gene using a Wilcoxon signed rank test. A Bonferroni threshold of 0.0083 was used to account for 6 independent genes tested.
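The group comparison and the multiple-testing threshold can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used in this study, which was run in R. The sketch assumes an unpaired rank sum (Mann-Whitney) comparison, because uninjected and sgRNA-injected larvae are different animals; it uses a normal approximation with a tie correction and no continuity correction, so P values may differ slightly from those of other software. The TTO values are made up and capped at 120 seconds to mimic the recording limit.

```python
import math


def rank_sum_test(x, y):
    """Two-sided rank sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided P value).
    """
    values = sorted(list(x) + list(y))
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Assign midranks so tied observations share the average of their ranks.
    rank_of, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                                # size of this tie group
        rank_of[values[i]] = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    u = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                               # every observation identical
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Bonferroni threshold for 6 genes tested at a family-wise alpha of 0.05.
ALPHA, N_GENES = 0.05, 6
threshold = ALPHA / N_GENES
print(round(threshold, 4))

# Hypothetical TTO values in seconds (120 = no occlusion within the window).
control = [20, 25, 30, 35, 40]
knockdown = [60, 90, 120, 120, 120]
u_stat, p_value = rank_sum_test(control, knockdown)
print(u_stat, p_value < threshold)
```

The tie correction matters in this setting because every larva that fails to occlude within the observation window shares the same censored value.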
All animal studies were approved by the University of Michigan Institutional Animal Care and Use Committee with an approval date of 10 March 2022. The GWAS performed for each biobank was approved by each institution’s institutional review board, and only the summary statistics were used here.
Results
Multipopulation meta-analysis yielded 4 potentially novel loci
We performed single variant association analyses across 9 biobanks with diverse ancestries (supplemental Figure 1), followed by a meta-analysis (27 987 cases, 1 035 290 controls), to look for VTE-associated loci. We identified 38 genome-wide significant loci (Figure 1; supplemental Table 10). Of these, 34 lead SNPs were within 500 kb of a variant previously reported in GWAS or sequencing studies, and there were 4 potentially novel associations in DHRS3, HOXB2, ARHGAP4, and LINC02411 (Table 1). We identified a potentially novel locus (rs112106699 near DHRS3) that is rare in EUR ancestries (gnomAD v4.1.0 non-Finnish EUR allele frequency, 0.08%) but that is observed at higher frequency in AFR ancestry cohorts (gnomAD AFR/AFR American allele frequency, 9.0%), highlighting the importance of a multipopulation meta-analysis (supplemental Figure 5). The variant’s imputation quality score ranged from 0.70 to 0.94 (median, 0.84) in 8 contributing GWASs. We also identified a common 166 bp structural variant, rs1459062246, near HOXB2 and a rare variant, rs115924439, with a large effect size (odds ratio, 1.9; 1.5-2.4) near LINC02411, highlighting the importance of the inclusion of common and rare indels, structural variants, and single variants in GWASs. On the X chromosome, we found 2 independent loci, namely common variant p.Thr194Ala, known as F9 Malmö (rs6048),59 and a variant ∼200 kb upstream of BCOR (rs3002417). An intronic variant of FUNDC2 (rs17328181) that was previously nominally associated with VTE (P = 2 × 10⁻⁷)19 was also associated.
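The distinction between known and potentially novel loci rests on a distance rule: a lead SNP within 500 kb of a previously reported variant is treated as known. A minimal Python sketch of that rule follows. The function name, identifiers, and coordinates are hypothetical, and the sketch assumes that both variant lists use the same genome build.

```python
WINDOW_BP = 500_000  # 500 kb on either side of a previously reported variant


def classify_loci(lead_snps, known_variants, window=WINDOW_BP):
    """Label each lead SNP as 'known' or 'potentially novel'.

    lead_snps, known_variants: lists of (id, chromosome, position) tuples.
    """
    labels = {}
    for snp_id, chrom, pos in lead_snps:
        near_known = any(
            k_chrom == chrom and abs(k_pos - pos) <= window
            for _, k_chrom, k_pos in known_variants
        )
        labels[snp_id] = "known" if near_known else "potentially novel"
    return labels


# Hypothetical coordinates for illustration only.
leads = [("lead_1", "1", 1_000_000), ("lead_2", "1", 5_000_000), ("lead_3", "2", 1_000_000)]
known = [("known_1", "1", 1_400_000)]
print(classify_loci(leads, known))
```

In this example, only lead_1 lies within the window of a known variant on the same chromosome, so the other 2 lead SNPs are labeled potentially novel.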
Schematic view of genes. Of the 38 genome-wide significant loci, 4 are potentially novel, and 34 are known from previous GWASs. Of the potentially novel genes, 2 have supportive evidence from recent GWAS meta-analyses.38,39 Six of the previously known genes were functionally validated in this study using a zebrafish model of blood clotting with 3 genes showing supportive evidence (significant validation) as the causal gene for modification of VTE in humans.
We performed lookups of the novel variants in summary statistics from 2 large previously published GWASs (n = 42 032 cases; n = 81 190 cases38). Two of the 4 novel lead SNPs from the discovery meta-analysis were nominally significant in at least 1 of the studies, with 1 being significant at a Bonferroni threshold of 0.013. All novel loci except LINC02411 showed a consistent direction of effect between the discovery and lookup cohorts (supplemental Table 11). Because of the potential overlap in individuals between the Electronic Medical Records and Genomics Network samples in the 2019 INVENT meta-analysis and the BioVU samples in GBMI, we also compared the 2022 INVENT meta-analysis with the GBMI without the BioVU samples as a sensitivity analysis (supplemental Figure 6). In addition, in the meta-analysis of the INVENT summary statistics and the GBMI summary statistics without the BioVU samples, all of the known loci had a combined P value of <5 × 10⁻⁸, and of the novel loci, only the ARHGAP4 locus met this threshold (supplemental Table 12). Although these novel loci require subsequent replication in genetic studies, the meta-analysis results are robust enough for integrative bioinformatic gene prioritization.
Integrative gene prioritization nominates likely causal genes
For each autosomal locus (n = 35), we performed bioinformatic gene prioritization using 7 independent lines of evidence from genetic, biologic, and clinical databases. More specifically, for each of the associated genetic regions, we recorded a gene that (1) had a missense variant within the 95% credible variant set (8 loci), (2) was closest to the lead associated variant within 1 kb (25 loci), (3) was the only gene prioritized by DEPICT (8 loci), (4) was within the top 10% according to PoPS (17 loci), (5) had an expression quantitative trait locus (eQTL; 4 loci), (6) was found in ClinVar for related phenotypes (5 loci), or (7) was prioritized via colocalization and proteome-wide Mendelian randomization (15 genes at 23 loci; supplemental Table 10). Notably, the eQTL and ClinVar variants had to be the lead SNP at that locus. By summing the number of lines of evidence that support a given gene, we prioritized at least 1 gene with ≥2 lines of supporting evidence at 30 of the 35 loci (supplemental Table 10). The genes with the most evidence (5/7) to be likely causal were F5, PLEK, and PROS1 (Figure 2). Similarly, genes that had 4 lines of evidence that supported their role as a likely causal gene were PROC, FGG, F11, F2, VWF, and PLCG2.
Integrative gene prioritization. Autosomal genome-wide significant loci labeled by prioritized gene (x-axis) with shading for each line of evidence used in the bioinformatics-driven prioritization scheme (y-axis). Genes in bold were on the gold standard list (supplemental Table 7). The 7 lines of evidence evaluated were chosen to cover different mechanisms through which genetic variants contribute to disease risk, for example, regulatory changes vs protein perturbations. For VPS13D;DHRS3, F7;F10, and LINC02375;LINC02411, the genes had equal numbers of supporting lines of evidence. LINC00656 is also known as RP4-737E23.2. rs536995174 had 1 line of evidence each for SERPING1, SLC43A3, SLC43A1, F2, OR5AK4P, and LRRC55 and was excluded from this visualization. Genes with asterisks were selected for follow-up in a functional assay in zebrafish (Figure 3).
Using our integrative gene prioritization approach, we identified 11 genes known to be involved in blood clotting (F2, F5, F9, F7, F10, F11, FGG, PROC, PROCR, PROS1, and VWF), as well as genes that are known to regulate blood clotting factors from functional analyses but that have not been identified previously in GWASs (eg, PROS1, STAB2, SERPINE2) and genes without known mechanisms (eg, TC2N, PLEK). Using Enrichr for the 38 prioritized candidate genes, we identified significantly enriched gene sets, including the Kyoto Encyclopedia of Genes and Genomes pathway of complement and coagulation cascades (supplemental Table 13; enrichment P = 4 × 10⁻¹⁴), the GO biologic process term of negative regulation of blood coagulation (supplemental Table 14; enrichment P = 6 × 10⁻⁸), and the GO molecular function term of serine-type endopeptidase activity (supplemental Table 15; enrichment P = 1 × 10⁻⁴). For the 4 novel loci from our GWAS, we prioritized 1 or 2 likely causal genes at each locus based on 2 lines of evidence (HOXB2, ARHGAP4) or only 1 line of evidence (DHRS3, LINC02411). The rare variant in DHRS3 and the large indel in HOXB2 did not have significant associations in the 2 recent GWASs and warrant further validation.
In vivo functional analyses in zebrafish provide evidence for the causal role of RASIP1 and TC2N
The bioinformatic gene prioritization showed evidence that a number of known coagulation factors contribute to VTE. Our previous studies validated several known coagulation factors using the genome-edited zebrafish models of hemostasis and thrombosis, including F2,60 F5,61 F10,62 PROS1,63 and PROC.63 We have also shown previously that the knockout of SERPINC1,34 PROS1, and PROC in zebrafish increased the TTO owing to a consumptive coagulopathy that was caused by excess thrombin activity and the consumption of fibrinogen. This can also be seen in severe human thrombosis, and therefore an increased TTO is consistent with VTE. An increased TTO was also observed with loss of function mutations in procoagulant genes (F2,60 F5,61 and F1062). Because GWAS associations with VTE can either be protective or indicate increased risk, the TTO is a simple assay to confirm an individual gene’s causality and whether it is antithrombotic or prothrombotic.
Using this model, we evaluated 6 prioritized genes (F7, RASIP1, TC2N, STAB2, TSPAN15, and PLCG2) from regions that demonstrated conservation of synteny in the zebrafish genome. We chose genes across the spectrum of prioritized genes to see if those with more lines of evidence were more likely to be functional than genes with fewer lines of evidence. Because our aim was also to test the validity of the in silico gene prioritization, we focused on genes with evidence from multiple genetic studies rather than on our novel gene findings from the genetic discovery. PLCG2 had the most lines of supporting evidence of the genes that were novel when considering functional studies. From genes with 3 lines of evidence, F7 and STAB2 were selected: F7 was used as a positive control in this assay, and STAB2 has a missense variant in the 95% credible set. For the others, we selected those with the most significant P values among the genes with 2 lines of evidence that had either been associated with thrombosis through an unknown mechanism or that had not been previously implicated in coagulation.
CRISPR/Cas9 was used to create mosaic knockdown larvae64 for genotype-blinded evaluation of the TTO after induced endothelial injury. After Bonferroni correction for multiple testing, we have supportive evidence for RASIP1 (Wilcoxon signed-rank test P = 2.8 × 10⁻¹⁵) and TC2N (Wilcoxon signed-rank test P = 8.1 × 10⁻⁴) in the modification of human VTE (Figure 3). STAB2 and PLCG2 were nominally significant (P < .05). To increase power, we also pooled the noninjected controls from multiple experiments and compared the median TTO to the median for each knockdown. This secondary comparison provided additional evidence for RASIP1 and provided suggestive evidence for STAB2 and TSPAN15 (P < Bonferroni-adjusted threshold 0.0083) (supplemental Figure 7).
Functional evidence for causal genes in genetically modified zebrafish. P values from Wilcoxon rank sum tests are listed at the top. The y-axis represents the experimental TTO for control and sgRNA-injected zebrafish embryos with the x-axis showing the genes targeted through CRISPR. Injections made without sgRNA served as a negative control. Factor 7 (F7) served as a positive control.
Discussion
In this study, we performed a multipopulation meta-analysis of GWASs of VTE, compared the findings with those of similar studies,38,39 and identified 4 potentially novel loci. Using a bioinformatics-driven gene prioritization heuristic, we identified prioritized genes at each locus and tested them through knockdown of 6 putative causal genes in zebrafish. The integrative prioritization method is similar to that of previous studies, but no gold standard method exists for defining the most probable causal gene. One limitation of the credible sets used for prioritization is the unreliability of the fine-mapping results from a multipopulation meta-analysis.65 A purely bioinformatics-driven approach has its limitations; for example, SCARA5 only had 2 lines of evidence despite previous functional work suggesting its role in von Willebrand factor clearance.66 However, it can be useful for prioritizing genes for functional follow-up in situations in which the number of candidate genes in the region is too high to take forward into biologic models.
We can assign a candidate causal gene to 2 of the novel loci after supplementing bioinformatics-driven integrative gene prioritization with literature review. On chromosome 11, the intergenic variant rs11224340 is an eQTL for ARHGAP4 in tibial arterial tissue and lies 46 kb from the gene. ARHGAP4 previously has been associated with blood pressure,67 whereas CNTN5, 278 kb away, has been associated with platelet count,68 white blood cell count,68 and red blood cell distribution width.69 On chromosome 17, the insertion falls into a gene-enhancer region between HOXB1 and HOXB2 and near HOXB2-AS1. There are blood trait associations in the GWAS Catalog for this region, but it is unclear which HOX gene may be causal, although the lead variant is 2.4 kb upstream of HOXB2 and is also an eQTL for it. The potentially novel variants on chromosome 1 (DHRS3) and chromosome 12 (LINC02411) do not have clear candidate causal genes and are assigned based on proximity.
Other known loci had conflicting evidence in gene prioritization. On chromosome 6, the intronic variant rs10559566 is in CARMIL1, a gene previously associated with platelet counts70 and the highest prioritized gene based on PoPS; however, the strongest eQTLs for this variant point to SCGN. The location of the lead SNP and eQTLs indicated that JAZF1-AS was the likely causal gene of the intronic, noncoding RNA variant rs1513275 on chromosome 7, and JAZF1 is known to be associated with type 2 diabetes.71 This locus has been associated with F7 activity in a previous study72 in which no bioinformatic gene prioritization was performed. However, de Vries et al did find that silencing JAZF1 in liver cells lowered the expression of F7 messenger RNA and protein. Because the original association was with F7 activity, which was only partially attenuated by the silencing, the authors concluded that the genetic variants in this locus might play independent roles in antigen and activity levels.
Although imperfect, the integrative gene prioritization provided a starting point for functional studies. Although the TTO was not significantly different after accounting for multiple testing, STAB2, TSPAN15, and PLCG2 remain candidates as causal genes. For STAB2, sequencing of 393 VTE cases and 6114 controls identified rare, damaging variants in the gene with strong evidence for a role in modifying thrombosis risk.73 Furthermore, mouse knockout of Stab2 was prothrombotic and led to the formation of large venous thrombi.74 The lack of functional confirmation in the zebrafish model could be because of limited statistical power, insufficient knockdown, or species-specific differences. In addition, this assay primarily tests the ability to produce fibrin-rich thrombi. It is possible that these genes affect thrombosis through modification of other pathways involved in clotting, including platelets and vasculature. For example, although RASIP1 knockdown did significantly alter the TTO, this gene is also a regulator of vascular integrity75 and therefore might mediate VTE through other mechanisms. The additional prioritized genes from the genome-wide significant loci remain intriguing for functional follow-up in future studies.
This work contributes to the understanding of genetic variation associated with VTE by layering information from genetic, medical, and biologic studies and models to link the genetic findings to biologic function. By combining evidence from a large-scale GWAS, a multitude of existing data sources, bioinformatic tools for in silico follow-up, and functional follow-up in an in vivo model organism, we identified 38 genome-wide significant loci (4 potentially novel) with plausible underlying causal genes, 2 of which (RASIP1 and TC2N) had biologic support in the functional follow-up. These genes may be further studied to identify diagnostic or therapeutic targets that may aid in the management of VTE. Finally, further studies that integrate multiple layers of information will add to the understanding of the human genetic background of VTE and lead to new insights into VTE pathophysiology.
Acknowledgments
The authors acknowledge the biobank participants, recruitment teams, and project managers of the Global Biobank Meta-Analysis Initiative for providing their data for biomedical research and providing data aggregation, management, and distribution services in support of the research reported in this publication (especially Sinéad Chapman and Bethany Klunder). The authors acknowledge BioBank Japan (Yukinori Okada, Koichi Matsuda, and Masahiro Kanai), BioMe (Ruth Loos, Judy Cho, Eimear Kenny, Michael Preuss, and Simon Lee), BioVU (Nancy Cox and Jibril Hirbo), Canadian Partnership for Tomorrow (Philip Awadalla and Marie-Julie Fave), China Kadoorie (Robin Walters, Kuang Lin, and Iona Millwood), Colorado Center for Personalized Medicine (Kathleen Barnes, Michelle Daya, and Chris Gignoux), deCODE Genetics (Kári Stefánsson and Unnur Þorsteinsdóttir), East London Genes & Health (David A. van Heel, Sarah Finer, and Richard Trembath), Estonian Biobank (Andres Metspalu, Reedik Mägi, Tõnu Esko, and Priit Palta), FinnGen (Aarno Palotie, Mark Daly, Samuli Ripatti, Mitja Kurki, and Juha Karjalainen), Generation Scotland (Caroline Hayward and Riccardo Marioni), the Trøndelag Health Study (HUNT) (Kristian Hveem, Cristen Willer, Sarah Graham, Ben Brumpton, and Brooke Wolford), Lifelines (Serena Sanna and Esteban Lopera), Michigan Genomics Initiative (Sebastian Zoellner, Michael Boehnke, Lars Fritsche, and Anita Pandit), Million Veteran Program (Christopher J. O’Donnell), Netherlands Twin Register (D. I. Boomsma and M. G. Nivard), Partners Biobank (Jordan Smoller and Yen-Chen Feng), QIMR Berghofer (Sarah Medland, Stuart McGregor, and Nathan Ingold), Taiwan Biobank (Yen-Feng Lin, Yen-Chen Feng, and Hailiang Huang), University of California, Los Angeles Precision Health Biobank (Ruth Johnson, Yi Ding, Alec Chiu, Bogdan Pasaniuc, and Daniel Geschwind), and UK Biobank (Konrad Karczewski and Alicia Martin).
J.A.S. was supported by R35 HL150784 and the Henry and Mala Dorfman Family Professorship in Pediatric Hematology/Oncology. K.C.D. was supported by R01 HL172780. D.-A.T. was supported by the Multi-omics Approach to Tackle the Epidemiology of Venous Thromboembolism (EPIDEMIOM-VT) Senior Chair from the University of Bordeaux initiative of excellence and the Laboratory of Excellence on Medical Genomics (GENMED LabEx, ANR-10-LABX-0013), a research program managed by the National Research Agency (ANR) as part of the French Investment for the Future. C.J.W., I.S., K.-H.H.W., and B.N.W. were supported by R35-HL135824 (Willer, PI). S.M.D. was supported by IK2-CX001780. This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration, and was supported by award number BX003362. This work was supported by funding from the Department of Veterans Affairs Office of Research and Development, Million Veteran Program Grant MVP000; Department of Veterans. L.B. and B.M.B. work in a research unit funded by the Liaison Committee for education, research and innovation in Central Norway and the joint research committee of St. Olavs Hospital and the Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by the National Cancer Institute, National Human Genome Research Institute, National Heart, Lung, and Blood Institute, National Institute on Drug Abuse, National Institute of Mental Health, and National Institute of Neurological Disorders and Stroke. The data used for the analyses described in this manuscript were obtained from GTEx Analysis v8 on the GTEx Portal on 1 May 2021.
This publication does not represent the views of the Department of Veterans Affairs or the United States Government. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the United States Department of Health and Human Services.
Authorship
Contribution: W.Z., B.N.W., I.S., and K.-H.H.W. performed the bioinformatic analyses; B.M.B. and L.B. performed the Mendelian randomization; F.T., A.D.J., N.L.S., S.M.D., D.K., and D.-A.T. performed the lookup; Q.Y.Z., X.Y., C.E.R., and J.A.S. performed the functional analyses; B.N.W., I.S., K.-H.H.W., K.C.D., C.E.R., and J.A.S. designed the research and wrote the manuscript; V.L.F. and K.T. provided critical feedback and revision of the manuscript; and C.J.W., M.J.D., and B.M.N. provided critical reviews of the manuscript.
Conflict-of-interest disclosure: C.J.W. and K.-H.H.W. report being employed at Regeneron Pharmaceuticals, although they were not at the time of this study. D.K. reports being employed at Bitterroot Bio, although he was not at the time of this study. S.M.D. reports research support from Novo Nordisk and Amgen, outside the scope of the current research; and is named as a coinventor on a government-owned US Patent application related to the use of genetic risk prediction for venous thromboembolic disease filed by the US Department of Veterans Affairs in accordance with Federal regulatory requirements. J.A.S. reports serving as a consultant for Sanofi, Novo Nordisk, Biomarin, Takeda, Pfizer, Genentech, CSL Behring, and Medexus. The remaining authors declare no competing financial interests.
A complete list of the members of the Global Biobank Meta-Analysis Initiative (GBMI) study group and INVENT, MVP consortium appears in the supplemental Appendix.
Correspondence: Ida Surakka, Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, NCRC Building 26, Room 361S, 2800 Plymouth Rd, Ann Arbor, MI 48109-2800; email: isurakka@umich.edu.
References
Author notes
B.N.W. and Q.Y.Z. contributed equally to this study.
J.A.S. and I.S. are joint senior authors.
Genome-wide association study summary statistics are available for download at https://www.globalbiobankmeta.org/resources and for browsing at http://results.globalbiobankmeta.org. The integrative gene prioritization data may be found in a data supplement available with the online version of this article.
The full-text version of this article contains a data supplement.