Abstract
Pulmonary embolism (PE) is a common, potentially life-threatening yet treatable condition. Prompt diagnosis and expeditious therapeutic intervention are of paramount importance for optimal patient management. Our objective was to systematically review the accuracy of D-dimer assay, compression ultrasonography (CUS), computed tomography pulmonary angiography (CTPA), and ventilation-perfusion (V/Q) scanning for the diagnosis of suspected first and recurrent PE. We searched Cochrane Central, MEDLINE, and EMBASE for eligible studies, reference lists of relevant reviews, registered trials, and relevant conference proceedings. Two investigators screened and abstracted data. Risk of bias was assessed using Quality Assessment of Diagnostic Accuracy Studies-2 and certainty of evidence using the Grading of Recommendations Assessment, Development and Evaluation framework. We pooled estimates of sensitivity and specificity. The review included 61 studies. The pooled estimates for D-dimer sensitivity and specificity were 0.97 (95% confidence interval [CI], 0.96-0.98) and 0.41 (95% CI, 0.36-0.46), respectively, whereas CTPA sensitivity and specificity were 0.94 (95% CI, 0.89-0.97) and 0.98 (95% CI, 0.97-0.99), respectively, and CUS sensitivity and specificity were 0.49 (95% CI, 0.31-0.66) and 0.96 (95% CI, 0.95-0.98), respectively. Three variations of pooled estimates for sensitivity and specificity of V/Q scan were carried out, based on interpretation of test results. D-dimer had the highest sensitivity when compared with imaging. CTPA and V/Q scans (high probability scan as a positive and low/nondiagnostic/normal scan as negative) both had the highest specificity. This systematic review was registered on PROSPERO as CRD42018084669.
Introduction
Pulmonary embolism (PE) is a common, potentially life-threatening yet treatable condition.1-7 The annual incidence of PE is 60 to 70 cases per 100 000. In the United States and Europe, PE accounts for 100 000 and 300 000 annual deaths, respectively.8-10 Consequently, prompt diagnosis and expeditious therapeutic intervention are of paramount importance for optimal patient management.11
Excluding PE is also of paramount importance because of the bleeding risks of anticoagulation and costs associated with treatment and monitoring. Various strategies are currently used to evaluate patients with suspected PE. Commonly used tests include D-dimer assays, compression ultrasonography (CUS), computed tomography pulmonary angiography (CTPA), and ventilation-perfusion (V/Q) scanning. The tests each have benefits and limitations. Imaging tests for PE such as CTPA and V/Q lung scanning are expensive, time-consuming, and are associated with radiation exposure. In addition, the contrast used in CTPA can cause nephrotoxicity and allergic-like reactions. Therefore, to exclude PE efficiently, patients undergo initial tests that are cost-effective with low risk; tests such as CTPA and V/Q are reserved for patients in whom PE was not initially excluded.12
The aim of this systematic review is to determine the accuracy of commonly available diagnostic tests for PE, which can be used to inform a combined strategy for diagnosis. Pooled estimates of sensitivity and specificity obtained in this systematic review were used to model different diagnostic strategies for patients with suspected PE. The results of modeling were used to inform evidence-based recommendations on diagnostic strategies for deep vein thrombosis (DVT) in the American Society of Hematology clinical practice guidelines for diagnosis of venous thromboembolism.13
Methods
Search strategy and data sources
We searched MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials from inception until May 2019. We also manually searched the reference lists of relevant articles and existing reviews. Studies published in any language were included in this review. We limited the search to studies reporting data for accuracy of diagnostic tests. The complete search strategy is available in supplemental Material 1. The prespecified protocol for this review is registered on PROSPERO (registration number CRD42018084669). This review is reported in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses for diagnostic test accuracy guidelines.13
Study selection
Studies.
Studies reporting data on diagnostic test accuracy (randomized controlled trials, cohort studies, cross-sectional studies) for PE were eligible for inclusion in this systematic review.
Participants.
Adult patients ≥18 years of age presenting to inpatient or outpatient settings with a suspected first or recurrent episode of PE were eligible for inclusion.
Index tests for diagnosis.
Studies were eligible if they assessed the test accuracy of V/Q scan, multidetector CTPA, CUS, or D-dimer assays at standard cutoffs (Vidas ELISA Assay at 500 ng/mL, STA Liatest D-Di Assay at 500 ng/mL, Tina-quant D-dimer Assay at 500 ng/mL, Innovance D-dimer at 500 ng/mL, and HemoSIL D-dimer Assay at 230 ng/mL) to diagnose a first or recurrent episode of symptomatic PE.
Reference standards.
Angiography, positive lower extremity ultrasound for DVT in the setting of suspicion for PE, and/or clinical follow-up were eligible as a reference standard for V/Q scan or CTPA. V/Q scan, CTPA, compression ultrasound for DVT in the setting of suspicion for PE, and/or clinical follow-up were considered appropriate reference standards for D-dimer assays. If a reference diagnostic test was not conducted, clinical follow-up for symptoms alone was sufficient as a reference standard.
Exclusion criteria.
Exclusion criteria were determined by unanimous guideline panel consensus. We excluded studies that did not provide sufficient data to determine test accuracy (sensitivity and specificity) and abstracts published before 2014 because the complete studies were likely published in peer-reviewed journals. Studies with sample size <100 patients were excluded to increase feasibility. A sensitivity analysis was performed and indicated that this would not affect the pooled test accuracy estimates. The quality of small test accuracy studies informing a clinical practice guideline was a concern; therefore, these studies were excluded.
Patients who were asymptomatic or pregnant were excluded. Studies reporting on both adult and pediatric patients were eligible for inclusion but were excluded when >80% of the study sample was younger than 18 years of age or if the mean age was younger than 25 years. When possible, we extracted data separately for adult patients from these studies.
Studies that used unsuitable reference standards were excluded (V/Q single-photon emission CT, transthoracic ultrasound, single-detector CT, impedance plethysmography, and D-dimer). D-dimer studies were excluded if they used assays that are no longer in use and/or are not highly sensitive (MDA, Asserachrom, Dimertest I, Enzygnost, Fibrinostika FbDP, Acculot, Wellcotest, Minutex), if they used a nonquantitative assay (SimpliRed), or if they considered a positive threshold other than the defined clinical cutoffs.
We excluded studies evaluating V/Q test accuracy that were published before the year 2000 unless they included a screening process with chest radiograph or other testing before V/Q testing. Finally, we excluded studies that did not provide a breakdown of the V/Q scan interpretation (normal, low/intermediate, and high probability).
Screening and data extraction
Independent reviewers conducted title and abstract screening and full-text review in duplicate to identify eligible studies. Data extraction was also conducted independently and in duplicate and verified by a third author (R.M.). Disagreements were resolved by discussion to reach consensus, in consultation with 2 expert clinician scientists (R.M. and W.L.). Data extracted included general study characteristics (authors, publication year, country, study design), diagnostic index test and reference standard, prevalence of PE, and parameters to determine test accuracy (ie, sensitivity and specificity of the index test).
Risk of bias and certainty of evidence
We conducted the risk of bias assessment for diagnostic test accuracy studies using the Quality Assessment of Diagnostic Accuracy Studies-2 revised tool.14
The Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework was used to assess overall certainty by evaluating the evidence for each outcome on the following domains: risk of bias, imprecision, inconsistency, indirectness, and publication bias.15,16
Data synthesis
The accuracy estimates from individual studies were combined quantitatively (pooled) for each test using OpenMetaAnalyst (http://www.cebm.brown.edu/openmeta/). We conducted a bivariate analysis for pooling sensitivity and specificity for each of the test comparisons to account for variation within and between studies. Forest plots were created for each comparison. The Breslow-Day test was used to measure the percentage of total variation across studies attributable to heterogeneity; however, the results did not influence our judgment of the pooled estimates because the literature has discouraged its use for test accuracy.17 To better illustrate the impact of the sensitivity and specificity, absolute differences in effects were calculated for each comparison as true positives, true negatives, false positives, and false negatives.
Diagnostic strategies for PE are based on assessment of the pretest probability (PTP) for individual patients, which provides an estimate of the expected prevalence of PE at a population level. Prevalence estimates for PE were obtained from a meta-analysis of 29 studies of 31 215 patients in which the 3-level Wells score was evaluated in 14 studies.18 The pooled prevalence of PE in these studies was 5.7% in the low PTP, 23.2% in the intermediate PTP, and 49.3% in the high PTP category. We used similar disease prevalence estimates to determine the absolute differences in effects among patients with clinical suspicion of PE: 5% corresponding approximately to low PTP, 20% for intermediate PTP, and 50% and 75% for high PTP. The review also discusses recurrent PE for which the prevalence was modeled at 30% and 40%. We calculated the absolute differences in effects for each comparison as true positives, true negatives, false positives, and false negatives. Here, we present the results for the low PTP population; results for intermediate and high PTP groups are reported in supplemental Material 2.
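The absolute effects described above follow directly from sensitivity, specificity, and prevalence. The following is a minimal sketch of that arithmetic, not the software used in the review; the function and class names are hypothetical, and the inputs are the pooled D-dimer estimates and the modeled prevalences stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbsoluteEffects:
    """Expected 2x2 counts among the patients tested."""
    true_positives: float
    false_negatives: float
    false_positives: float
    true_negatives: float


def absolute_effects(sensitivity, specificity, prevalence, n_tested=1000):
    """Expected counts per n_tested patients at a given disease prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity, specificity, and prevalence must lie in [0, 1]")
    with_pe = n_tested * prevalence
    without_pe = n_tested - with_pe
    true_positives = with_pe * sensitivity
    true_negatives = without_pe * specificity
    return AbsoluteEffects(
        true_positives=true_positives,
        false_negatives=with_pe - true_positives,
        false_positives=without_pe - true_negatives,
        true_negatives=true_negatives,
    )


if __name__ == "__main__":
    # Pooled D-dimer estimates from this review (sensitivity 0.97, specificity 0.41)
    # at the prevalences modeled for low, intermediate, and high PTP.
    for prevalence in (0.05, 0.20, 0.50, 0.75):
        e = absolute_effects(0.97, 0.41, prevalence)
        print(f"prevalence {prevalence:.2f}: TP={e.true_positives:.1f} "
              f"FN={e.false_negatives:.1f} FP={e.false_positives:.1f} "
              f"TN={e.true_negatives:.1f}")
```

At 5% prevalence, this arithmetic gives roughly 48.5 true positives, 1.5 false negatives, 560.5 false positives, and 389.5 true negatives per 1000 patients tested with D-dimer. These figures are an illustration of the calculation and may differ by rounding from the summary-of-findings tables.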
Results
Description of studies
Among the 15 453 nonduplicate records identified from the initial electronic database search and from other sources, 355 articles in full text were retrieved after title and abstract screening. An updated search of the electronic database was performed with 1391 nonduplicate records identified. Of these, 21 articles in full text were retrieved after title and abstract screening. After exclusion of articles, a total of 61 studies were included for data abstraction. A list of excluded studies is provided in supplemental Material 3. Reasons for exclusion at full-text review or data abstraction stages were ineligible study design (n = 67), study population (n = 45), diagnostic test (n = 46), full text not available (n = 19), or unacceptable reference standards and/or studies did not provide enough information to determine sensitivity and specificity (n = 139). Figure 1 shows the study flow diagram for included studies.
First-episode PE studies reported the test accuracy of the following index tests in comparison with a reference standard: 34 studies on D-dimer,19-52 1 study on age-adjusted D-dimer,43 16 studies on CTPA,26,28,35,38,42,43,53-62 7 studies on compression ultrasound,42,63-68 and 13 studies on V/Q scan.26,53,57,67-76 Test accuracy for recurrent PE was reported in 3 studies.77-79 Table 1 summarizes general characteristics of included studies, as well as the index and reference standards. The majority of included studies were judged to be low risk of bias for patient selection, index test, and reference standard interpretation. Although there was unclear reporting regarding flow and timing in some studies, the certainty of evidence was generally not downgraded for risk of bias. The complete risk of bias assessment for individual studies is included in supplemental Material 4.
D-dimer
Test accuracy data for D-dimer were pooled from 34 studies, with a total of 22 849 participants.19-52 The Vidas D-dimer assay had a sensitivity of 0.97 (95% confidence interval [CI], 0.95-0.99) and a specificity of 0.41 (95% CI, 0.36-0.46), Tina-quant D-dimer had a sensitivity of 0.92 (95% CI, 0.83-0.96) and a specificity of 0.41 (95% CI, 0.39-0.60), and STA Liatest D-dimer had a sensitivity of 0.98 (95% CI, 0.93-0.99) and a specificity of 0.40 (95% CI, 0.32-0.49). The pooled estimates for D-dimer sensitivity and specificity were 0.97 (95% CI, 0.96-0.98) and 0.41 (95% CI, 0.36-0.46), respectively. Figure 2 shows the forest plot displaying the sensitivity and specificity from individual studies and the pooled estimates for all D-dimer assays. Figures 3-5 show the forest plots displaying sensitivity and specificity from individual studies for specific assays.
D-dimer results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-negative results and a high proportion of false-positive results (>5%). Overall, the test was shown to be highly sensitive but had low specificity. The certainty of evidence was moderate. Table 2 shows the summary of findings.
Age-adjusted D-dimer
Test accuracy data for age-adjusted D-dimer were obtained from 1 study, with a total of 2885 participants.43 We did not include retrospective validation studies. The estimates for age-adjusted D-dimer sensitivity and specificity were 0.99 (95% CI, 0.98-1.00) and 0.47 (95% CI, 0.45-0.49), respectively. Figure 2 shows the forest plot displaying the sensitivity and specificity from individual studies and the pooled estimates.
Age-adjusted D-dimer results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-negative results and a high proportion of false-positive results (>5%). Overall, the test was shown to be highly sensitive but had low specificity. The certainty of evidence was high. Table 3 shows the summary of findings.
CTPA
Test accuracy data for CTPA were pooled from 16 studies, with a total of 4392 participants.26,28,35,38,42,43,53-62 The pooled estimates for CTPA sensitivity and specificity were 0.94 (95% CI, 0.89-0.97) and 0.98 (95% CI, 0.97-0.99), respectively. Figure 6 shows the forest plot displaying the sensitivity and specificity from individual studies and the pooled estimates.
CTPA results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-negative and false-positive results. Overall, the test was shown to be highly sensitive and specific and the certainty of evidence was moderate. Table 4 shows the summary of findings.
Compression ultrasound
Test accuracy data for proximal vein CUS (ie, proximal to the calf veins [trifurcation veins and higher]) were pooled from 7 studies, with a total of 1715 participants.42,63-68 The pooled estimates for CUS sensitivity and specificity were 0.49 (95% CI, 0.31-0.66) and 0.96 (95% CI, 0.95-0.98), respectively. Figure 7 shows the forest plot displaying the sensitivity and specificity from individual studies and the pooled estimates.
CUS results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-positive results and a high proportion of false-negative results (>5%). Overall, the test was shown to be highly specific but had low sensitivity. The certainty of evidence was low. Table 5 shows the summary of findings.
V/Q scan
Test accuracy data for V/Q scans were pooled from 13 studies, with a total of 3994 participants.26,53,57,67-76 Three variations of pooled estimates for sensitivity and specificity of V/Q scan were carried out.
V/Q scans for which high probability scans were considered positive and low/nondiagnostic/normal scans were considered negative had a sensitivity and specificity of 0.58 (95% CI, 0.50-0.66) and 0.98 (95% CI, 0.96-0.99), respectively. Figure 8 shows the forest plot displaying the sensitivity and specificity from individual studies and the pooled estimates. V/Q scan results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-positive results and a high proportion of false-negative results (>5%). Overall, the test was shown to be highly specific but had low sensitivity. The certainty of evidence was moderate. Table 6A shows the summary of findings.
V/Q scans with high/nondiagnostic/low probability scans considered as positive and normal scans as negative had a sensitivity and specificity of 0.98 (95% CI, 0.95-0.99) and 0.36 (95% CI, 0.27-0.45), respectively. Figure 9 shows the forest plot displaying the sensitivity and specificity from individual studies and the pooled estimates. V/Q scan results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-negative results and a high proportion of false-positive results (>5%). Overall, the test was shown to be highly sensitive but had low specificity. The certainty of evidence was moderate. Table 6B shows the summary of findings.
V/Q scans for which high probability scans were considered positive and normal scans were considered negative had a sensitivity and specificity of 0.96 (95% CI, 0.91-0.98) and 0.95 (95% CI, 0.89-0.98), respectively. Figure 10 shows the forest plot displaying the sensitivity and specificity from individual studies and the pooled estimates. V/Q scan results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-negative and false-positive results. Overall, the test was shown to be highly sensitive and specific. The certainty of evidence was high. Table 6C shows the summary of findings.
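The three V/Q variations differ only in how a multicategory scan result is collapsed into positive and negative. The sketch below illustrates that step under stated assumptions: the counts are hypothetical numbers invented for illustration (they are not data from the included studies), low and nondiagnostic scans are merged into a single "nondiagnostic" category, and the function name is an assumption.

```python
def accuracy_for_rule(counts_pe, counts_no_pe, positive, negative):
    """Sensitivity and specificity when a multicategory result is dichotomized.

    Categories listed in neither `positive` nor `negative` are left out of
    the 2x2 table entirely.
    """
    tp = sum(counts_pe[c] for c in positive)
    fn = sum(counts_pe[c] for c in negative)
    fp = sum(counts_no_pe[c] for c in positive)
    tn = sum(counts_no_pe[c] for c in negative)
    return tp / (tp + fn), tn / (tn + fp)


# HYPOTHETICAL scan results by reference-standard status (illustration only).
counts_pe = {"high": 60, "nondiagnostic": 38, "normal": 2}
counts_no_pe = {"high": 8, "nondiagnostic": 250, "normal": 142}

# The three interpretations described in the text: (positive, negative).
RULES = {
    "high vs low/nondiagnostic/normal": (("high",), ("nondiagnostic", "normal")),
    "high/nondiagnostic/low vs normal": (("high", "nondiagnostic"), ("normal",)),
    "high vs normal": (("high",), ("normal",)),
}

if __name__ == "__main__":
    for label, (positive, negative) in RULES.items():
        sens, spec = accuracy_for_rule(counts_pe, counts_no_pe, positive, negative)
        print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this reading, the third rule leaves nondiagnostic scans out of the table, so its estimates describe only patients whose scan was conclusive.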
Recurrent PE
Test accuracy data for recurrent PE were pooled from 3 studies.77-79 Tables 7 and 8 show the modeled data findings for this comparison.
For the sequence of D-dimer testing in patients with low clinical PTP, followed by CTPA in those with a positive D-dimer, and CTPA directly in patients with high clinical PTP, the pooled estimates for sensitivity and specificity were 0.97 (95% CI, 0.94-0.98) and 1.00 (95% CI, 0.99-1.00), respectively. This diagnostic algorithm was illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-negative and false-positive results. Overall, the strategy was shown to be highly sensitive and specific. The certainty of evidence was moderate. Table 7 shows the summary of findings.
For D-dimer alone, the pooled estimates for sensitivity and specificity were 1.00 (95% CI, 0.97-1.00) and 0.27 (95% CI, 0.21-0.34), respectively. D-dimer results were illustrated for 1000 patients from a low-prevalence population undergoing the test, and absolute differences indicate a low (<5%) proportion of false-negative results and a high proportion of false-positive results (>5%). Overall, the test was shown to be highly sensitive but had low specificity. The certainty of evidence was low for true positives, true negatives, false positives, and false negatives. Table 8 shows the summary of findings.
Discussion
This review presents pooled estimates of test accuracy for commonly available diagnostic methods for PE. The certainty of evidence ranged from low to high for test accuracy. The only diagnostic test with a low certainty of evidence was CUS, whereas the other tests had moderate to high certainty of evidence. Of the evaluated tests, D-dimer had the highest sensitivity at 0.97 (95% CI, 0.96-0.98), with age-adjusted D-dimer having an even higher sensitivity of 0.99 (95% CI, 0.98-1.00). CTPA and V/Q scans (high probability scan as a positive and low/nondiagnostic/normal scan as negative) both had the highest specificity at 0.98 (95% CI, 0.97-0.99) and 0.98 (95% CI, 0.96-0.99), respectively. The sensitivity and specificity results obtained in this systematic review were used in a model to determine the effects of different strategies to diagnose patients suspected of having PE. The modeling results were used to make evidence-based recommendations on diagnostic test approaches to PE in the American Society of Hematology evidence-based guidelines.13
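To make the contrast between a highly sensitive and a highly specific test concrete, the sketch below applies Bayes' theorem to the pooled estimates reported here. It is an illustration only, not part of the review's modeling; the function name is hypothetical, and the 5% pretest probability is the low-PTP prevalence used in the Methods.

```python
def post_test_probability(pretest, sensitivity, specificity, test_positive):
    """Probability of PE after one test result, by Bayes' theorem."""
    if test_positive:
        p_result_given_pe = sensitivity
        p_result_given_no_pe = 1.0 - specificity
    else:
        p_result_given_pe = 1.0 - sensitivity
        p_result_given_no_pe = specificity
    numerator = pretest * p_result_given_pe
    denominator = numerator + (1.0 - pretest) * p_result_given_no_pe
    if denominator == 0.0:
        raise ValueError("this test result has zero probability under the inputs")
    return numerator / denominator


if __name__ == "__main__":
    low_ptp = 0.05  # low pretest probability modeled in this review

    # Negative D-dimer (pooled sensitivity 0.97, specificity 0.41).
    after_negative_d_dimer = post_test_probability(low_ptp, 0.97, 0.41, False)
    # Positive CTPA (pooled sensitivity 0.94, specificity 0.98).
    after_positive_ctpa = post_test_probability(low_ptp, 0.94, 0.98, True)

    print(f"PE probability after negative D-dimer: {after_negative_d_dimer:.4f}")
    print(f"PE probability after positive CTPA:    {after_positive_ctpa:.4f}")
```

With these inputs, a negative D-dimer lowers the probability of PE from 5% to about 0.4%, whereas a positive CTPA raises it to about 71%, which is consistent with using D-dimer to exclude PE and imaging to confirm it.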
This review has several strengths. The comprehensive and systematic approach for identifying studies makes it unlikely that relevant studies were missed. We also attempted to include studies with insufficient information to abstract test accuracy by contacting researchers of those studies to obtain primary data. For example, in the Christopher Study,80 we were unable to separate the patients with low PTP and a positive D-dimer who underwent CTPA from the high PTP patients who went straight to CTPA. We were unable to obtain the primary data; therefore, the study was excluded. Several post hoc analysis papers had the missing data, so these were included for analysis but the original study was excluded to avoid duplication. Additionally, we did not limit our review by language and translated articles that were not published in English. Finally, we assessed the certainty of evidence in this area and identified sources of bias.
There are a few limitations of the present review. The high sensitivity of age-adjusted D-dimer is limited by the fact that only 1 study evaluating age-adjusted D-dimer prospectively was identified for analysis. We excluded many emerging and promising modalities such as magnetic resonance imaging (and V/Q single-photon emission CT) because limited data are available. In addition, many of the studies that were included did not have an actual reference test. Occasionally, studies used follow-up (eg, 3 months, 6 months) as a reference standard to testing, which was deemed acceptable by the panel. Clinically insignificant PE may be missed with follow-up as a reference, but this was acceptable because it determines the performance of the test in a clinically significant setting. Last, the diagnostic test accuracy estimates were determined for a test done in a standalone manner, and we did not consider combinations of tests in a pathway for establishing a diagnosis of PE. This may be required, for example, in patients who have a low PTP but have a positive D-dimer. The pooled sensitivity and specificity estimates of the tests from this review only apply when the test is performed alone, which may be applicable in certain populations. For example, compression ultrasound is rarely used as a standalone test for the diagnosis of PE; however, certain clinical conditions may necessitate the use of compression ultrasound alone (eg, patients who cannot initially undergo any direct lung imaging because of renal failure or pregnancy). The results in this review can be used to model various diagnostic strategies to inform clinical decision-making. Ultimately, the diagnostic tests will be used in a strategic approach based on clinical pretest probability and with consideration of availability, cost, and patient and provider values and preferences.
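As one illustration of how standalone estimates might feed a pathway model, the sketch below combines 2 tests in series, with the second test performed only after a positive first test. It rests on a strong assumption that this review does not make or verify: that the 2 test results are conditionally independent given true disease status. The function name is hypothetical, and this is not the model used for the guideline.

```python
def serial_strategy(first, second):
    """Accuracy of 'second test only if first is positive'.

    Each argument is a (sensitivity, specificity) pair. The strategy is
    positive only when both tests are positive.
    ASSUMPTION: test errors are conditionally independent given disease status.
    """
    sens_first, spec_first = first
    sens_second, spec_second = second
    # A patient with PE is detected only if both tests are positive.
    sensitivity = sens_first * sens_second
    # A patient without PE is misclassified only if both tests are falsely positive.
    specificity = 1.0 - (1.0 - spec_first) * (1.0 - spec_second)
    return sensitivity, specificity


if __name__ == "__main__":
    d_dimer = (0.97, 0.41)  # pooled estimates from this review
    ctpa = (0.94, 0.98)     # pooled estimates from this review
    sens, spec = serial_strategy(d_dimer, ctpa)
    print(f"D-dimer then CTPA: sensitivity={sens:.4f}, specificity={spec:.4f}")
```

Under this assumption the sequence gains specificity (about 0.99) at some cost in sensitivity (about 0.91) relative to either test alone. Correlated errors between tests would change both figures, so any real strategy model would need to examine that dependence.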
In conclusion, this systematic review synthesizes and evaluates the accuracy of commonly used tests for the diagnosis of PE. Estimates of sensitivity and specificity from this review were used to model diagnostic strategies and inform evidence-based recommendations for a clinical practice guideline.13 The prevalence or pretest probability for PE along with the sensitivity and specificity estimates will influence clinical decision-making and patient management.
Acknowledgments
The systematic review team would like to acknowledge the Canadian Agency for Drugs and Technologies in Health team for their assistance with data management and organization of the manuscript.
This systematic review was conducted to support the development of the American Society of Hematology (ASH) 2018 guidelines for management of venous thromboembolism: diagnosis of venous thromboembolism. The entire guideline development process was funded by ASH. Through the McMaster GRADE Center, some researchers received salary (Parth Patel, Payal Patel, C.B., M.B., W.W., H.B., and J.V.) or grant support (R.A.M. and H.J.S.); others participated to fulfill requirements of an academic degree or program or volunteered their time.
Authorship
Contribution: Parth Patel contributed to study design, search strategy, study selection, data extraction, statistical analysis, and drafting the report; Parth Patel, M.B., C.B., H.B., R.A.M., M.A.K., N.M.H., Y.N.A.J., and Payal Patel contributed to study design, study selection, data extraction, statistical analysis, and critical revision of the report; J.V., D.W., H.A., M.T., M.B., W.B., R. Khatib, R. Kehar, R.P., A.S., and A.M. contributed to study selection and data extraction; and W.W., W.L., G.L.G., S.M.B., L.B.H., J.K., E.L., M.R., H.J.S., and R.A.M. contributed to the study design, interpretation of the results, and critical revision of the report.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Reem A. Mustafa, Division of Nephrology and Hypertension, Department of Medicine, University of Kansas Medical Center, 3901 Rainbow Blvd, MS3002, Kansas City, KS 66160; e-mail: ramustafa@gmail.com.
References
Author notes
The full-text version of this article contains a data supplement.