Key points:
The 100-day VTE incidence by manual chart review and by our institutional natural language processing (NLP) algorithm was 10.3% and 8.8%, respectively.
The sensitivity, specificity, PPV, and NPV of the NLP were ≥ 85% when considering all VTE events.
Abstract:
The annual incidence of venous thromboembolism (VTE) may be increased 50-fold after allogeneic hematopoietic stem cell transplant (HSCT). Obtaining such incidence data, as well as data identifying the clinical variables that drive this increased risk, has generally required manual chart review. This cumbersome process can be improved by natural language processing (NLP) algorithms designed to detect VTE in electronic medical record systems. We describe the development of an institutional NLP algorithm for VTE detection and evaluate its performance in detecting VTE in patients who recently underwent HSCT. We retrospectively reviewed adult patients between 2016 and 2020. NLP assessed patient records for acute VTE within 100 days of HSCT, and manual chart review was performed for comparison. NLP sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. A total of 1,300 electronic health records were analyzed. The 100-day VTE incidence as determined by manual chart review and by NLP was 10.3% and 8.8%, respectively. The sensitivity, specificity, PPV, and NPV of NLP were > 0.85. All 19 events not identified by NLP were found in radiology or vascular laboratory reports that NLP overlooked. These results demonstrate excellent performance of NLP for identifying VTE in HSCT patients. Future refinement of the NLP algorithm and its combination with other detection methods should improve detection of VTE in this and other at-risk cohorts.
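For readers less familiar with the four performance measures named above, the following Python sketch shows how they are conventionally derived from a 2x2 comparison of NLP output against manual chart review as the reference standard. The class and function names are hypothetical, and the counts in the usage example are invented purely for illustration; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts with manual chart review as the reference standard."""
    tp: int  # NLP flagged VTE, chart review confirmed VTE
    fp: int  # NLP flagged VTE, chart review found no VTE
    fn: int  # NLP missed a VTE that chart review found
    tn: int  # NLP and chart review both found no VTE


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic accuracy measures as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true events detected
        "specificity": c.tn / (c.tn + c.fp),  # share of non-events cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of NLP positives that are real
        "npv": c.tn / (c.tn + c.fn),          # share of NLP negatives that are real
    }


# Illustrative counts only (assumed, NOT taken from the study).
example = ConfusionCounts(tp=90, fp=10, fn=10, tn=890)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Each measure is a simple proportion, so a threshold such as "> 0.85" can be checked directly against the returned values.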
Author notes
Data Sharing Statement: All original data can be obtained by emailing the corresponding author.