Background: Natural language processing (NLP) is a software-driven method of evaluating free text for information of interest. In large datasets, NLP allows rapid identification of disease states and distinction between them, such as acute versus chronic venous thromboembolism (VTE).

Methods: The aim of this study was to externally validate NLP algorithms, developed and internally validated at Mayo Clinic, that identify acute pulmonary embolism (PE) or acute deep vein thrombosis (DVT) in radiologic reports. The dataset comprised adults who underwent relevant imaging within 90 days after COVID-19 vaccination between 11/1/2020 and 11/1/2021.

Verification was performed using two configuration files for radiologic reports (one for CT reports and one for venous duplex ultrasound reports of the extremities) in conjunction with the open-source software SimpleNLP. The NLP results were then compared with blinded determinations of whether each report showed an acute VTE event, made separately by two individuals (DS, LBK). For each imaging modality, a random sample of 50 reports classified by the NLP algorithm as positive and 50 reports classified as negative was reviewed.
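The sampling step described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the record layout (`report_id`, `nlp_positive`), the function name `sample_for_review`, and the fixed seed are all assumptions introduced here, and the example uses the CT counts reported in the Results (96 NLP-positive, 2335 NLP-negative) only to build placeholder records.

```python
import random


def sample_for_review(reports, per_group=50, seed=0):
    """Draw a random sample of NLP-positive and NLP-negative reports.

    `reports` is a list of dicts with a boolean "nlp_positive" key
    (a hypothetical layout, assumed for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    positives = [r for r in reports if r["nlp_positive"]]
    negatives = [r for r in reports if not r["nlp_positive"]]
    # random.sample draws without replacement and raises ValueError
    # if a group holds fewer than per_group reports.
    return rng.sample(positives, per_group), rng.sample(negatives, per_group)


# Placeholder CT reports: 96 flagged positive by NLP, 2335 flagged negative.
ct_reports = [{"report_id": i, "nlp_positive": i < 96} for i in range(2431)]
pos_sample, neg_sample = sample_for_review(ct_reports)
print(len(pos_sample), len(neg_sample))
```

Sampling each group separately guarantees that both NLP-positive and NLP-negative reports are reviewed, even though positives make up only a small share of all reports.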

Results: A total of 3,499 imaging studies were identified in patients within 90 days of a COVID-19 vaccination (CT, n=2431; ultrasound, n=912 [lower extremity, n=790; upper extremity, n=122]). Of the patients who had a chest CT, 96 were identified by the NLP algorithm as positive for acute PE and 2335 as negative. Within the random sample of radiology reports, 49/50 positive and 50/50 negative scans were confirmed, resulting in 100% sensitivity and 98% specificity. The report for the scan falsely identified as showing acute PE noted "A tiny focus of hypoattenuation ...which could represent a small nodule adjacent to the vessel, although a very tiny pulmonary embolus is possible." Of the patients who underwent ultrasound for suspected DVT, 100 were identified as positive. Within the random sample, 49/50 ultrasounds were confirmed positive and 50/50 were confirmed negative for acute DVT, resulting in 100% sensitivity and 98% specificity. The report for the ultrasound falsely identified as showing acute DVT stated "No definitive central deep venous thrombus in the right upper extremity. Adjacent to the brachial artery courses a hyperechoic structure with no internal flow. It is possibly though less likely an occlusive thrombus in a paired brachial vein."
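The reported sensitivity and specificity follow directly from the reviewed sample counts. The sketch below shows the arithmetic; the function name `diagnostic_metrics` and its parameters are illustrative assumptions, and the manual review is treated as the reference standard. With 49 of 50 NLP-positive reports confirmed and 50 of 50 NLP-negative reports confirmed, there are 49 true positives, 1 false positive, 50 true negatives, and 0 false negatives.

```python
def diagnostic_metrics(confirmed_pos, sampled_pos, confirmed_neg, sampled_neg):
    """Compute accuracy metrics from a reviewed sample of NLP results.

    confirmed_pos / sampled_pos: NLP-positive reports confirmed / reviewed.
    confirmed_neg / sampled_neg: NLP-negative reports confirmed / reviewed.
    """
    tp = confirmed_pos                 # NLP positive, reviewer positive
    fp = sampled_pos - confirmed_pos   # NLP positive, reviewer negative
    tn = confirmed_neg                 # NLP negative, reviewer negative
    fn = sampled_neg - confirmed_neg   # NLP negative, reviewer positive
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from the reviewed samples; PE and DVT have identical counts.
pe = diagnostic_metrics(49, 50, 50, 50)
dvt = diagnostic_metrics(49, 50, 50, 50)
print(f"PE sensitivity {pe['sensitivity']:.0%}, specificity {pe['specificity']:.0%}")
```

Sensitivity is 49/49 and specificity is 50/51, which round to the 100% and 98% stated above. Because equal numbers of NLP-positive and NLP-negative reports were sampled, these values describe the reviewed sample and are not weighted by how often each result occurs in the full dataset.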

Conclusion: The PE and DVT NLP algorithms were externally validated and accurately identified acute VTE while excluding chronic VTE events. The few false-positive results arose from ambiguous imaging reports. The validated NLP algorithms may identify acute VTE more accurately than ICD-10 codes and can be used in large datasets.

No relevant conflicts of interest to declare.

