Introduction: Chimeric Antigen Receptor T-cell (CAR T-cell) therapy represents a significant advancement in oncology, improving outcomes for patients with certain hematologic malignancies like diffuse large B-cell lymphoma (DLBCL). However, the documentation of the CAR T-cell therapy journey is often embedded in unstructured clinical narrative notes within electronic health records (EHRs), making the research process labor-intensive and costly. To address this challenge, we developed a Natural Language Processing (NLP) methodology to streamline the identification of patients receiving CAR T-cell therapy.
Methods: Patients were drawn from The US Oncology Network and selected non-Network practices, a nationally representative network of over 2,700 providers seeing more than 1 million patients annually in community-based oncology practices. Adult (≥18 years) patients with DLBCL who had at least two office visits were included. Progress notes were extracted from iKnowMed℠, an oncology-focused EHR system. Four analyses were conducted across different time periods, covering a total study period from 1/1/2016 to 6/30/2024. A total of 4,195 patients and 206,096 notes were analyzed. Regular expressions (regex) were used to filter progress notes mentioning CAR T-cell therapy, accounting for variations in spelling and punctuation as well as FDA-approved brand and generic names. NLP models then distinguished patients who definitively received CAR T-cell therapy from those who did not, according to their clinical notes. The models evaluated included Spark NLP for Healthcare models and OpenAI's GPT-3.5T and GPT-4o models. Each model classified unstructured note excerpts into three categories: (1) Past: the patient definitively received CAR T-cell therapy; (2) Possible: the patient might receive or is being considered for CAR T-cell therapy; (3) Absent: the patient is not eligible for CAR T-cell therapy, or the note does not mention it. Precision, recall, and F1 scores were calculated for both the regex filtering and the NLP model classifications.
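The regex filtering step can be illustrated with a minimal Python sketch. The pattern below is an assumption made for illustration, not the expression used in the study: it covers a few common spellings of "CAR T" and the brand and generic names of three CAR T-cell products with FDA approval in large B-cell lymphoma. The names `CAR_T_PATTERN`, `mentions_car_t`, and `filter_notes` are hypothetical.

```python
import re

# Illustrative pattern only; the study's actual regex is not given in the abstract.
CAR_T_PATTERN = re.compile(
    r"""
      \bcar[\s\-]+t\b                       # "CAR T", "CAR-T", "CAR T-cell"
    | \bcart[\s\-]*cells?\b                 # "CART cell", "CART-cells"
    | \bchimeric\s+antigen\s+receptor\b     # spelled-out form
    | \b(?:yescarta|kymriah|breyanzi)\b     # brand names (assumed list)
    | \b(?:axicabtagene|tisagenlecleucel|lisocabtagene)\b  # generic names (assumed list)
    """,
    re.IGNORECASE | re.VERBOSE,
)


def mentions_car_t(note: str) -> bool:
    """Return True if the note text contains any CAR T-cell therapy mention."""
    return CAR_T_PATTERN.search(note) is not None


def filter_notes(notes: list[str]) -> list[str]:
    """Keep only the notes that would be passed on to the NLP classifier."""
    return [note for note in notes if mentions_car_t(note)]


if __name__ == "__main__":
    sample = [
        "He is planned for CAR-T cell therapy",
        "Patient drove his car to clinic today",
        "Received Yescarta infusion last month",
    ]
    for note in filter_notes(sample):
        print(note)
```

Requiring a separator in the first alternative keeps ordinary words such as "cart" or phrases such as "car to" from matching, at the cost of missing a bare "CART" with no following "cell"; a production pattern would need to be tuned against annotated notes.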
Results: The number of patients identified for CAR T-cell therapy across the four analyses ranged from 154 to 3,009, with a median of 44 progress notes per patient (interquartile range [IQR]: 58). Example notes for each predicted label included Past CAR-T (“This provider completed the CAR T-Cell with a plan to follow up with Allogeneic Transplant following CAR T-Cell.”), Possible CAR-T (“He is planned for CAR-T cell therapy”), and Absent CAR-T (“Patient does not want to participate in CAR-T cell therapy”). Regex filtering achieved a precision of 0.88, recall of 0.90, and F1 score of 0.89. Among the NLP models evaluated, OpenAI GPT-4o performed best, with a precision of 0.86 (95% CI: 0.84-0.88), recall of 0.84 (95% CI: 0.82-0.86), and F1 score of 0.85 (95% CI: 0.83-0.86). The models identified 1,337 patients in total based on CAR T-cell therapy status, with the following distribution: 358 Past CAR-T, 669 Possible CAR-T, and 310 Absent CAR-T mentions.
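For reference, the F1 score is the harmonic mean of precision and recall, so the reported F1 values follow from the reported precision and recall. The Python sketch below shows that arithmetic and a per-label computation on toy data. The function names and the toy labels are hypothetical, and the abstract does not state how scores were averaged across the three categories, so the one-label-versus-rest treatment here is an assumption.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive):
    """Score one label against all others (assumed one-vs-rest treatment)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Consistency check against the values reported in the abstract.
    print(round(f1_from(0.88, 0.90), 2))  # regex filtering
    print(round(f1_from(0.86, 0.84), 2))  # GPT-4o

    # Toy labels, invented purely to exercise the function.
    truth = ["Past", "Past", "Possible", "Absent"]
    preds = ["Past", "Possible", "Possible", "Absent"]
    print(precision_recall_f1(truth, preds, positive="Past"))
```

Rounded to two decimals, the first two calls give 0.89 and 0.85, matching the reported F1 scores.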
Conclusion: Our novel NLP-based approach efficiently identified patients receiving CAR T-cell therapy from unstructured clinical notes in EHRs. This model accurately transformed notes into structured data, aiding in the identification of CAR T-cell therapy status among patients with DLBCL. Despite challenges with note ambiguity and negation handling, our approach automates the extraction and categorization of therapy mentions, significantly reducing the time and cost of manual chart abstraction. Continued development of this methodology can enable more efficient research regarding real-world benefits of CAR-T therapy. Future research will enhance these models and investigate their applications to other critical research questions using EHR-based unstructured data. NLP and AI tools can improve the efficiency, scale, and cost-effectiveness of chart abstraction, enabling more focused use of human resources in real-world research.
Raju: Ontada, part of McKesson: Current Employment. Herms: Ontada, part of McKesson: Current Employment, Current equity holder in publicly-traded company. Su: Ontada, part of McKesson: Current Employment. Zackon: Ontada, part of McKesson: Current Employment; Cardinal Health: Other: My wife is a CMO. Paulus: Ontada, part of McKesson: Current Employment.