Date of Award
12-17-2024
Degree Name
Doctor of Philosophy (PhD)
Degree Type
Dissertation
Abstract
Accurate extraction of clinical entities and phenotypes from unstructured electronic health record (EHR) text is crucial for many clinical research tasks, including cohort identification, tracking temporal patterns in disease progression, and determining treatment course. However, this task remains challenging due to the complexity and ambiguity of medical language. This dissertation explores the application of advanced large language models (LLMs), including generative pre-trained transformer (GPT) models such as GPT-4 and GPT-3.5-turbo as well as Llama-3.1, Llama-3, and Flan-T5, for clinical entity and phenotype extraction from EHRs. Building upon these findings, this dissertation also investigates a hybrid approach in which external knowledge sources, such as the Unified Medical Language System (UMLS), are integrated with LLMs to improve the quality of language model outputs. Through extensive experiments and evaluation, we find that extraction improvements are possible with LLMs and knowledge base integration. We also find that LLM hyperparameters, such as temperature, and prompt variations significantly impact consistency and accuracy, with lower temperature settings yielding more stable outputs but not necessarily higher accuracy. Additionally, variations in clinical text and model configurations reveal a trade-off between consistency and performance, suggesting that careful tuning is essential for balancing reliable results with clinical accuracy. These findings have implications for facilitating faster and more accurate cohort identification, supporting clinical decision-making, identifying temporal patterns in disease progression, and ultimately enabling more effective utilization of EHR data for clinical informatics research.
Language
English (en)
Chair
Albert Lai
Committee Members
Chenyang Lu; Fuhai Li; Philip Payne; Roger Chamberlain