Abstract
The electronic health record (EHR) documents interactions between patients and healthcare systems and, owing to its broad adoption, is widely used for secondary research on health outcomes. While EHR data support efficient clinical informatics research, they are subject to biases stemming from healthcare-seeking behavior, health system attributes, and individual or environmental characteristics. Place-based characteristics, in particular, strongly influence when and how patients access care and shape health outcomes. These biases can affect the predictive modeling pipelines that use EHR data, contributing to variation in model performance across groups. In healthcare, where predictive models are increasingly used to guide care, uneven performance may result in inadequate or inequitable care provision. This dissertation focuses on prenatal care as a critical use case, given the large patient population it serves and the rising rates of maternal and fetal morbidity and mortality in the U.S. Prenatal care access is shaped by persistent racial, socioeconomic, and geographic disparities, which makes predictive modeling in this domain especially vulnerable to bias in EHR data. Preeclampsia, a serious pregnancy complication, is a target outcome for which early diagnosis through predictive modeling could improve care and reduce adverse outcomes. To better understand and address spatial bias, this work investigates how such bias manifests across the predictive modeling pipeline. Specifically, it examines spatial bias at the stages of data collection, data cleaning, and model training and evaluation in the context of a preeclampsia prediction task. The findings demonstrate notable spatial patterns in the geographic representativeness of EHR data, in EHR data quality, and in model performance. These patterns suggest that spatial bias is not isolated to a single phase but is embedded throughout the pipeline.
As EHR data continue to be used in clinical research and decision-making, increased attention to spatial bias is critical. Transparency around these issues is essential to ensuring that predictive models support equitable and effective healthcare delivery.
Committee Chair
Philip Payne
Committee Members
Antonina Frolova; Min Lian; Randi Foraker; Yixin Chen
Degree
Doctor of Philosophy (PhD)
Author's Department
Interdisciplinary Programs
Document Type
Dissertation
Date of Award
8-18-2025
Language
English (en)
DOI
https://doi.org/10.7936/7dxj-j029
Recommended Citation
Lewis, Abigail, "Spatial Patterns in Electronic Health Record Data-Based Predictive Modeling: A Case Study of Prenatal Care and Preeclampsia" (2025). McKelvey School of Engineering Theses & Dissertations. 1272.