Abstract

The electronic health record (EHR) documents interactions between patients and healthcare systems and, because of its broad adoption, is widely used for secondary research on health outcomes. While EHR data support efficient clinical informatics research, they are subject to biases stemming from healthcare-seeking behavior, health system attributes, and individual or environmental characteristics. Place-based characteristics, in particular, strongly influence when and how patients access care and shape health outcomes. These biases in EHR data can affect the predictive modeling pipelines that rely on them, contributing to variation in model performance across groups. In healthcare, where predictive models are increasingly used to guide care, uneven performance may result in inadequate or inequitable care provision. This dissertation focuses on prenatal care as a critical use case, given the large patient population it serves and the rising rates of maternal and fetal morbidity and mortality in the U.S. Prenatal care access is shaped by persistent disparities across racial, socioeconomic, and geographic lines, which makes predictive modeling in this domain especially vulnerable to bias in EHR data. Preeclampsia, a serious pregnancy complication, is a target outcome for which early diagnosis through predictive modeling could improve care and reduce adverse outcomes. To better understand and address spatial bias, this work investigates how such bias manifests across the predictive modeling pipeline. Specifically, it examines spatial bias at the stages of data collection, data cleaning, and model training and evaluation in the context of a preeclampsia prediction task. The findings demonstrate notable spatial patterns in the geographic representativeness of EHR data, EHR data quality, and model performance. These patterns suggest that spatial bias is not isolated to a single phase but is embedded throughout the pipeline.
As EHR data continue to be used in clinical research and decision-making, increased attention to spatial bias is critical. Transparency around these issues is essential to ensuring that predictive models support equitable and effective healthcare delivery.

Committee Chair

Philip Payne

Committee Members

Antonina Frolova; Min Lian; Randi Foraker; Yixin Chen

Degree

Doctor of Philosophy (PhD)

Author's Department

Interdisciplinary Programs

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

8-18-2025

Language

English (en)

Available for download on Saturday, August 15, 2026

Included in

Biostatistics Commons
