McKelvey School of Engineering Theses & Dissertations

Modeling Metastasis in Breast Cancer Patients Using EHR Data, the Area Deprivation Index (ADI), and Machine Learning Models

Vishesh Patel, Washington University in St. Louis

Abstract

Applying machine learning and statistical analysis to traditional informatics problems is a growing area of research that can help clinicians better predict disease outcomes and deliver more personalized care. In this study, several machine learning models are used to model the likelihood of metastasis in breast cancer patients using a mix of data from the electronic health record and socioeconomic information derived from the Area Deprivation Index (ADI). Metastasis is a late-stage progression of cancer in which a tumor spreads from its initial site of development to another part of the body. In breast cancer, the most diagnosed cancer in the United States, more research is needed to assess which characteristics of diagnosed patients may lead to metastasis. The electronic health record (EHR) has emerged as a vast source of information for researchers, even though its primary purpose is billing. While demographic and clinical information is commonly logged in the EHR, socioeconomic information is generally unavailable. Information from social deprivation indices can be mapped to patients using geographical information such as zip codes. Social determinants of health (SDoH) are the characteristics of the environment and population in which people live, and studies have shown that living in areas with greater social disadvantage results in more adverse health outcomes. Hence, the focus of this research is two-fold. The first aim is to model metastasis prediction using a variety of machine learning models and assess which types perform best on the engineered data. The second aim is to assess, given the evidence that socioeconomic indicators contribute to the prediction of health outcomes, how predictive such values are in models that also contain clinical information, which has traditionally been the main predictor of health outcomes.
In this study, tree-based algorithms such as Random Forest and XGBoost had the greatest predictive performance, but within those models, scores that measure health using comorbidities and other clinical variables overshadowed the contribution of the Area Deprivation Index scores engineered at the 5-digit zip code level. What follows is a discussion of model performance and evaluation metrics, as well as an analysis of each variable's contribution using calculations provided by scikit-learn and the Shapley additive explanations (SHAP) method. Another key discussion that emerges from this research concerns how social deprivation indices can best be optimized for studies that model disease and what possibilities exist for using indices at different levels of geographic summary.
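The mapping of deprivation scores to patients by zip code, as described in the abstract, can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: it assumes ADI ranks arrive keyed by 9-digit ZIP and are averaged to the 5-digit level, and every name and value in it (adi_by_zip9, adi_by_zip5, attach_adi, the patient records, the ranks) is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ADI national ranks keyed by 9-digit ZIP (values are made up).
adi_by_zip9 = {
    "631100001": 80,
    "631100002": 90,
    "631300001": 20,
}


def adi_by_zip5(adi_zip9):
    """Average ADI ranks over all 9-digit ZIPs sharing a 5-digit prefix.

    Averaging is an assumption made for illustration; other summaries
    (median, population-weighted mean) are equally plausible.
    """
    groups = defaultdict(list)
    for zip9, rank in adi_zip9.items():
        groups[zip9[:5]].append(rank)
    return {zip5: mean(ranks) for zip5, ranks in groups.items()}


def attach_adi(patients, adi_zip5):
    """Return copies of patient records with an 'adi' field (None if no match)."""
    return [{**p, "adi": adi_zip5.get(p["zip5"])} for p in patients]


# Hypothetical patient records carrying only an identifier and a 5-digit ZIP.
patients = [
    {"patient_id": 1, "zip5": "63110"},
    {"patient_id": 2, "zip5": "63130"},
    {"patient_id": 3, "zip5": "99999"},  # no ADI record for this ZIP
]

zip5_table = adi_by_zip5(adi_by_zip9)
enriched = attach_adi(patients, zip5_table)
```

Patients whose zip code has no ADI record keep a missing value rather than being dropped, so any imputation or exclusion choice stays explicit in later modeling steps.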

Committee Chair

Dr. Philip R.O. Payne, PhD

Committee Members

Dr. Graham Colditz, PhD; Dr. Chenyang Lu, PhD; Dr. Alvitta Ottley, PhD

Degree

Master of Science (MS)

Author's Department

Computer Science & Engineering

Author's School

McKelvey School of Engineering

Document Type

Thesis

Date of Award

Spring 5-2022

Language

English (en)

DOI

https://doi.org/10.7936/c9av-hs93

Author's ORCID

https://orcid.org/0000-0002-6449-1772

Recommended Citation

Patel, Vishesh, "Modeling Metastasis in Breast Cancer Patients Using EHR Data, the Area Deprivation Index (ADI), and Machine Learning Models" (2022). McKelvey School of Engineering Theses & Dissertations. 709.

The definitive version is available at https://doi.org/10.7936/c9av-hs93

Included in

Engineering Commons