McKelvey School of Engineering Theses & Dissertations

Machine Learning in Complex Scientific Domains: Hospitalization Records, Drug Interactions, Predictive Modeling and Fairness for Class Imbalanced Data

Arghya Datta, Washington University in St. Louis

ORCID

http://orcid.org/0000-0003-3543-3829

Date of Award

Summer 8-15-2021

Author's School

McKelvey School of Engineering

Author's Department

Computer Science & Engineering

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

Machine learning has demonstrated potential in analyzing large, complex datasets and has become ubiquitous across many fields of scientific research. As machine learning is actively deployed in many complex and critical domains, it is essential for it to engage with domain expertise, both to aid knowledge discovery and to address challenges in predictive modeling. Domain expertise represents an essential and elaborate collection of knowledge that is often under-utilized when applying machine learning in complex domains. In this dissertation, I have addressed existing challenges in knowledge discovery in complex domains through engagement with domain expertise, particularly in the context of medicine and healthcare, and have developed neural network-based algorithms that improve predictive modeling in challenging scenarios such as class imbalance and under-representation. First, a domain-expertise-guided machine learning framework has been developed that is capable of identifying potential interventions for clinical outcomes. Domain experts were kept in the loop throughout the model-building process to mitigate the pitfalls of confounding and data-leaking variables. Second, a simple machine learning approach has been presented to study drug-drug interactions that lead to adverse events of clinical significance. Identifying unknown interventions for clinical outcomes and adverse drug interactions leads to novel knowledge discovery in a complex domain such as medicine and healthcare. Third, the problems of class imbalance and under-representation have been studied. Novel neural network architectures have been presented that simultaneously improve classification and calibration performance across under-represented sub-populations in class-imbalanced datasets.

Language

English (en)

Chair

Sanjay Joshua Swamidass

Committee Members

Jeremy Buhler, Chien-Ju Ho, Fuhai Li, Philip Payne

Included in

Artificial Intelligence and Robotics Commons

DOI

https://doi.org/10.7936/c0tv-kt51
