Abstract

Traditional machine learning paradigms often rely on a single global model trained on an entire dataset, aiming for broad generalization across all instances. However, in many real-world applications the underlying data distribution is heterogeneous, and meaningful predictions often require models that focus on specific subpopulations rather than treating the data as a whole. This motivates the study of learning from conditional distributions, a framework in which predictive models are designed to capture the structure and properties of restricted subsets of the data, leading to improved accuracy, fairness, and interpretability. This dissertation explores three key subproblems that exemplify different aspects of learning from conditional distributions: fairness auditing, conditional linear classification, and personalized prediction. These problems share a common goal of modeling and verifying properties of subpopulations within a broader distribution, but they present distinct computational challenges.

First, we examine fairness auditing, which assesses whether a classifier exhibits statistical parity across certain subgroups. This can be seen as verifying whether the performance of a model remains stable when conditioned on specific subsets of the data. We develop an efficient auditing framework via a reduction to agnostic learning under distributional assumptions. More importantly, we establish a distribution-specific lower bound for auditing halfspace subgroups by reducing from the Learning With Errors problem, demonstrating that verifying fairness over the family of general halfspaces is computationally intractable even under well-behaved distributions such as the standard normal.

Next, we investigate conditional linear classification, where the objective is to learn a classifier that minimizes classification error within a well-defined subgroup. Unlike fairness auditing, which focuses on post-hoc verification of properties, conditional classification actively seeks an optimal classifier-subgroup pair. We present efficient approximation algorithms for this problem in well-behaved and standard normal distributional settings. However, we also establish nontrivial distribution-specific lower bounds for conditional linear classification by reduction from agnostic linear classification, which turns out to be computationally intractable even under Gaussian distributions.

Building on conditional classification, we extend our study to personalized prediction, which seeks to optimize predictions for an individual rather than a predefined subgroup. The key distinction is that in personalized prediction, the subgroup, or reference class, must always contain the reference point we aim to classify. This introduces an additional constraint but also highlights a natural extension of conditional learning. We show that, with modifications to our learning algorithms, we can efficiently construct reference classes that satisfy this requirement, bridging the gap between conditional classification and truly individualized prediction.

Together, our analyses of these three problems establish a rich theoretical foundation for learning from conditional distributions and offer a broad overview of this domain. In particular, our findings provide both algorithmic insights and computational lower bounds, illustrating the trade-offs between computational hardness and the expressive power of the subgroup classes.
By leveraging techniques from gradient descent and cryptographic hardness assumptions, this dissertation deepens our understanding of conditional learning and its potential for practical applications in fairness auditing, interpretable classification, and personalized decision-making.
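To make the objects discussed above concrete, the following display is a minimal sketch in our own notation (the symbols D, f, g, h, x*, gamma, and mu are ours and not taken from the dissertation's formal definitions): the statistical parity condition that fairness auditing checks, and the objective that conditional linear classification optimizes.

% A minimal sketch, assuming our own notation; the dissertation's definitions may differ.
% Statistical parity auditing asks whether the classifier f's positive rate on the
% subgroup indicated by g matches its overall positive rate, up to a slack gamma:
\[
  \Bigl| \Pr_{x \sim D}\bigl[f(x) = 1 \mid g(x) = 1\bigr] \;-\; \Pr_{x \sim D}\bigl[f(x) = 1\bigr] \Bigr| \;\le\; \gamma .
\]
% Conditional linear classification instead searches for a halfspace h and a subgroup g
% of non-negligible mass mu that together minimize the error conditioned on g:
\[
  \min_{h,\; g \,:\, \Pr_{x \sim D}[g(x) = 1] \ge \mu} \;\; \Pr_{(x,y) \sim D}\bigl[h(x) \ne y \mid g(x) = 1\bigr].
\]
% Personalized prediction adds the constraint that the reference class g must contain
% the particular point x* we wish to classify, i.e. g(x*) = 1.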

Committee Chair

Brendan Juba

Committee Members

Brendan Juba; Chien-Ju Ho; Daniel Hsu; William Yeoh; Yevgeniy Vorobeychik

Degree

Doctor of Philosophy (PhD)

Author's Department

Computer Science & Engineering

Author's School

McKelvey School of Engineering

Document Type

Dissertation

Date of Award

7-30-2025

Language

English (en)
