Abstract
This dissertation studies statistical inference for network-linked data, with a focus on how network structure affects estimation and uncertainty quantification in modern econometric and statistical problems. Across many applications, networks enter the analysis in at least two distinct ways: first, through network-derived covariates constructed from observed graphs; and second, through dependence structures in which the underlying outcomes themselves propagate along the network. This dissertation develops inferential tools for both settings under weak and realistic assumptions. The first part considers regression problems in which node-level covariates include noisy network summary statistics, such as local subgraph frequencies and spectral embeddings. Motivated by the fact that latent variables governing network formation make classical regression assumptions difficult to justify, I develop an assumption-lean framework for linear regression with jointly exchangeable regression arrays. Within this framework, I study projection-based inferential targets, establish asymptotic normality and bootstrap consistency, and analyze the additional bias introduced when estimated network statistics are used as regressors. In particular, for regression with local count statistics, I show that ordinary least squares can suffer from a distinctive finite-sample and asymptotic bias, and I develop bias-corrected procedures that target more natural parameters under weaker sparsity conditions. I further develop resampling-based tools, including bootstrap and downsampling methods, to accommodate more challenging regimes and broader classes of network covariates. The second part moves beyond network-linked covariates and studies settings in which dependence is intrinsic to the data-generating process. 
I consider a network spatial autoregression model in which each node's outcome depends directly on the outcomes in its neighborhood through the observed graph, creating stronger and more global dependence than in the regression setting above. For this model, I establish a central limit theorem under weak conditions by analyzing the dependence structure through powers of network operators and walk-based representations. This part also highlights some fundamental challenges for inference in this network-dependent setting. Taken together, the dissertation provides a unified perspective on statistical inference for network-linked data, moving from a particular assumption-lean regression task with noisy network covariates to more general generating models with intrinsic network dependence. The results clarify how network structure alters inferential targets, generates novel bias phenomena, and requires new tools for valid large-sample inference.
Committee Chair
Robert Lunde
Committee Members
Muriah Wheelock; Ran Chen; Soumendra Lahiri; Xiaofeng Shao
Degree
Doctor of Philosophy (PhD)
Author's Department
Statistics
Document Type
Dissertation
Date of Award
4-28-2026
Language
English (en)
DOI
https://doi.org/10.7936/mzmc-6c27
Recommended Citation
Li, Wei, "Assumption-lean Inference for Network-Linked Data" (2026). Arts & Sciences Graduate Student Theses and Dissertations. 3815.
The definitive version is available at https://doi.org/10.7936/mzmc-6c27