ORCID

https://orcid.org/0000-0001-8183-213X

Date of Award

12-14-2023

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Molecular Genetics & Genomics)

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation

Abstract

Structural variants (SVs) contribute substantially to genetic diversity among individuals and affect gene regulation, disease, and evolution. The predominant approach to profiling these variants in large-scale population studies is short-read whole-genome sequencing (WGS). This approach involves sequencing millions of short reads, aligning them to a linear reference genome, and then identifying variants relative to that reference. However, it has limited ability to detect SVs, especially long insertions and complex rearrangements. The primary limitation is that the linear reference genome does not adequately represent the structural diversity present across human populations. As a result, short reads derived from non-reference SVs may either remain unaligned or be misaligned to the reference genome. Long-read sequencing technologies have enabled the creation of haplotype-resolved assemblies, revealing a vast number of previously unknown SVs. Nevertheless, the high cost of these technologies limits their broad application in research. To address this constraint, we collaborated with the Human Pangenome Reference Consortium (HPRC) to construct a first draft of the human pangenome reference by integrating haplotype-resolved assemblies from 44 genetically diverse individuals. Using this reference, we developed a tool that leverages read depth from short-read WGS data to improve the detection of copy number variants across individuals.
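The general read-depth principle can be illustrated with a minimal sketch: in a diploid genome, a region's copy number is roughly proportional to the number of short reads that align to it. This is a generic illustration only, not the dissertation's tool, and it ignores the pangenome reference entirely. The function name `estimate_copy_number`, the pre-computed per-bin mean depths, and the use of the median depth as the diploid baseline are all assumptions made for the example.

```python
from statistics import median


def estimate_copy_number(bin_depths, ploidy=2):
    """Hypothetical helper: estimate an integer copy number for each genomic bin.

    Assumes `bin_depths` holds the mean read depth per fixed-size bin and that
    most bins are at the baseline ploidy, so the median depth serves as the
    expected depth for `ploidy` copies.
    """
    baseline = median(bin_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    # Copy number scales linearly with depth relative to the baseline.
    return [round(ploidy * depth / baseline) for depth in bin_depths]


if __name__ == "__main__":
    # Toy depths: one bin suggests a duplication, another a deletion.
    print(estimate_copy_number([30, 31, 29, 45, 15, 30]))
```

Real read-depth callers also correct for GC content, mappability, and sample-level noise, and segment adjacent bins before calling variants; the sketch omits all of these steps.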

Language

English (en)

Chair and Committee

Ira Hall

Available for download on Thursday, August 28, 2025

