Abstract

Structural variants (SVs) significantly contribute to genetic diversity among individuals and impact gene regulation, disease, and evolution. The predominant approach to profiling these genetic variants in large-scale population studies is through short-read whole-genome sequencing (WGS). This approach involves sequencing millions of short reads, aligning them to a linear reference genome, and subsequently identifying variants relative to this reference. However, its capability to detect SVs, especially long insertions and complex rearrangements, is limited. The primary limitation arises from the insufficient representation of structural diversity across human populations in the linear reference genome. As a result, short reads derived from non-reference SVs may either remain unaligned or be erroneously aligned to the reference genome. The advent of long-read sequencing technologies has enabled the creation of haplotype-resolved assemblies, revealing a vast number of previously unknown SVs. Nevertheless, the prohibitive cost of these technologies constrains their broad application in research. In response to this constraint, we collaborated with the Human Pangenome Reference Consortium (HPRC) to construct a first draft of the human pangenome reference by integrating haplotype-resolved assemblies from 44 genetically diverse individuals. Using this comprehensive reference, we developed a tool that leverages read depth from short-read WGS data to improve the detection of copy number variants across individuals.

Committee Chair

Ira Hall

Degree

Doctor of Philosophy (PhD)

Author's Department

Biology & Biomedical Sciences (Molecular Genetics & Genomics)

Author's School

Graduate School of Arts and Sciences

Document Type

Dissertation

Date of Award

12-14-2023

Language

English (en)

Author's ORCID

https://orcid.org/0000-0001-8183-213X

Included in

Biology Commons
