Date of Award

Winter 12-15-2021

Author's School

Graduate School of Arts and Sciences

Author's Department

Biology & Biomedical Sciences (Computational & Systems Biology)

Degree Name

Doctor of Philosophy (PhD)

Degree Type

Dissertation



Abstract

A cell’s identity is a function of the genes expressed in that cell, which are in turn regulated by transcription factors. Over the last decade, single-cell RNA sequencing (RNA-seq) has emerged as a powerful class of techniques to characterize cellular diversity in heterogeneous tissues. These methods barcode transcripts by their cell of origin and assign them to specific genes. The resulting high-dimensional data are further processed to reveal clusters of cells sharing transcriptional states. Annotating these clusters, based on either known or discovered marker genes, offers a glimpse into the dynamic composition of an organ or biological process.

While single-cell RNA-seq excels at describing cell states, it alone does not inform us about the mechanisms maintaining a particular state. In recent years, multi-modal single-cell technologies have flourished, combining single-cell RNA-seq with at least one other genomic modality. As a result, joint assays now exist that measure gene expression together with genotype, methylation, chromatin accessibility, or lineage. Collectively, these methods aim to connect gene expression to regulatory processes in the genome, thereby gaining insight into the molecular foundations underpinning cellular identity.

Transcription factors are key protein regulators of gene expression. Master transcription factors organize gene regulatory networks to promote differentiation or homeostasis and are often used as markers of cell type. Unfortunately, no methods exist to perform single-cell RNA-seq and map transcription factor binding in the same cells. Such a technique would be uniquely poised to determine both the identity of a cell and the candidate regulatory elements contributing to that identity. The Mitra Lab has developed transposon calling cards as an alternative assay to map transcription factor binding, using transcription factor-transposase fusions to mark binding sites with deposited transposon sequences. Here, I present a single-cell extension of this technique using a novel construct, the self-reporting transposon, whose genomic location can be mapped from single-cell RNA-seq libraries. Thus, in one workflow, single-cell calling cards identifies cell types in complex systems and deconvolves cell-type-specific regulatory elements bound by a transcription factor in those cell types.

The remainder of this dissertation is organized as follows. Chapter 1 reviews the biological and technological context for this work, with particular focus on single-cell RNA-seq techniques and methods to assay transcription factor binding sites. Chapter 2 presents the central advancement of this dissertation, the self-reporting transposon, and its use in single-cell calling cards to map cell-type-specific transcription factor binding sites in complex systems. Chapter 3 discusses the qBED track, a medium for visualizing calling cards data, and its accompanying data format for storing results. Chapter 4 examines the Bayesian blocks algorithm, a method adopted from the astrophysics community, and employs it to call peaks in calling cards data. Chapter 5 explores a new use for self-reporting transposons as surveyors of chromosomal compartmentalization. Chapter 6 concludes this dissertation, offering suggestions for future work and positing a broader role for self-reporting transposons in genomics.


Language

English (en)

Chair and Committee

Robi D. Mitra

Committee Members

Donald F. Conrad

Included in

Genetics Commons