Phylogenetic Illumination of Escherichia coli by Single Nucleotide Polymorphism Interrogation

Author's School

Graduate School of Arts & Sciences

Author's Department/Program

Biology and Biomedical Sciences: Molecular Microbiology and Microbial Pathogenesis


Language

English (en)

Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Chair and Committee

Phillip I. Tarr


Escherichia coli is a diverse species whose members include both commensals and several groups of pathogens. The virulence phenotype of each of these groups depends on its host environment and phylogeny. Efforts over the past several decades to characterize this genospecies phylogenetically have ranged from enzyme electrophoretic mobilities to, more recently, sequence-based analyses such as multi-locus sequence typing and whole genome sequencing. Despite the increasing numbers of taxa and amounts of data analyzed, these approaches produce conflicting topologies. I attempted to resolve these ambiguities and determine a single tree by interrogating extended sets of likely stable E. coli chromosomal sequence. To do this, sequence from four 25-26 kb contiguous conserved backbone segments was obtained from four different quadrants of the chromosome in 36 strains representing the five major groups of E. coli (A, B1, D, B2, E). Regions with overt evidence of recombination (such as pathogenicity islands) were excluded. Four different phylogenetic methods were used to build phylograms. Despite efforts to exclude subsegments that had been acquired by recombination, genetic transfer was found to play a very large role in the phylogenetic structure of E. coli and hindered generation of a reliable dendrogram. Specifically, individual 25-26 kb segments within the same strain set had independent topologies, suggesting differing evolutionary histories within a single chromosome. Inter-group recombination occurred non-randomly, with certain group pairings more likely to share DNA than others.

In parallel, we studied the evolution of a tightly defined group of E. coli, the EHEC 1 clade. This group of organisms provides an instructive example for the study of the emergence of a pathogen, Shiga toxin-producing E. coli O157, from its less virulent ancestor, E. coli O55:H7. Single nucleotide polymorphisms (SNPs) illuminate the evolutionary histories and relatedness of organisms, and SNPs found in stable genome regions can provide a precise measurement of evolution. Each SNP in all backbone open reading frames was identified in five newly and two previously sequenced evolutionarily instructive pathogenic E. coli O157:H7, O157:H-, and O55:H7. SNPs between the sequenced reads and the reference genome were characterized as synonymous or nonsynonymous and designated as either radial (descended from a cluster founder within the same cluster) or linear (linearly acquired as the clade formed). The 1,113 synonymous SNPs date the emergence of the oldest cluster of this pathogen to approximately 7,000 years ago. A surprisingly high number of shared SNPs within defined clusters suggests restricted survival and limited effective population sizes of pathogenic O157:H7, tenuous survival of these organisms in nature, source-sink evolutionary dynamics, or, possibly, a limited number of mutations that confer selective advantage. A single large segment spanning the rfb-gnd gene cluster is the only backbone region convincingly acquired by recombination as O157 emerged from O55. The data confirm the current O157 emergence model and delineate the range and rate of descent and diversification of a pathogen within a well-defined clade of the E. coli genospecies.
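
The distinction between synonymous and nonsynonymous SNPs used above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work; the function name `classify_snp`, the 0-based coordinate convention, and the toy open reading frame are assumptions made for illustration only. The sketch assumes an in-frame coding sequence on the forward strand and the standard genetic code, whose codon-to-amino-acid assignments also apply to E. coli.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(orf: str, pos: int, alt_base: str) -> str:
    """Classify a single-base substitution in an in-frame ORF.

    orf      -- reference coding sequence (hypothetical input, forward strand)
    pos      -- 0-based position of the SNP within the ORF
    alt_base -- the variant base observed at that position
    """
    orf = orf.upper()
    alt_base = alt_base.upper()
    offset = pos % 3                      # position of the SNP inside its codon
    start = pos - offset                  # first base of the affected codon
    ref_codon = orf[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("SNP falls in an incomplete codon")
    if ref_codon[offset] == alt_base:
        raise ValueError("alternate base equals the reference base")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "nonsynonymous"


if __name__ == "__main__":
    toy_orf = "ATGCTTAAA"                 # Met-Leu-Lys, a made-up example
    print(classify_snp(toy_orf, 5, "C"))  # CTT -> CTC, both leucine
    print(classify_snp(toy_orf, 3, "A"))  # CTT -> ATT, leucine -> isoleucine
```

Only synonymous changes leave the encoded protein unaltered, which is why they are the class used in the abstract for dating the emergence of the oldest cluster.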


Permanent URL:

Leopold_Table_S3.xls (1367 kB)
Table S3

Leopold_Table_S7.xls (1982 kB)
Table S7
