Author's School

Graduate School of Arts & Sciences

Author's Department/Program



English (en)

Date of Award

January 2010

Degree Type


Degree Name

Master of Arts (MA)

Chair and Committee

Gary Stormo


Identifying protein-DNA binding sites, such as transcription factor binding sites, is a key component of characterizing gene regulatory networks. Predicting and identifying DNA binding sites, whether experimentally or computationally, is challenging and suffers from high false-positive rates for several reasons, including misinterpretation of the sequence symmetry of DNA binding sites. Our study models and compares how well three orientation-handling methods of the motif-finding program consensus recover true binding site models that range from symmetric to asymmetric. The first method, c0, ignores the complement of each sequence. In this experiment it is given the correct binding orientation, so its output reflects the true model, limited only by sample size. The second method, c2, considers both strands but includes each site in the model in only one orientation; as a result, either orientation can be chosen as correct. The third method, c3, assumes that the pattern is symmetric and includes both orientations of each site in the model. Our results show that for an asymmetric site, the c2 method predicts the true model quite accurately, while the c3 method predicts it very poorly. Conversely, if the site is symmetric, the c3 method gives a very accurate model, but the c2 method is inaccurate, predicting more information content (IC) than is actually present. The results demonstrate that either method can lead to inappropriate models if its underlying assumption about symmetry is incorrect, resulting in high levels of false-positive and false-negative motif predictions.
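The difference between the three orientation modes can be illustrated with a small sketch. It assumes the binding sites have already been located and aligned, so orientation is the only open question, and it scores each alignment by its total information content, IC = Σ_i Σ_b f_{b,i} log2(f_{b,i} / 0.25), with a uniform background and no pseudocounts. The functions `revcomp`, `information_content`, and `build_alignment` are hypothetical names used for illustration only; they borrow the mode labels c0, c2, and c3 from the text but are not the consensus program's actual implementation. In particular, the greedy per-site orientation choice used for c2 is a simplification of how that program searches for alignments.

```python
import math
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(site: str) -> str:
    """Return the reverse complement of a DNA string."""
    return site.translate(COMPLEMENT)[::-1]


def information_content(sites, background=0.25):
    """Total IC in bits of equal-length aligned sites (uniform background, no pseudocounts)."""
    if not sites:
        return 0.0
    n = len(sites)
    total = 0.0
    for column in zip(*sites):  # one alignment column at a time
        counts = Counter(column)
        for base in BASES:
            f = counts[base] / n
            if f > 0:  # treat 0 * log(0) as 0
                total += f * math.log2(f / background)
    return total


def build_alignment(sites, mode):
    """Hypothetical sketch of the three orientation modes described in the text."""
    if mode == "c0":
        # Use each site exactly as given (the correct orientation is assumed known).
        return list(sites)
    if mode == "c3":
        # Assume symmetry: include both orientations of every site.
        return [s for site in sites for s in (site, revcomp(site))]
    if mode == "c2":
        # Include each site once, greedily keeping whichever orientation
        # gives the higher IC together with the sites chosen so far.
        chosen = []
        for site in sites:
            fwd = information_content(chosen + [site])
            rev = information_content(chosen + [revcomp(site)])
            chosen.append(site if fwd >= rev else revcomp(site))
        return chosen
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    asymmetric = ["AAAC"] * 4  # an asymmetric site, always in the same orientation
    for mode in ("c0", "c2", "c3"):
        print(mode, information_content(build_alignment(asymmetric, mode)))
    # c0 8.0 / c2 8.0 / c3 4.0: forcing symmetry (c3) halves the IC of an asymmetric site

    symmetric = ["ACGA", "TCGT"]  # the two sites are reverse complements of each other
    for mode in ("c0", "c2", "c3"):
        print(mode, information_content(build_alignment(symmetric, mode)))
    # c0 6.0 / c2 8.0 / c3 6.0: c2 flips one site and reports more IC than the sample supports
```

In this toy example the asymmetric site keeps its full 8 bits under c0 and c2 but drops to 4 bits under c3. For the symmetric pair, c2 reverse-complements the second site, so it reports 8 bits where the symmetric model supports only 6. This mirrors, on a tiny scale, the qualitative pattern the abstract describes.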

