All Computer Science and Engineering Research

Computational Detection of CpG Islands in DNA

Eric C. Rouchka, Washington University in St. Louis
Richard Mazzarella, Washington University in St. Louis
David J. States, Washington University in St. Louis

Document Type

Technical Report

Department

Computer Science and Engineering

Publication Date

1997-01-01

Filename

WUCS-97-39.PDF

Technical Report Number

WUCS-97-39

Abstract

Regions of DNA rich in CpG dinucleotides, also known as CpG islands, are often located upstream of the transcription start site in both tissue-specific and housekeeping genes. Overall, CpG dinucleotides are observed at a density of 25% of the level expected from base composition alone, partially due to 5-methylcytosine decay (Bird, 1993). Since CpG dinucleotides typically occur with low frequency, CpG islands can be distinguished statistically in the genome. Our method of detecting CpG islands involves a heuristic algorithm employing classic changepoint methods and log-likelihood statistics. A Java applet has been created to allow for user interaction and visualization of the segmentation resulting from the changepoint analysis. The model is tested using several sequences obtainable from GenBank (NCBI, 1997), including a 220 kb fragment of the human X chromosome from the filamin (FLN) gene to the glucose-6-phosphate dehydrogenase (G6PD) gene, which has been experimentally studied (Rivella et al., 1995; Chen et al., 1996). Preliminary results suggest a breakpoint segmentation that is consistent with manual analysis. About 56% of human genes have associated CpG-rich islands (Antequera and Bird, 1993). By identifying the CpG islands, it is thought that regions of DNA coding for housekeeping or tissue-specific genes can be located (Antequera and Bird, 1993), even in the absence of transcriptional activity. Biological experiments searching for such genes can then be narrowed given the locations of the CpG islands.
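
The two statistical ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the report's heuristic algorithm, which the abstract does not specify. It assumes the common definition of the expected CpG count from base composition, (number of C × number of G) / sequence length, and it treats each dinucleotide position as an independent Bernoulli trial (CpG or not), which is only an approximation. The function names `cpg_obs_exp` and `best_changepoint` are hypothetical and chosen for illustration.

```python
import math


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from base composition.

    Assumption: expected count = (#C * #G) / sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    if n < 2 or c == 0 or g == 0:
        return 0.0
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    expected = c * g / n
    return observed / expected


def _bernoulli_loglik(k, n):
    """Maximised log-likelihood of k successes in n Bernoulli trials."""
    if n == 0 or k == 0 or k == n:
        return 0.0  # the fitted probability is 0 or 1, so the likelihood is 1
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def best_changepoint(seq):
    """Return (position, log-likelihood ratio) of the single best split.

    Compares a one-segment model (one CpG rate) against a two-segment
    model (separate CpG rates left and right of the split).
    """
    seq = seq.upper()
    # One flag per dinucleotide position: 1 if that dinucleotide is CpG.
    flags = [1 if seq[i:i + 2] == "CG" else 0 for i in range(len(seq) - 1)]
    n, total = len(flags), sum(flags)
    null = _bernoulli_loglik(total, n)
    best_pos, best_llr = None, 0.0
    left = 0
    for t in range(1, n):
        left += flags[t - 1]  # CpG count among the first t positions
        llr = (_bernoulli_loglik(left, t)
               + _bernoulli_loglik(total - left, n - t)
               - null)
        if llr > best_llr:
            best_pos, best_llr = t, llr
    return best_pos, best_llr
```

Applying `best_changepoint` recursively to each resulting segment, and stopping when the log-likelihood ratio falls below a chosen threshold, would give a full segmentation. The choice of threshold is left open here because the abstract does not state one.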

Comments

Permanent URL: http://dx.doi.org/10.7936/K73R0R4N

Recommended Citation

Rouchka, Eric C.; Mazzarella, Richard; and States, David J., "Computational Detection of CpG Islands in DNA" Report Number: WUCS-97-39 (1997). All Computer Science and Engineering Research.
https://openscholarship.wustl.edu/cse_research/451

Included in

Computer Engineering Commons, Computer Sciences Commons

DOI

https://doi.org/10.7936/K73R0R4N
