"Twinscan: A Software Package for Homology-Based Gene Prediction" by Paul Flicek

All Computer Science and Engineering Research

Title

Twinscan: A Software Package for Homology-Based Gene Prediction

Authors

Paul Flicek, Washington University in St. Louis

Document Type

Technical Report

Publication Date

2003-02-14

Filename

wucse-2003-8.pdf

DOI:

10.7936/K7MP51KN

Technical Report Number

WUCSE-2003-8

Abstract

A complete mapping from genome to proteome would constitute a foundation for genome-based biology and provide targets for pharmaceutical and therapeutic intervention. This is one reason gene structure prediction has been a major subfield of computational biology for over 20 years. Many of the widely used gene prediction systems were developed in the 1990s and are unable to take advantage of the revolution in comparative genomics brought on by the sequencing of the entire genomes of an increasing number of vertebrates. Twinscan is a new system for high-throughput gene-structure prediction that exploits the patterns of conservation observed in alignments between a target genomic sequence and its homologous sequence in other organisms. The approach employs a symbolic conservation sequence that effectively combines many local alignments into a single global alignment. This representation has several important properties that make Twinscan particularly useful for high-throughput gene prediction. For mammals, Twinscan has been shown to be significantly more accurate and reliable by all measures than any non-comparative genomic method. Twinscan is based on, and includes as a component, the same hidden Markov model topology as Genscan, a popular non-homology-based gene prediction program. Twinscan has an object-oriented design and is implemented in the C++ programming language. Twinscan’s three major components are a probabilistic model of the DNA sequence, a probabilistic model of the conservation sequence, and a dynamic programming framework. Both the models and the computational structure are complicated aggregate classes. In this report, the design and implementation of Twinscan are described at the source-code level for the first time.

Comments

Permanent URL: http://dx.doi.org/10.7936/K7MP51KN

Recommended Citation

Flicek, Paul, "Twinscan: A Software Package for Homology-Based Gene Prediction" Report Number: WUCSE-2003-8 (2003). All Computer Science and Engineering Research.
https://openscholarship.wustl.edu/cse_research/1126

Included in

Computer Engineering Commons, Computer Sciences Commons
