Document Type

Technical Report

Publication Date

2005-04-15

Filename

WUCSE-2005-12.pdf

DOI:

10.7936/K7CJ8BVJ

Technical Report Number

WUCSE-2005-12

Abstract

Steganography, or information hiding, is to conceal the existence of messages so as to protect their confidentiality. We consider de-ciphering a stegoscript, a text with secret messages embedded within a covertext, and identifying the vocabularies used in the mes-sages, with no knowledge of the vocabularies and grammar in which the script was writ-ten. Our research was motivated by the prob-lem of identifying conserved non-coding func-tional elements (motifs) in regulatory regions of genome sequences, which we view as stego-scripts constructed by nature with a statis-tical model consisting of a dictionary and a grammar. We develop an iterative learning algorithm, WordSpy, to learn such a model from a stegoscript. The model then can be applied to identify the embedded secret mes-sages, i.e., the functional motifs. Our algo-rithm can successfully recover the most pos-sible text of the first ten chapters of a novel embedded in a stegoscript and identify the transcription factor binding motifs in the up-stream regions of ∼ 800 yeast genes.

Comments

Permanent URL: http://dx.doi.org/10.7936/K7CJ8BVJ

Share

COinS