This thesis describes an unsupervised system for learning natural language morphology, specifically suffix identification, from unannotated text. The system is language independent, so that it can learn the morphology of any human language. For English, this means identifying “-s”, “-ing”, “-ed”, “-tion”, and many other suffixes, in addition to learning which stems they attach to. The system uses no prior knowledge, such as part-of-speech tags, and learns the morphology simply by reading in a body of unannotated text. The system consists of a generative probabilistic model, which is used to evaluate hypotheses, and a directed search and a hill-climbing search, which are used in conjunction to find a highly probable hypothesis. Experiments applying the system to English and Polish are described.
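The general idea of scoring suffix hypotheses and improving them by hill climbing can be illustrated with a small sketch. It is not the thesis's model: the abstract does not specify the generative probabilistic model or the directed search, so the sketch substitutes a simple description-length cost (characters needed to list the distinct stems and suffixes) and shows only a greedy hill climb. All function names, the `min_stem` and `max_len` parameters, and the toy word list are assumptions made for illustration.

```python
def analyze(word, suffixes, min_stem=3):
    """Split a word into (stem, suffix) using the longest matching suffix."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[:-len(suf)], suf
    return word, ""  # no suffix applies: the whole word is the stem


def cost(words, suffixes):
    """Stand-in score (NOT the thesis's probabilistic model): characters
    needed to list every distinct stem and suffix, plus one separator each."""
    stems = {analyze(w, suffixes)[0] for w in words}
    return sum(len(s) + 1 for s in stems) + sum(len(s) + 1 for s in suffixes)


def candidate_suffixes(words, max_len=4, min_stem=3):
    """Every word ending of up to max_len characters that leaves a long enough stem."""
    return {w[-k:] for w in words
            for k in range(1, max_len + 1) if len(w) - k >= min_stem}


def hill_climb(words):
    """Greedy search: toggle one suffix at a time while the cost keeps falling."""
    words = set(words)
    candidates = candidate_suffixes(words)
    current = frozenset()
    current_cost = cost(words, current)
    while True:
        # Neighbours differ from the current hypothesis by exactly one suffix.
        neighbours = [current ^ {s} for s in candidates]
        best = min(neighbours, key=lambda h: (cost(words, h), sorted(h)))
        best_cost = cost(words, best)
        if best_cost >= current_cost:
            return set(current), current_cost  # local optimum reached
        current, current_cost = best, best_cost


# Hypothetical toy corpus; a real run would read a large body of text.
WORDS = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking",
         "jump", "jumps", "jumped", "jumping"]

suffixes, total_cost = hill_climb(WORDS)
print(sorted(suffixes), total_cost)
```

On this toy list the climb settles on the suffixes "-ed", "-ing", and "-s", because sharing the stems "walk", "talk", and "jump" shortens the listing more than the suffixes cost to store. A hill climb like this can stall in a local optimum on real data, which is one plausible reason to pair it with a second search strategy.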
Snover, Matthew G., "An Unsupervised Knowledge Free Algorithm for the Learning of Morphology in Natural Languages - Master's Thesis, May 2002" Report Number: WUCSE-2002-18 (2002). All Computer Science and Engineering Research.