Date of Award

Spring 5-2016

Author's School

Graduate School of Arts and Sciences

Author's Department


Degree Name

Master of Arts (AM/MA)

Degree Type



The study population comprised children, adolescents, and adults who were residents of the city of St. Louis at the time of data collection in 2015. The data collected include sex, age, race, measured height and weight, self-reported height and weight, zip code, educational background, exercise and diet habits, and participants' descriptions of their weight and their weight strategies (e.g., overweight and trying to lose weight, respectively). I use the C5.0 algorithm to create classification trees and rule-based models to analyze this population. Specifically, I model a binary self-image variable as a function of sex, age, race, zip code, and the ratio of self-reported to measured BMI (body mass index), and I model a multi-level categorical weight description variable as a function of sex, age, race, zip code, BMI ratio, and weight strategy. I compare the performance of the C5.0 algorithm with and without rules and boosting, for independent and grouped categories, for both the binary and multi-level outcomes. This comparison is limited by sample size constraints. Ultimately, C5.0 performed best when modeling the binary variable and using either rules or boosting for independent categories.
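
The BMI ratio used as a predictor can be illustrated with a minimal sketch. It assumes weight in kilograms and height in metres, and it assumes the ratio is taken as self-reported BMI divided by measured BMI; the abstract states neither the units nor the direction of the ratio, and the function names `bmi` and `bmi_ratio` are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_ratio(
    reported_weight_kg: float,
    reported_height_m: float,
    measured_weight_kg: float,
    measured_height_m: float,
) -> float:
    """Ratio of self-reported BMI to measured BMI.

    Assumed direction: reported / measured, so a value below 1 means the
    self-reported figures imply a lower BMI than the measured ones.
    """
    reported = bmi(reported_weight_kg, reported_height_m)
    measured = bmi(measured_weight_kg, measured_height_m)
    return reported / measured


if __name__ == "__main__":
    # Hypothetical participant who under-reports weight by 5 kg.
    print(round(bmi_ratio(65.0, 1.75, 70.0, 1.75), 3))
```

Under these assumptions, a ratio of 1 indicates that self-reported and measured values imply the same BMI, while values away from 1 indicate a discrepancy between the two.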


English (en)

Chair and Committee

David Wright

Committee Members

Todd Kuffner, Han Gan


Permanent URL: