Date of Award
Master of Arts (AM/MA)
The study population comprised children, adolescents, and adults who were residents of the city of St. Louis at the time of data collection in 2015. The data collected include sex, age, race, measured height and weight, self-reported height and weight, zip code, educational background, exercise and diet habits, and participants' descriptions of their weight (e.g., overweight) and their weight strategies (e.g., trying to lose weight). I use the C5.0 algorithm to create classification trees and rule-based models to analyze this population. Specifically, I model a binary self-image variable as a function of sex, age, race, zip code, and a ratio of self-reported versus measured body mass index (BMI), and a multi-level categorical weight description variable as a function of sex, age, race, zip code, BMI ratio, and weight strategy. I compare the performance of the C5.0 algorithm with and without rules and boosting, for independent and grouped categories, for both the binary and multi-level outcomes. This comparison is limited by sample size constraints. Ultimately, C5.0 performed best when modeling the binary variable and using either rules or boosting for independent categories.
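The BMI ratio predictor described above can be illustrated with a minimal sketch. It assumes the standard definition of BMI (weight in kilograms divided by height in metres squared) and assumes the ratio is taken as self-reported BMI over measured BMI; the abstract does not state the direction of the ratio or the units used, and the function names here are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_ratio(
    reported_weight_kg: float,
    reported_height_m: float,
    measured_weight_kg: float,
    measured_height_m: float,
) -> float:
    """Ratio of self-reported BMI to measured BMI.

    Assumption: reported over measured. Under that convention a value
    below 1 means the self-reported figures imply a lower BMI than the
    measured ones.
    """
    reported = bmi(reported_weight_kg, reported_height_m)
    measured = bmi(measured_weight_kg, measured_height_m)
    return reported / measured


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(bmi_ratio(63.0, 1.75, 70.0, 1.75))
```

Under this convention, a participant whose self-reported height and weight match the measured values has a ratio of exactly 1.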
Chair and Committee
Todd Kuffner, Han Gan
Shirali, Rohan, "Classification Trees and Rule-Based Modeling Using the C5.0 Algorithm for Self-Image Across Sex and Race in St. Louis" (2016). Arts & Sciences Electronic Theses and Dissertations. 718.