Abstract
The study population comprised children, adolescents, and adults who were residents of the city of St. Louis at the time of data collection in 2015. The data collected include sex, age, race, measured height and weight, self-reported height and weight, zip code, educational background, exercise and diet habits, and participants' descriptions of their weight (e.g., overweight) and their weight strategies (e.g., trying to lose weight). I use the C5.0 algorithm to create classification trees and rule-based models to analyze this population. Specifically, I model a binary self-image variable as a function of sex, age, race, zip code, and the ratio of reported to measured BMI (body mass index), and a multi-level categorical weight description variable as a function of sex, age, race, zip code, BMI ratio, and weight strategy. I compare the performance of the C5.0 algorithm with and without rules and boosting, for independent and grouped categories, for both the binary and multi-level outcomes. This comparison is limited by sample size constraints. Ultimately, C5.0 performed best when modeling the binary variable and using either rules or boosting for independent categories.
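The BMI ratio predictor named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes metric units (kilograms and metres), the standard BMI definition of weight divided by height squared, and a ratio of reported BMI over measured BMI. The function names and the example participant are hypothetical and do not come from the thesis.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Standard body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_ratio(
    reported_weight_kg: float,
    reported_height_m: float,
    measured_weight_kg: float,
    measured_height_m: float,
) -> float:
    """Reported BMI divided by measured BMI.

    The direction of the ratio (reported over measured) is an assumption;
    the abstract only says "reported versus measured".
    """
    reported = bmi(reported_weight_kg, reported_height_m)
    measured = bmi(measured_weight_kg, measured_height_m)
    return reported / measured


# Hypothetical participant who reports a lower weight and a greater height
# than were measured, which yields a ratio below 1.
ratio = bmi_ratio(68.0, 1.72, 72.0, 1.70)
```

Under these assumptions, a ratio below 1 means the self-reported figures imply a lower BMI than the measurements do, and a ratio of exactly 1 means the two agree.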
Committee Chair
David Wright
Committee Members
Todd Kuffner, Han Gan
Degree
Master of Arts (AM/MA)
Author's Department
Mathematics
Document Type
Thesis
Date of Award
Spring 5-2016
Language
English (en)
DOI
https://doi.org/10.7936/K7RX99DW
Recommended Citation
Shirali, Rohan, "Classification Trees and Rule-Based Modeling Using the C5.0 Algorithm for Self-Image Across Sex and Race in St. Louis" (2016). Arts & Sciences Theses and Dissertations. 718.
The definitive version is available at https://doi.org/10.7936/K7RX99DW