Date of Award

Winter 12-15-2019

Author's School

McKelvey School of Engineering

Author's Department

Computer Science & Engineering

Degree Name

Doctor of Philosophy (PhD)

Degree Type



Messaging and the use of language are important to entities involved in politics in the United States. For example, members of Congress communicate with their electorates through social networks, and mass media such as traditional and online journalism can affect audiences by expressing ideological bias in published articles. At the same time, as information technologies mature, these messages also become useful data sources for the analysis of problems related to the political arena. In this dissertation, we investigate four problems in United States politics with machine learning and natural language processing techniques based on political corpora.
We first test the political ideology generalization performance of machine learning models across different domains, showing that it is surprisingly difficult to generalize from one domain (e.g., congressional speeches) to another (e.g., articles in mass media). We find that the unsatisfactory performance stems from conceptual differences between the data sources rather than from the limitations of domain adaptation techniques. However, we also show that, for certain topics, concepts as expressed in different domains are more consistent, and cross-domain partisanship generalization performance is better than for other topics. We also observe language flows from congressional speeches to the media.
Second, we measure ideological intensities for members of Congress across different channels. We find that the ideological intensities expressed by the same member of Congress differ across these channels.
Next, we examine whether mayors, as local politicians in the United States, are becoming more "national" by comparing their social media communications with those of members of Congress. We run topic analysis based on their Twitter posts. Our results show that most mayors in the United States still focus on local issues instead of national-level affairs. We also observe that mayors of cities with larger populations are, in general, more similar to members of Congress in their use of language on Twitter. Moreover, mayors whose social media posts are similar to those of members of Congress are also more likely to express their partisanship explicitly through language.
Finally, we investigate messaging trends among current members of Congress based on their Twitter data. We examine two potential factors that could drive retweet activity: leadership positions and diversity of Twitter post topics. Our results show that representatives in core leadership positions are central nodes in the retweet network of legislators. For representatives not in leadership positions, we observe a positive correlation between their centrality and the topic diversity of their Twitter posts. We take advantage of a "natural experiment", the departure of Paul Ryan from Congress at the end of 2018, and present evidence that his departure does not substantially change retweet patterns among other members, even when their prior retweet activity was largely intermediated by Ryan. Taken together, these results indicate that central positions in messaging networks are based on extrinsic factors such as leadership positions instead of intrinsic factors or personal influence.


English (en)


Sanmay Das

Committee Members

Roman Garnett, Chien-Ju Ho, Christopher Lucas, Betsy Sinclair


Permanent URL:

Available for download on Saturday, September 13, 2121