Date of Award

Spring 5-15-2018

Author's School

Graduate School of Arts and Sciences

Author's Department


Degree Name

Doctor of Philosophy (PhD)

Degree Type



This dissertation applies machine learning methods to economic policy analysis. When machine learning or other non-behavioral models are used for policy analysis, the first objection economists raise is the Lucas critique. A policy intervention affects the incentives that people face and thus changes the underlying decision-making problem. A predictive model without an optimizing-behavior component might not capture people’s reactions to the intervention and therefore might not give a reliable prediction. Even if the quantitative effect of the Lucas critique is not significant, machine learning methods might have no advantage over a well-performing standard econometric model in terms of prediction or time efficiency. The first chapter presents an out-of-sample prediction comparison between major machine learning models and a structural econometric model. To evaluate the benefits of this approach, I use the most common machine learning algorithms, namely CART, C4.5, LASSO, random forest, and AdaBoost, to construct prediction models for a cash transfer experiment conducted by the Progresa program in Mexico, and I compare the prediction results with those of a previous structural econometric study. Two prediction tasks are performed in this paper: the out-of-sample forecast and the long-term within-sample simulation. For the out-of-sample forecast, both the mean absolute error and the root mean square error of the school attendance rates predicted by all machine learning models are smaller than those of the structural model. Random forest and AdaBoost have the highest accuracy for the individual outcomes of all subgroups. For the long-term within-sample simulation, the structural model performs better than all of the machine learning models. The poor within-sample fit of the machine learning models results from the inaccuracy of the income and pregnancy prediction models.
The results show that the machine learning models perform better than the structural model when there are ample data to learn from; however, when data are limited, the structural model offers a more sensible prediction. Beyond predictive accuracy, machine learning models are more time-efficient than the structural model. The most complicated model, random forest, takes less than half an hour to build and less than one minute to predict. The findings show promise for adopting machine learning in economic policy analyses in the era of big data.
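The two accuracy measures named above, mean absolute error and root mean square error of predicted attendance rates, can be illustrated with a minimal sketch. The subgroup attendance rates and the model names `model_a` and `model_b` below are made-up placeholders for illustration, not figures or models from the dissertation.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted rates."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared gap; penalizes large misses more."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical attendance rates for four subgroups (illustrative only).
observed = [0.90, 0.80, 0.70, 0.60]
model_a = [0.88, 0.83, 0.69, 0.64]  # hypothetical predictions, model A
model_b = [0.85, 0.86, 0.75, 0.52]  # hypothetical predictions, model B

mae_a = mean_absolute_error(observed, model_a)
mae_b = mean_absolute_error(observed, model_b)
rmse_a = root_mean_square_error(observed, model_a)
rmse_b = root_mean_square_error(observed, model_b)
```

Under both measures a lower value indicates a closer match to the observed rates, so a model is preferred on a given task when its MAE and RMSE are both smaller.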

The second chapter exploits the predictive power of machine learning algorithms to conduct covariate adjustment when estimating average treatment effects and the log-odds ratio. Previous semi-parametric approaches have shown that baseline covariate adjustment can increase estimator efficiency and statistical power relative to an unadjusted estimator. I use a random forest model to select predictive covariates and conduct a Monte Carlo simulation to compare the efficiency and statistical power of unadjusted, OLS-based, and random-forest-based approaches under different parameter settings. The simulation results indicate that the random-forest-based estimator is more efficient and has higher statistical power than the other two methods. In addition, I apply this approach to the Zomba Cash Transfer Experiment in Malawi to study the difference in policy effect between conditional and unconditional cash transfers.
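The efficiency gain from baseline covariate adjustment can be seen in a small Monte Carlo sketch. This is an illustration under assumed settings, not the dissertation's simulation design: it compares only the unadjusted difference in means with an OLS-adjusted estimator (the random-forest-based estimator is omitted), and the data-generating process, sample size, and the parameters `tau` and `beta` are all assumptions chosen for the example.

```python
import random
import statistics


def residualize(values, covariate):
    """Residuals from an OLS regression of values on an intercept and covariate."""
    mean_v = statistics.fmean(values)
    mean_x = statistics.fmean(covariate)
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxv = sum((x - mean_x) * (v - mean_v) for x, v in zip(covariate, values))
    slope = sxv / sxx
    return [v - mean_v - slope * (x - mean_x) for v, x in zip(values, covariate)]


def unadjusted_estimate(y, t):
    """Difference in mean outcomes between treated and control units."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def ols_adjusted_estimate(y, t, x):
    """Coefficient on t in the regression y ~ 1 + t + x (Frisch-Waugh-Lovell)."""
    ry = residualize(y, x)
    rt = residualize(t, x)
    return sum(a * b for a, b in zip(ry, rt)) / sum(b * b for b in rt)


def simulate(n_reps=500, n=200, tau=1.0, beta=2.0, seed=0):
    """Assumed design: y = tau * t + beta * x + noise, with t randomized."""
    rng = random.Random(seed)
    unadjusted, adjusted = [], []
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        t = [1] * (n // 2) + [0] * (n - n // 2)
        rng.shuffle(t)  # random assignment, half treated
        y = [tau * ti + beta * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
        unadjusted.append(unadjusted_estimate(y, t))
        adjusted.append(ols_adjusted_estimate(y, t, x))
    return unadjusted, adjusted


unadj, adj = simulate()
var_unadj = statistics.variance(unadj)  # sampling variance across replications
var_adj = statistics.variance(adj)
```

Both estimators target the same treatment effect under randomization; the adjusted one removes outcome variation explained by the covariate, so its sampling variance across replications is smaller when the covariate is predictive.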

The third chapter investigates the possibility of using machine learning models to conduct counterfactual analysis for conditional policies. Conditional cash transfers have become a popular tool for alleviating intergenerational poverty in many developing countries, owing to the success of the Progresa program in Mexico. Some experiments have focused on implementation details to identify efficient ways of carrying out the policy. Policy analysis, however, still relies heavily on counterfactual prediction because of budget and time constraints. Recently, machine learning has proven successful in many prediction applications. Adopting machine learning models in economic policy analysis might improve prediction performance and hence offer another approach to counterfactual analysis. While it is straightforward to apply machine learning algorithms to counterfactual prediction for an unconditional policy, there is no direct prediction for a conditional policy because the models lack a behavioral description. This chapter uses the Zomba Cash Transfer Experiment in Malawi to examine the error incurred by using an unconditional machine learning approach to predict the outcome of a conditional policy. The results show that the conditional-unconditional difference is a minor source of prediction error, which supports exploiting the predictive power of machine learning algorithms to offer policy suggestions for conditional policies.


English (en)

Chair and Committee

Werner Ploberger

Committee Members

Siddhartha Chib, Sanmay Das, Ian Fillmore, Bruce Petersen,


Permanent URL:

Available for download on Sunday, May 15, 2118

Included in

Economics Commons