Date of Award
Bachelor of Arts (A.B.)
Supervised machine learning suffers from the "garbage-in, garbage-out" phenomenon: the performance of a model is limited by the quality of its data. Although vast amounts of data are collected every second, there is no general, rigorous method for evaluating the quality of a given dataset. This hinders fair pricing of data when a buyer seeks to purchase data for use in machine learning. In this work, I propose using the expected loss corresponding to a dataset as a measure of its quality, relying on Bayesian methods for uncertainty quantification. Furthermore, I present a secure multi-party computation protocol based on homomorphic encryption, assuming semi-honest parties, that lets the buyer and the seller compute the expected loss without compromising the data. With experimental results, I show the promise of this approach as well as its current limitations in real-life feasibility.
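To illustrate the kind of building block that homomorphic encryption provides, the following is a minimal sketch of an additively homomorphic scheme (a toy Paillier cryptosystem) used to evaluate a weighted sum over encrypted values. It is not the protocol from the thesis: the choice of Paillier, the key sizes, the division of roles between the two parties, and all function names (`keygen`, `encrypt`, `decrypt`, `add`, `scale`, `encrypted_dot`) are assumptions made purely for illustration. The primes are deliberately tiny and offer no real security.

```python
import math
import secrets

# Toy parameters (assumption): two known Mersenne primes.
# Far too small for real use; chosen only so the example runs instantly.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p=P, q=Q):
    """Return (public modulus n, secret key (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt a non-negative integer m < n using generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = sk
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n


def add(n, c1, c2):
    """Ciphertext of (m1 + m2), computed without decrypting."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Ciphertext of (k * m) for a plaintext non-negative integer k."""
    return pow(c, k, n * n)


def encrypted_dot(n, enc_xs, ws):
    """Weighted sum of encrypted values with plaintext weights."""
    acc = encrypt(n, 0)
    for c, w in zip(enc_xs, ws):
        acc = add(n, acc, scale(n, c, w))
    return acc


# Hypothetical roles: one party encrypts its values and keeps the secret key;
# the other party combines the ciphertexts with its own plaintext weights.
n, sk = keygen()
xs = [3, 1, 4]  # private values of the key holder
ws = [2, 5, 1]  # private weights of the other party
enc_xs = [encrypt(n, x) for x in xs]
enc_result = encrypted_dot(n, enc_xs, ws)
result = decrypt(n, sk, enc_result)  # 3*2 + 1*5 + 4*1 = 15
```

In this sketch the party holding the weights sees only ciphertexts, and the key holder learns only the final sum. This matches a semi-honest setting in which both sides follow the steps as written. Evaluating an actual expected loss would need more than linear operations on ciphertexts, which this toy example does not attempt.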
Joo, Minsung, "Dataset Evaluation for Data Trading Using Expected Loss and Homomorphic Encryption" (2022). Senior Honors Papers / Undergraduate Theses. 47.