Date of Award

Spring 5-2022

Author's School

College of Arts & Sciences

Author's Program

Mathematics

Degree Name

Bachelor of Arts (A.B.)

Restricted/Unrestricted

Unrestricted

Abstract

Supervised machine learning suffers from the "garbage in, garbage out" phenomenon: the performance of a model is limited by the quality of its data. Although vast amounts of data are collected every second, there is no general, rigorous method for evaluating the quality of a given dataset. This hinders fair pricing of data when a buyer seeks to purchase data for use with machine learning. In this work, I propose using the expected loss corresponding to a dataset as a measure of its quality, relying on Bayesian methods for uncertainty quantification. Furthermore, I present a secure multi-party computation protocol based on homomorphic encryption, assuming semi-honest parties, that computes the expected loss between the buyer and the seller without compromising the data. Experimental results show the promise of this approach as well as its current limitations in real-life feasibility.
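The idea of scoring a dataset by its expected loss can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the one-parameter linear model, the Gaussian prior, the known noise variance, the squared loss, and the names `posterior` and `expected_loss`. The sketch works on plaintext only and does not cover the homomorphic-encryption protocol.

```python
from statistics import fmean

def posterior(xs, ys, noise_var=1.0, prior_var=10.0):
    """Gaussian posterior over the slope w of y = w*x + noise.

    Assumes a prior w ~ N(0, prior_var) and a known noise variance
    (both hypothetical choices for this sketch).
    Returns (posterior mean, posterior variance).
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_var) / precision
    return mean, 1.0 / precision

def expected_loss(train, test, noise_var=1.0, prior_var=10.0):
    """Posterior-expected squared loss of the seller's data on the buyer's test set.

    For w ~ N(m, s2): E[(y - w*x)^2] = (y - m*x)^2 + s2 * x^2.
    """
    xs, ys = zip(*train)
    m, s2 = posterior(xs, ys, noise_var, prior_var)
    return fmean((y - m * x) ** 2 + s2 * x * x for x, y in test)

# Hypothetical data: the true relation is y = 2x.
clean = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0), (5, 10.0)]
noisy = [(1, 4.0), (2, 0.0), (3, 9.0), (4, 3.0), (5, 6.0)]  # corrupted labels
buyer_test = [(1, 2.0), (2, 4.0), (3, 6.0)]

loss_clean = expected_loss(clean, buyer_test)
loss_noisy = expected_loss(noisy, buyer_test)
print(loss_clean < loss_noisy)
```

Under these assumptions, the dataset with corrupted labels yields a higher expected loss on the buyer's test points, which is the sense in which expected loss could serve as a quality score.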

Mentor

Netanel Raviv
