Date of Award
Master of Science (MS)
This thesis addresses the need for fair evaluation of language models' problem-solving abilities by presenting a unified evaluation framework for ChatGPT across 16 problem-solving datasets (e.g., NaturalQA, HellaSwag, and MMLU). We evaluate the model using F1, exact-match, and quasi-exact-match metrics and find that ChatGPT is highly accurate on tasks that require commonsense reasoning and knowledge. However, we also identify truncated-text bias and few-shot scenarios as challenges that can degrade ChatGPT's performance. Our contributions include the development of a unified evaluation framework, the identification of these performance challenges, and insights into the importance of dataset standardization for the fair evaluation of language models.
Chenguang Wang, Computer Science & Engineering
Chien-Ju Ho, William Yeoh