Abstract

This thesis addresses the need for fair evaluation of language models' problem-solving abilities by presenting a unified evaluation framework for ChatGPT on 16 problem-solving datasets (e.g., NaturalQA, HellaSwag, and MMLU). We evaluate the model's performance using F1, exact match, and quasi-exact match metrics and find that ChatGPT is highly accurate on tasks that require commonsense and knowledge. However, we also identify truncated-text bias and few-shot scenarios as challenges that may affect ChatGPT's performance. Our research highlights the importance of standardizing datasets and developing a unified evaluation system for the fair evaluation of language models. Overall, our contributions are a unified evaluation framework, the identification of performance challenges, and insights into the importance of dataset standardization.

Committee Chair

Chenguang Wang, Computer Science & Engineering

Committee Members

Chien-Ju Ho, William Yeoh

Degree

Master of Science (MS)

Author's Department

Computer Science & Engineering

Author's School

McKelvey School of Engineering

Document Type

Thesis

Date of Award

Spring 5-2023

Language

English (en)

DOI

https://doi.org/10.7936/7vz0-dr08

Author's ORCID

https://orcid.org/0009-0007-0625-0499

Recommended Citation

Zeng, Fankun, "Evaluating the Problem Solving Abilities of ChatGPT" (2023). McKelvey School of Engineering Theses & Dissertations. 849.

The definitive version is available at https://doi.org/10.7936/7vz0-dr08

Included in

Artificial Intelligence and Robotics Commons
