Working Memory Capacity of ChatGPT: An Empirical Study
Authors: Dongyu Gong, Xingchen Wan, Dingmin Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we systematically assess the working memory capacity of ChatGPT, a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT has a working memory capacity limit strikingly similar to that of humans. |
| Researcher Affiliation | Academia | Dongyu Gong¹,², Xingchen Wan¹, Dingmin Wang¹ (¹University of Oxford, ²Yale University). dongyu.gong@yale.edu, xwan@robots.ox.ac.uk, dingmin.wang@cs.ox.ac.uk |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | All code for our experiments can be accessed in this repository: https://github.com/Daniel-Gong/ChatGPT-WM. |
| Open Datasets | No | The paper describes generating data for experiments ('we generated 50 blocks of letter sequences'), but does not provide access information (link, DOI, etc.) to a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits. It describes generating data for '50 blocks of tests' for each experiment, but this refers to experimental runs, not a partitioned dataset. |
| Hardware Specification | No | The paper states using APIs for LLMs ('prompted ChatGPT (using the OpenAI API, model = gpt-3.5-turbo, temperature = 1, other parameters are set to default values) to complete the tasks...'), meaning the authors did not specify their own hardware used for computation. |
| Software Dependencies | No | The paper mentions LLM models used (e.g., 'gpt-3.5-turbo', 'Bloomz-7B'), but does not provide specific version numbers for ancillary software like programming languages, libraries, or frameworks (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | Yes | We devised two categories of n-back tasks involving verbal and spatial working memory... and prompted ChatGPT (using the OpenAI API, model = gpt-3.5-turbo, temperature = 1, other parameters are set to default values) to complete the tasks in a trial-by-trial manner. For both categories, we have a base version task and several variants derived from the base version to further test the model's performance under different conditions. For n = {1, 2, 3}, respectively, we generated 50 blocks of letter sequences... Each block contained a sequence of 24 letters, which are presented one at a time as user input to the API. We included 8 match trials and 16 nonmatch trials in each block. |
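
The block-generation step quoted in the Experiment Setup row is fully pinned down by the stated counts (24 letters per block, exactly 8 match and 16 nonmatch trials, for n = 1, 2, 3). Below is a minimal sketch of one way to generate such a block; the function name `make_nback_block`, the lowercase alphabet, and the seeding are illustrative assumptions, not taken from the authors' released code.

```python
import random
import string

def make_nback_block(n, length=24, n_match=8, seed=None):
    """Generate one n-back letter block with exactly `n_match` match trials.

    Trial i is a "match" when letters[i] == letters[i - n]; the first n
    trials can never be matches, so match positions are sampled from the
    remaining indices. All other trials are forced to be nonmatches.
    """
    rng = random.Random(seed)
    match_positions = set(rng.sample(range(n, length), n_match))
    letters = []
    for i in range(length):
        if i in match_positions:
            letters.append(letters[i - n])  # force a match with the letter n back
        else:
            # exclude the letter n steps back to avoid an accidental match
            choices = [c for c in string.ascii_lowercase
                       if i < n or c != letters[i - n]]
            letters.append(rng.choice(choices))
    return letters, match_positions

# Example: a 2-back block with exactly 8 positions where letters[i] == letters[i-2]
letters, matches = make_nback_block(n=2, seed=0)
```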
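
The trial-by-trial prompting described in the setup (one letter per user turn, model gpt-3.5-turbo, temperature = 1, other parameters at defaults) could look like the sketch below. The paper predates the current `openai` Python client, so the client interface shown here, the `run_block` helper, and the instruction text are assumptions; only the model name, temperature, and turn structure come from the paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def run_block(letters, instructions):
    """Present one block letter by letter, keeping the full chat history
    so the model can compare each letter against earlier trials."""
    messages = [{"role": "system", "content": instructions}]
    responses = []
    for letter in letters:
        messages.append({"role": "user", "content": letter})
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            temperature=1,
            messages=messages,
        )
        answer = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        responses.append(answer)
    return responses

# Hypothetical instruction text; the paper's exact prompt is in its repository.
instructions = ("You will see letters one at a time. Respond 'm' if the "
                "current letter matches the one 2 letters ago, '-' otherwise.")
responses = run_block(letters, instructions)
```

Accumulating the assistant's own replies into `messages` is what makes the run trial-by-trial rather than a single batched prompt, matching the quoted description of presenting letters "one at a time as user input to the API".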