Capturing Failures of Large Language Models via Human Cognitive Biases

Authors: Erik Jones, Jacob Steinhardt

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
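
To make the framing experiments concrete, here is a minimal sketch of how a misleading framing line can be appended to a function prompt; the helper name and the example prompt are hypothetical illustrations, not code from the paper's repository.

```python
# Hypothetical sketch of a framing-effect probe: append a misleading but
# syntactically valid line (e.g. "assert False") to an otherwise normal
# function prompt, then check whether the model's completion degrades.

def add_framing_line(prompt: str, framing_line: str) -> str:
    """Append an indented framing line to a function-definition prompt."""
    return prompt + "    " + framing_line + "\n"

prompt = (
    "def add(a, b):\n"
    '    """Return the sum of a and b."""\n'
)
framed_prompt = add_framing_line(prompt, "assert False")
print(framed_prompt)
```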
Researcher Affiliation | Academia | Erik Jones, UC Berkeley (erjones@berkeley.edu); Jacob Steinhardt, UC Berkeley (jsteinhardt@berkeley.edu)
Pseudocode | No | The paper contains figures and examples of code, but no structured pseudocode or algorithm blocks are explicitly labeled or presented.
Open Source Code | Yes | Code for this paper is available at https://github.com/ejones313/codex-cog-biases.
Open Datasets | Yes | We use the HumanEval benchmark as a diverse source of normal prompts [Chen et al., 2021].
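
The paper does not state how the HumanEval prompts were loaded. A minimal sketch, assuming the Hugging Face datasets mirror openai_humaneval:

```python
# Minimal sketch, assuming the Hugging Face mirror "openai_humaneval";
# the paper itself does not specify how HumanEval prompts were loaded.
from datasets import load_dataset

humaneval = load_dataset("openai_humaneval", split="test")
print(len(humaneval))            # 164 problems
example = humaneval[0]
print(example["task_id"])        # e.g. "HumanEval/0"
print(example["prompt"][:200])   # function signature plus docstring
```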
Dataset Splits | No | The paper evaluates pre-trained models on prompts from benchmarks such as HumanEval. While HumanEval provides test cases for verifying the correctness of generated code, the paper does not specify any training/validation/test split of the prompts for its own experimental methodology.
Hardware Specification | No | The paper states that Codex is queried through the OpenAI API and that CodeGen inference is 'run locally', but gives no details about the hardware (e.g., CPU/GPU models, memory) used for the local inference or for querying the API.
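
For context, the following is a minimal sketch of local CodeGen inference with Hugging Face transformers; the checkpoint Salesforce/codegen-2B-mono is an assumption, since the paper does not state which CodeGen variant was used.

```python
# Minimal sketch of local CodeGen inference with Hugging Face transformers.
# The checkpoint ("Salesforce/codegen-2B-mono") is an assumption; the paper
# does not say which CodeGen size or what hardware was used.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-2B-mono"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = 'def hello():\n    """Print a greeting."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # greedy
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```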
Software Dependencies | No | The paper mentions OpenAI's Codex (davinci-001) and Salesforce's CodeGen, but does not provide version numbers for any ancillary software dependencies or libraries used to run the experiments locally.
Experiment Setup | Yes | We use the OpenAI API to query the davinci-001 version of Codex, and use greedy decoding to generate solutions. We test five framing lines: raise NotImplementedError, pass, assert False, return False, and print("Hello world!"). We consider all 12 combinations of the binary operations sum, difference, and product with the unary operations square, cube, quadruple, and square root.
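
A sketch of this querying setup, assuming the legacy openai-python (<1.0) Completion API and the engine name code-davinci-001; Codex endpoints have since been retired, so this is illustrative rather than a drop-in reproduction.

```python
# Sketch of the querying setup: greedy decoding via temperature=0, using the
# legacy openai-python (<1.0) Completion API. The engine name
# "code-davinci-001" is an assumption; Codex endpoints are now retired.
import openai

def complete(prompt: str, max_tokens: int = 256) -> str:
    response = openai.Completion.create(
        engine="code-davinci-001",
        prompt=prompt,
        temperature=0.0,  # greedy decoding
        max_tokens=max_tokens,
    )
    return response["choices"][0]["text"]

# The 12 prompt variants pair each binary operation with each unary one;
# the natural-language phrasing used in the actual prompts is not shown here.
BINARY_OPS = ["sum", "difference", "product"]
UNARY_OPS = ["square", "cube", "quadruple", "square root"]
combinations = [(b, u) for b in BINARY_OPS for u in UNARY_OPS]
assert len(combinations) == 12
```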