Turning large language models into cognitive models

Authors: Marcel Binz, Eric Schulz

ICLR 2024

Each entry below lists a reproducibility variable, its result, and the supporting LLM response:
Research Type: Experimental — "We find that after finetuning them on data from psychological experiments these models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains. In addition, we show that their representations contain the information necessary to model behavior on the level of individual subjects. Finally, we demonstrate that finetuning on multiple tasks enables large language models to predict human behavior in a previously unseen task. Taken together, these results suggest that large, pre-trained models can be adapted to become models of human cognition, which opens up future research directions toward building more general cognitive models."
Researcher Affiliation: Academia — Marcel Binz, Max Planck Institute for Biological Cybernetics, Tübingen, Germany (marcel.binz@tue.mpg.de); Eric Schulz, Max Planck Institute for Biological Cybernetics, Tübingen, Germany (eric.schulz@tue.mpg.de)
Pseudocode: No — The paper does not contain any pseudocode or algorithm blocks.
Open Source Code: Yes — "Data and code for our study are available through the following GitHub repository: https://github.com/marcelbinz/CENTaUR"
Open Datasets: Yes — "In the decisions from descriptions setting, we used the choices13k data set (Peterson et al., 2021), which is a large-scale data set... In the decisions from experience setting, we used data from the horizon task (Wilson et al., 2014) and a replication study (Feng et al., 2021), which combined include 60 participants making a total of 67,200 choices."
Dataset Splits: Yes — "In each fold, we split the data into a training set (90%), a validation set (9%), and a test set (1%)."
Hardware Specification: No — The paper does not specify any hardware components (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No — "The optimization procedure was implemented in PyTorch (Paszke et al., 2019) and used the default LBFGS optimizer (Liu & Nocedal, 1989). [...] We extracted BERT embeddings using the transformers library (Wolf et al., 2019)." While software is mentioned, specific version numbers for PyTorch or the transformers library are not provided.
Experiment Setup: Yes — "We fitted separate regularized logistic regression models on the standardized data via a maximum likelihood estimation. [...] The validation set was used to identify the parameter α that controls the strength of the ℓ2 regularization term using a grid search procedure. We considered discrete α-values of [0, 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]."
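The quoted setup (90/9/1 split, ℓ2-regularized logistic regression, grid search over α on the validation set) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses synthetic stand-in data rather than the choices13k or horizon-task data, and scipy's L-BFGS-B optimizer in place of PyTorch's default LBFGS mentioned in the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic stand-in for standardized behavioral features (hypothetical data).
X = rng.standard_normal((200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (1 / (1 + np.exp(-X @ true_w)) > rng.random(200)).astype(float)

# 90% train / 9% validation split; the remaining 1% would be the test set.
n_train, n_val = 180, 18
X_tr, y_tr = X[:n_train], y[:n_train]
X_va, y_va = X[n_train:n_train + n_val], y[n_train:n_train + n_val]

def nll(w, X, y, alpha=0.0):
    """Mean negative log-likelihood of logistic regression plus l2 penalty."""
    z = X @ w
    # logaddexp(0, z) = log(1 + exp(z)), computed stably
    return np.mean(np.logaddexp(0, z) - y * z) + alpha * np.sum(w ** 2)

def fit(alpha):
    """Maximum-likelihood fit via L-BFGS-B for a given regularization strength."""
    res = minimize(nll, np.zeros(X.shape[1]), args=(X_tr, y_tr, alpha),
                   method="L-BFGS-B")
    return res.x

# Grid search over the discrete alpha values reported in the paper,
# selecting the one with the lowest validation negative log-likelihood.
alphas = [0, 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
best_alpha = min(alphas, key=lambda a: nll(fit(a), X_va, y_va))
print("selected alpha:", best_alpha)
```

The validation set drives the choice of α only; held-out test data would then score the model fitted with the selected α, mirroring the paper's protocol.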