Modeling Boundedly Rational Agents with Latent Inference Budgets

Authors: Athul Paul Jacob, Abhishek Gupta, Jacob Andreas

ICLR 2024

Reproducibility assessment: each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental. In three modeling tasks (inferring navigation goals from routes, inferring communicative intents from human utterances, and predicting next moves in human chess games), we show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
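For context on that comparison, the following is a minimal, schematic sketch (not the authors' code) of the two decision rules: a Boltzmann policy that softmaxes action values under a rationality parameter, and a latent-inference-budget policy that marginalizes over how many steps of an anytime inference procedure the agent ran. The function names, array shapes, and toy values are illustrative assumptions.

import numpy as np

def boltzmann_policy(q_values, beta):
    # Boltzmann model: decision noise is governed by a single rationality parameter beta.
    logits = beta * q_values
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def latent_budget_policy(q_by_budget, budget_posterior):
    # L-IBM idea (schematic): row k of q_by_budget holds action values after k steps
    # of an anytime inference procedure; the unobserved budget k is marginalized out.
    shifted = q_by_budget - q_by_budget.max(axis=1, keepdims=True)
    per_budget = np.exp(shifted)
    per_budget /= per_budget.sum(axis=1, keepdims=True)
    return budget_posterior @ per_budget

# Toy example: 3 actions, 4 candidate budgets (values sharpen as compute grows).
q = np.array([1.0, 0.2, -0.5])
q_steps = np.vstack([q * (k + 1) for k in range(4)])
print(boltzmann_policy(q, beta=2.0))
print(latent_budget_policy(q_steps, np.full(4, 0.25)))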
Researcher Affiliation: Academia. Athul Paul Jacob (MIT, apjacob@mit.edu); Abhishek Gupta (University of Washington, abhgupta@cs.washington.edu); Jacob Andreas (MIT, jda@mit.edu).
Pseudocode: No. The paper describes algorithms but does not include structured pseudocode or an algorithm block.
Open Source Code: No. The paper mentions implementing its methods with various libraries (PyTorch, NumPy, Huggingface, Ray, mazelib, PettingZoo), but it does not explicitly state that the source code for its methodology is released, nor does it provide a link to it.
Open Datasets: Yes. For this task, we use the data collected by Monroe et al. (2017). ... We use similar data to previous models of human chess play (McIlroy-Young et al., 2020): first, a dataset Dlarge containing roughly 6 million moves; ... second, a dataset Dsmall containing roughly 75,000 moves. ... These data points were randomly sampled from the January 2019 database release of a chess website (lichess).
Dataset Splits: Yes. The dataset consists of 46,994 rounds across 948 games. We create an 80/10/10 split across train, valid and test sets. ... Dlarge containing roughly 6 million moves in the training split, 60,968 in the validation split and 60,969 moves in the test set. ... Dsmall containing roughly 50,000 moves in the training split, 12,041 moves in the validation split and 12,040 moves in the test split.
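A minimal sketch of the quoted 80/10/10 split, under the assumption that it is drawn uniformly at random at the example (round) level; the quote does not say whether rounds are grouped by game before splitting.

import random

def split_80_10_10(examples, seed=0):
    # Shuffle once, then cut into 80% train / 10% valid / 10% test.
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_train = int(0.8 * len(idx))
    n_valid = int(0.1 * len(idx))
    train = [examples[i] for i in idx[:n_train]]
    valid = [examples[i] for i in idx[n_train:n_train + n_valid]]
    test = [examples[i] for i in idx[n_train + n_valid:]]
    return train, valid, test

# With 46,994 rounds this yields roughly 37,595 / 4,699 / 4,700 examples.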
Hardware Specification: No. The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for its experiments; it only makes general, implicit references to running on a GPU.
Software Dependencies: No. The paper mentions software libraries and frameworks such as PyTorch, NumPy, T5, BERT, Huggingface, Ray, mazelib, and PettingZoo, along with their publication years in citations, but it does not provide the specific version numbers (e.g., PyTorch 1.9, NumPy 1.20) needed for reproducibility.
Experiment Setup: Yes. All models in Section 4 were trained using the Adam optimizer (Kingma & Ba, 2015), where the learning rates were swept across the following values [1.0, 0.5, 1e-1, 0.05, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4, 5e-5] for 50 epochs. ... The speaker was trained with a batch size of 64 using the Adam optimizer with learning rate 1e-4 for 25 epochs. ... The listener models were trained using Adam and the learning rates were swept across the following values [1e-3, 5e-4, 1e-4, 5e-5] for up to 50 epochs. ... The policy and value network was trained using Adam with a learning rate of 0.001, a batch size of 4096 and for up to 30 epochs.
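To make the quoted setup concrete, here is a schematic learning-rate sweep with Adam. Only the listener-model grid, the optimizer, and the epoch budget come from the quote; the model constructor, data loaders, and cross-entropy objective are placeholder assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

LISTENER_LRS = [1e-3, 5e-4, 1e-4, 5e-5]  # grid quoted for the listener models
MAX_EPOCHS = 50                          # "up to 50 epochs"

def train_one_setting(model, train_loader, valid_loader, lr, max_epochs=MAX_EPOCHS):
    # Train one model with a fixed learning rate and report its best validation loss.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_valid = float("inf")
    for _ in range(max_epochs):
        model.train()
        for inputs, targets in train_loader:
            opt.zero_grad()
            F.cross_entropy(model(inputs), targets).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            losses = [F.cross_entropy(model(x), y).item() for x, y in valid_loader]
        best_valid = min(best_valid, sum(losses) / max(len(losses), 1))
    return best_valid

# Sweep: keep the learning rate with the lowest validation loss.
# (make_model, train_dl, valid_dl are hypothetical stand-ins.)
# best_lr = min(LISTENER_LRS,
#               key=lambda lr: train_one_setting(make_model(), train_dl, valid_dl, lr))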