Modeling Boundedly Rational Agents with Latent Inference Budgets
Authors: Athul Paul Jacob, Abhishek Gupta, Jacob Andreas
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In three modeling tasks (inferring navigation goals from routes, inferring communicative intents from human utterances, and predicting next moves in human chess games) we show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty. |
| Researcher Affiliation | Academia | Athul Paul Jacob (MIT, apjacob@mit.edu); Abhishek Gupta (University of Washington, abhgupta@cs.washington.edu); Jacob Andreas (MIT, jda@mit.edu) |
| Pseudocode | No | The paper describes algorithms but does not include structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper mentions implementing its experiments with various libraries (PyTorch, NumPy, Huggingface, Ray, mazelib, PettingZoo), but does not explicitly state that the source code for its methodology is released, nor provide a link to it. |
| Open Datasets | Yes | For this task, we use the data collected by Monroe et al. (2017). ... We use similar data to previous models of human chess play (McIlroy-Young et al., 2020): First, a dataset Dlarge containing roughly 6 million moves... second, a dataset Dsmall containing roughly 75,000 moves... These data points were randomly sampled from the January 2019 database release of a chess website (lichess). |
| Dataset Splits | Yes | The dataset consists of 46,994 rounds across 948 games. We create an 80/10/10 split across train, valid and test sets. ...Dlarge containing roughly 6 million moves in the training split, 60,968 in the validation split and 60,969 moves in the test set. ...Dsmall containing roughly 50,000 moves in the training split, 12,041 moves in the validation split and 12,040 moves in the test split. (A hedged split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for running its experiments. At most it implies GPU use in general terms, without naming specific devices. |
| Software Dependencies | No | The paper cites software libraries and frameworks such as PyTorch, NumPy, the T5 model, BERT, Huggingface, Ray, mazelib, and PettingZoo (with publication years in the citations). However, it does not provide specific version numbers for these software dependencies (e.g., PyTorch 1.9, NumPy 1.20) as required for reproducibility. |
| Experiment Setup | Yes | All models in Section 4 were trained using the Adam optimizer (Kingma & Ba, 2015), where the learning rates were swept across the following values [1.0, 0.5, 1e-1, 0.05, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4, 5e-5] for 50 epochs. ... The speaker was trained with a batch size of 64 using the Adam optimizer with learning rate 1e-4 for 25 epochs. ... The listener models were trained using Adam and the learning rates were swept across the following values [1e-3, 5e-4, 1e-4, 5e-5] for up to 50 epochs. ... The policy and value network was trained using Adam with a learning rate of 0.001, a batch size of 4096 and for up to 30 epochs. (A hedged sketch of this sweep follows the table.) |
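As a concrete reading of the Dataset Splits row, the sketch below shows one way an 80/10/10 train/valid/test split over the 46,994 reference-game rounds could be produced. It is a minimal, hypothetical illustration; the paper does not release splitting code, so `split_indices` and the fixed seed are assumptions, not the authors' procedure.

```python
import random

def split_indices(n, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Hypothetical 80/10/10 train/valid/test split over n examples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # deterministic shuffle for a given seed
    n_train = int(fractions[0] * n)
    n_valid = int(fractions[1] * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])

# Example: 46,994 rounds -> roughly 37,595 / 4,699 / 4,700 examples.
train_ids, valid_ids, test_ids = split_indices(46994)
```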
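Similarly, the Experiment Setup row can be read as a standard Adam learning-rate sweep with a fixed epoch budget. The sketch below is a hedged reconstruction in PyTorch, assuming a classification-style loss and placeholder `build_model`, `train_loader`, and `valid_loader`; it reflects only the quoted hyperparameters, not any released implementation.

```python
import torch
import torch.nn as nn
from torch.optim import Adam

# Candidate learning rates and epoch budget quoted in the Experiment Setup row.
LEARNING_RATES = [1.0, 0.5, 1e-1, 0.05, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4, 5e-5]
EPOCHS = 50

def run_sweep(build_model, train_loader, valid_loader, device="cpu"):
    """Train one model per learning rate and keep the best by validation loss."""
    best_lr, best_val_loss = None, float("inf")
    loss_fn = nn.CrossEntropyLoss()  # assumed loss; the paper's objective may differ
    for lr in LEARNING_RATES:
        model = build_model().to(device)
        optimizer = Adam(model.parameters(), lr=lr)
        for _ in range(EPOCHS):
            model.train()
            for inputs, targets in train_loader:
                optimizer.zero_grad()
                loss = loss_fn(model(inputs.to(device)), targets.to(device))
                loss.backward()
                optimizer.step()
        # Evaluate on the validation split and track the best learning rate.
        model.eval()
        val_loss, n_batches = 0.0, 0
        with torch.no_grad():
            for inputs, targets in valid_loader:
                val_loss += loss_fn(model(inputs.to(device)), targets.to(device)).item()
                n_batches += 1
        val_loss /= max(n_batches, 1)
        if val_loss < best_val_loss:
            best_lr, best_val_loss = lr, val_loss
    return best_lr, best_val_loss
```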