The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models

Authors: Cassidy Laidlaw, Anca Dragan

ICLR 2022

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "We evaluate the Boltzmann policy distribution in three settings: predicting simulated human behavior in a simple gridworld, predicting real human behavior in Overcooked, and enabling human-AI collaboration in Overcooked."
Researcher Affiliation | Academia | "Cassidy Laidlaw, University of California, Berkeley, cassidylaidlaw@berkeley.edu; Anca Dragan, University of California, Berkeley, anca@berkeley.edu"
Pseudocode | No | The paper describes the optimization process and inference approximations verbally and with equations, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "Our code and pretrained models are available at https://github.com/cassidylaidlaw/boltzmann-policy-distribution."
Open Datasets | Yes | "We also use the human data they collected; the train set is used for training the BC policy and the test set is used for training the human proxy policy and evaluating all predictive models." (Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, and Anca Dragan. On the Utility of Learning about Humans for Human-AI Coordination. arXiv:1910.05789 [cs, stat], January 2020.)
Dataset Splits | No | The paper mentions a "train set" and "test set" but does not specify a validation set or provide details on how the dataset was split into training, validation, and testing portions (e.g., specific percentages, sample counts, or cross-validation details). (See the split sketch below the table.)
Hardware Specification | No | The paper mentions using "RLlib (Liang et al., 2018) and PyTorch (Paszke et al., 2019)" for implementation, but it does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper states "We implement the calculation of the BPD and all collaborative training using RLlib (Liang et al., 2018) and PyTorch (Paszke et al., 2019)," but it does not provide specific version numbers for these software packages. (See the version-logging sketch below the table.)
Experiment Setup | Yes | "Here, we give further details about our experimental setup, hyperparameters, and network architectures. ... We use RLlib's PPO implementation with the hyperparameters given in Table 1." Table 2 gives sequence model training hyperparameters and Table 3 gives behavior cloning hyperparameters. (See the PPO configuration sketch below the table.)
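On the Dataset Splits finding above: a minimal sketch of the kind of explicit, seeded train/validation/test split a reproduction would need to document. The 80/10/10 proportions, the placeholder trajectory list, and the fixed seed are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of an explicitly documented data split. The 80/10/10
# proportions, the placeholder trajectory list, and the seed are
# illustrative assumptions, not values from the paper.
import random

trajectories = list(range(100))  # stand-in for Overcooked human trajectories

rng = random.Random(0)           # fixed seed so the split is reproducible
rng.shuffle(trajectories)

n = len(trajectories)
train = trajectories[: int(0.8 * n)]
val = trajectories[int(0.8 * n) : int(0.9 * n)]
test = trajectories[int(0.9 * n) :]

print(len(train), len(val), len(test))  # 80 10 10
```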
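On the Software Dependencies finding: one low-effort way to close this gap is to log exact library versions alongside results. A minimal sketch, assuming RLlib is installed via the ray package; the paper names the libraries but not their versions, so the printed values depend entirely on the local installation.

```python
# Minimal sketch: record exact dependency versions at runtime.
# The paper names RLlib and PyTorch but not their versions, so the
# printed values depend on the local installation.
import ray
import torch

print("ray (RLlib):", ray.__version__)
print("torch:", torch.__version__)
```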
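On the Experiment Setup finding: for concreteness, a minimal sketch of configuring RLlib's PPO, assuming the RLlib 2.x PPOConfig API. The environment and all hyperparameter values are placeholders, not the paper's Table 1 settings; the paper trains in Overcooked rather than CartPole.

```python
# Minimal sketch of an RLlib PPO setup, assuming the RLlib 2.x
# PPOConfig API. The environment and all hyperparameter values are
# placeholders, not the paper's Table 1 settings.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # stand-in env; the paper trains in Overcooked
    .training(
        lr=1e-4,                 # placeholder learning rate
        gamma=0.99,              # placeholder discount factor
        train_batch_size=4000,   # placeholder rollout batch size
    )
)

algo = config.build()            # constructs the PPO algorithm
result = algo.train()            # runs one training iteration
# result is a dict of training metrics; exact keys vary by RLlib version
print(result.get("episode_reward_mean"))
```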