The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models

Authors: Cassidy Laidlaw, Anca Dragan

ICLR 2022

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "We evaluate the Boltzmann policy distribution in three settings: predicting simulated human behavior in a simple gridworld, predicting real human behavior in Overcooked, and enabling human-AI collaboration in Overcooked."
Researcher Affiliation | Academia | "Cassidy Laidlaw, University of California, Berkeley, cassidylaidlaw@berkeley.edu; Anca Dragan, University of California, Berkeley, anca@berkeley.edu"
Pseudocode | No | The paper describes the optimization process and inference approximations verbally and with equations, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "Our code and pretrained models are available at https://github.com/cassidylaidlaw/boltzmann-policy-distribution."
Open Datasets | Yes | "We also use the human data they collected; the train set is used for training the BC policy and the test set is used for training the human proxy policy and evaluating all predictive models." (Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, and Anca Dragan. On the Utility of Learning about Humans for Human-AI Coordination. arXiv:1910.05789 [cs, stat], January 2020.)
Dataset Splits | No | The paper mentions a "train set" and "test set" but does not specify a validation set or provide details on how the dataset was split into training, validation, and testing portions (e.g., specific percentages, sample counts, or cross-validation details). (See the split sketch below the table.)
Hardware Specification | No | The paper mentions using "RLlib (Liang et al., 2018) and PyTorch (Paszke et al., 2019)" for implementation, but it does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper states "We implement the calculation of the BPD and all collaborative training using RLlib (Liang et al., 2018) and PyTorch (Paszke et al., 2019)," but it does not provide specific version numbers for these software packages. (See the version-logging sketch below the table.)
Experiment Setup | Yes | "Here, we give further details about our experimental setup, hyperparameters, and network architectures. ... We use RLlib's PPO implementation with the hyperparameters given in Table 1." Table 2 gives sequence model training hyperparameters and Table 3 gives behavior cloning hyperparameters. (See the PPO configuration sketch below the table.)
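On the Dataset Splits finding above: a minimal sketch of the kind of explicit, seeded train/validation/test split a reproduction would need to document. The 80/10/10 proportions, the placeholder trajectory list, and the fixed seed are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of an explicitly documented data split. The 80/10/10
# proportions, the placeholder trajectory list, and the seed are
# illustrative assumptions, not values from the paper.
import random

trajectories = list(range(100))  # stand-in for Overcooked human trajectories

rng = random.Random(0)           # fixed seed so the split is reproducible
rng.shuffle(trajectories)

n = len(trajectories)
train = trajectories[: int(0.8 * n)]
val = trajectories[int(0.8 * n) : int(0.9 * n)]
test = trajectories[int(0.9 * n) :]

print(len(train), len(val), len(test))  # 80 10 10
```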
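On the Software Dependencies finding: one low-effort way to close this gap is to log exact library versions alongside results. A minimal sketch, assuming RLlib is installed via the ray package; the paper names the libraries but not their versions, so the printed values depend entirely on the local installation.

```python
# Minimal sketch: record exact dependency versions at runtime.
# The paper names RLlib and PyTorch but not their versions, so the
# printed values depend on the local installation.
import ray
import torch

print("ray (RLlib):", ray.__version__)
print("torch:", torch.__version__)
```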
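On the Experiment Setup finding: for concreteness, a minimal sketch of configuring RLlib's PPO, assuming the RLlib 2.x PPOConfig API. The environment and all hyperparameter values are placeholders, not the paper's Table 1 settings; the paper trains in Overcooked rather than CartPole.

```python
# Minimal sketch of an RLlib PPO setup, assuming the RLlib 2.x
# PPOConfig API. The environment and all hyperparameter values are
# placeholders, not the paper's Table 1 settings.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # stand-in env; the paper trains in Overcooked
    .training(
        lr=1e-4,                 # placeholder learning rate
        gamma=0.99,              # placeholder discount factor
        train_batch_size=4000,   # placeholder rollout batch size
    )
)

algo = config.build()            # constructs the PPO algorithm
result = algo.train()            # runs one training iteration
# result is a dict of training metrics; exact keys vary by RLlib version
print(result.get("episode_reward_mean"))
```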