Exploring Computational User Models for Agent Policy Summarization

Authors: Isaac Lage, Daphna Lifschitz, Finale Doshi-Velez, Ofra Amir

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted computational experiments to address whether the model used to extract a summary must match the hypothesized model humans use to reconstruct the policy in order to produce high-quality reconstructions. We also conducted a human-subject study to test whether the findings from our computational simulations generalize to humans and to examine which reconstruction models people naturally deploy.
Researcher Affiliation | Academia | Isaac Lage¹, Daphna Lifschitz², Finale Doshi-Velez¹ and Ofra Amir²; ¹Harvard University, ²Technion - Israel Institute of Technology
Pseudocode | No | The paper describes its methods in prose and mathematical formulations but does not include explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper references an arXiv preprint (Lage et al., 2019), which is the paper itself, but it contains no explicit statement offering access to the source code for the described methodology and no link to a code repository.
Open Datasets | No | The paper describes the domains used (Random Gridworld, PAC-MAN, HIV Simulator) and cites the papers in which these systems are described or from which they are adapted. However, it provides no direct link, DOI, or citation to a publicly accessible dataset for training or evaluation, nor does it name a widely recognized public dataset that implies immediate availability.
Dataset Splits | No | The paper describes hyperparameter tuning via "75 random restarts" and a procedure for selecting summary sizes, but it does not specify explicit training, validation, or test splits as percentages, sample counts, or citations to predefined splits for reproducibility.
Hardware Specification | No | The paper does not report the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper describes the computational models and algorithms used (e.g., Max-Ent, the GRF model, SCOT machine teaching, active learning), but it lists no specific software libraries or version numbers (e.g., Python 3.x, PyTorch 1.x, NumPy x.x) that would be needed for reproducibility.
Experiment Setup | Yes | To determine the summary size k and the extraction-model hyperparameters for each domain, we computed, over 75 random restarts, each hyperparameter setting's reconstruction quality from a summary extracted with its matched model. We chose the smallest summary size such that increasing it did not change the best-performing methods for either IL or IRL in the HIV simulator and PAC-MAN domains. In the random gridworld domain, increasing the summary size always improved IL performance, so we chose a summary size at which the best-performing IRL methods did not change (HIV: 24; Gridworld: 24; PAC-MAN: 12). We report results only for the best-performing methods for IL and IRL at the chosen summary size. We searched over summary sizes [12, 24, 36, 48, 60]; IL hyperparameters: kernel [RBF, polynomial], length scale [0.1, 1.0], and degree [2, 3] (polynomial kernel only); and IRL hyperparameters: trajectory lengths [1, 2, 3, 4]. A hedged sketch of this search loop follows the table.
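To make the reported search procedure concrete, below is a minimal Python sketch of the grid search. It is an illustration under stated assumptions, not the authors' implementation: no code was released, so extract_summary and reconstruction_quality are hypothetical placeholders, and the "IL"/"IRL" model labels are ours. The grids themselves match the values quoted in the table.

```python
# Grids copied from the paper's reported search space.
SUMMARY_SIZES = [12, 24, 36, 48, 60]
IL_GRID = (
    [{"kernel": "RBF", "length_scale": ls} for ls in (0.1, 1.0)]
    + [{"kernel": "polynomial", "length_scale": ls, "degree": d}
       for ls in (0.1, 1.0) for d in (2, 3)]
)
IRL_GRID = [{"trajectory_length": t} for t in (1, 2, 3, 4)]
N_RESTARTS = 75


def extract_summary(domain, model, size, seed, **hparams):
    """Hypothetical placeholder: extract a size-k policy summary with the
    given extraction model (the paper released no code)."""
    raise NotImplementedError


def reconstruction_quality(domain, model, summary, **hparams):
    """Hypothetical placeholder: reconstruct the policy from the summary
    with the matched model and score the reconstruction."""
    raise NotImplementedError


def mean_quality(domain, model, size, hparams):
    """Average reconstruction quality over random restarts for one
    (summary size, hyperparameter) setting, with the summary extracted
    by the model matched to the reconstruction model."""
    scores = []
    for seed in range(N_RESTARTS):
        summary = extract_summary(domain, model, size, seed, **hparams)
        scores.append(reconstruction_quality(domain, model, summary, **hparams))
    return sum(scores) / len(scores)


def best_methods(domain, size):
    """Best-performing IL and IRL settings at a given summary size."""
    best_il = max(IL_GRID, key=lambda h: mean_quality(domain, "IL", size, h))
    best_irl = max(IRL_GRID, key=lambda h: mean_quality(domain, "IRL", size, h))
    return best_il, best_irl
```

Following the selection rule quoted above, one would run best_methods for each size in SUMMARY_SIZES and keep the smallest size at which the returned settings stop changing (for the random gridworld, where IL keeps improving, the size at which the best IRL setting stabilizes).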