Exploring Computational User Models for Agent Policy Summarization

Authors: Isaac Lage, Daphna Lifschitz, Finale Doshi-Velez, Ofra Amir

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted computational experiments to address whether the model used to extract a summary must match the hypothesized model humans use to reconstruct the policy in order to produce high-quality reconstructions. We also conducted a human-subject study to test whether the findings from our computational simulations generalize to humans and to examine which reconstruction models people naturally deploy.
Researcher Affiliation | Academia | Isaac Lage¹, Daphna Lifschitz², Finale Doshi-Velez¹ and Ofra Amir²; ¹Harvard University, ²Technion - Israel Institute of Technology
Pseudocode | No | The paper describes its methods in prose and mathematical formulations but does not include explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper references an arXiv preprint (Lage et al., 2019), which is the paper itself, but it contains no explicit statement offering access to the source code for the described methodology and no link to a code repository.
Open Datasets | No | The paper describes the domains used (Random Gridworld, PAC-MAN, HIV Simulator) and cites the papers in which these systems are described or from which they are adapted. However, it provides no direct link, DOI, or citation to a publicly accessible dataset for training or evaluation, nor does it name a widely recognized public dataset that implies immediate availability.
Dataset Splits | No | The paper describes hyperparameter tuning via "75 random restarts" and a procedure for selecting summary sizes, but it does not specify explicit training, validation, or test splits as percentages, sample counts, or citations to predefined splits for reproducibility.
Hardware Specification | No | The paper does not report the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper describes the computational models and algorithms used (e.g., Max-Ent, the GRF model, SCOT machine teaching, active learning), but it lists no specific software libraries or version numbers (e.g., Python 3.x, PyTorch 1.x, NumPy x.x) that would be needed for reproducibility.
Experiment Setup | Yes | To determine the summary size k and the extraction-model hyperparameters for each domain, we computed, over 75 random restarts, each hyperparameter setting's reconstruction quality from a summary extracted with its matched model. We chose the smallest summary size such that increasing it did not change the best-performing methods for either IL or IRL in the HIV simulator and PAC-MAN domains. In the random gridworld domain, increasing the summary size always improved IL performance, so we chose a summary size at which the best-performing IRL methods did not change (HIV: 24; Gridworld: 24; PAC-MAN: 12). We report results only for the best-performing methods for IL and IRL at the chosen summary size. We searched over summary sizes [12, 24, 36, 48, 60]; IL hyperparameters: kernel [RBF, polynomial], length scale [0.1, 1.0], and degree [2, 3] (polynomial kernel only); and IRL hyperparameters: trajectory lengths [1, 2, 3, 4]. A hedged sketch of this search loop follows the table.
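To make the reported search procedure concrete, below is a minimal Python sketch of the grid search. It is an illustration under stated assumptions, not the authors' implementation: no code was released, so extract_summary and reconstruction_quality are hypothetical placeholders, and the "IL"/"IRL" model labels are ours. The grids themselves match the values quoted in the table.

```python
# Grids copied from the paper's reported search space.
SUMMARY_SIZES = [12, 24, 36, 48, 60]
IL_GRID = (
    [{"kernel": "RBF", "length_scale": ls} for ls in (0.1, 1.0)]
    + [{"kernel": "polynomial", "length_scale": ls, "degree": d}
       for ls in (0.1, 1.0) for d in (2, 3)]
)
IRL_GRID = [{"trajectory_length": t} for t in (1, 2, 3, 4)]
N_RESTARTS = 75


def extract_summary(domain, model, size, seed, **hparams):
    """Hypothetical placeholder: extract a size-k policy summary with the
    given extraction model (the paper released no code)."""
    raise NotImplementedError


def reconstruction_quality(domain, model, summary, **hparams):
    """Hypothetical placeholder: reconstruct the policy from the summary
    with the matched model and score the reconstruction."""
    raise NotImplementedError


def mean_quality(domain, model, size, hparams):
    """Average reconstruction quality over random restarts for one
    (summary size, hyperparameter) setting, with the summary extracted
    by the model matched to the reconstruction model."""
    scores = []
    for seed in range(N_RESTARTS):
        summary = extract_summary(domain, model, size, seed, **hparams)
        scores.append(reconstruction_quality(domain, model, summary, **hparams))
    return sum(scores) / len(scores)


def best_methods(domain, size):
    """Best-performing IL and IRL settings at a given summary size."""
    best_il = max(IL_GRID, key=lambda h: mean_quality(domain, "IL", size, h))
    best_irl = max(IRL_GRID, key=lambda h: mean_quality(domain, "IRL", size, h))
    return best_il, best_irl
```

Following the selection rule quoted above, one would run best_methods for each size in SUMMARY_SIZES and keep the smallest size at which the returned settings stop changing (for the random gridworld, where IL keeps improving, the size at which the best IRL setting stabilizes).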