“I Don’t Think So”: Summarizing Policy Disagreements for Agent Comparison
Authors: Yotam Amitai, Ofra Amir
AAAI 2022, pp. 5269–5276 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted user studies to assess the usefulness of disagreement-based summaries for identifying superior agents and conveying agent differences. Results show disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS, a strategy summarization algorithm which generates summaries for each agent independently. |
| Researcher Affiliation | Academia | Yotam Amitai, Ofra Amir, Faculty of Industrial Engineering & Management, Technion - Israel Institute of Technology, yotama@campus.technion.ac.il, oamir@technion.ac.il |
| Pseudocode | Yes | Algorithm 1: The DISAGREEMENTS algorithm. |
| Open Source Code | No | The paper releases no code of its own; it cites only third-party environment implementations: Frogger python implementation, https://github.com/pedrodbs/frogger (accessed 2021-02-01); An Environment for Autonomous Driving Decision-Making, https://github.com/eleurent/highwayenv (accessed 2021-02-01). |
| Open Datasets | Yes | To evaluate our algorithm we generated summaries of agents playing the game of Frogger (Sequeira and Gubert 2020) and controlling a vehicle in a highway environment (Leurent 2018). Frogger python implementation, https://github.com/pedrodbs/frogger (accessed 2021-02-01); An Environment for Autonomous Driving Decision-Making, https://github.com/eleurent/highwayenv (accessed 2021-02-01). |
| Dataset Splits | No | Frogger Agents. We made use of the framework developed by Sequeira and Gervasio (2020) to test the DISAGREEMENTS algorithm on multiple configurable agents of varying capabilities. Three different agents were trained using standard Q-learning (Watkins and Dayan 1992)... All highway agents were trained for 2000 episodes using double DQN architecture (Hasselt 2010) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or specific computing environments used for running the experiments. |
| Software Dependencies | No | Frogger python implementation... All highway agents were trained for 2000 episodes using double DQN architecture (Hasselt 2010) |
| Experiment Setup | Yes | All summaries were composed of five trajectories made up of sequential states, ten for Frogger and twenty for Highway. These contained the important state at the center of the trajectory, with half the states preceding and the rest succeeding it. Table 1 (parameters for the Frogger and Highway domains) defines: k, the summary budget, i.e. the number of trajectories; l, the length of each trajectory (Frogger: 10, Highway: 20); h, the number of states following s to include in the trajectory; numSim, the number of simulations (episodes) run by the DISAGREEMENTS algorithm; overlapLim, the maximal number of shared states allowed between two trajectories in the summary; and impMeth, the importance method used for evaluating disagreements. A sketch of how these parameters fit together appears below the table. |
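The quoted setup maps naturally onto a single selection loop. Below is a minimal, hedged Python sketch of how Algorithm 1 (DISAGREEMENTS) plausibly ties the Table 1 parameters together, reconstructed only from the descriptions quoted above. The environment/agent interface, the `run_episode` helper, the `importance` callable (standing in for impMeth), and the default values for `num_sim` and `overlap_lim` are illustrative assumptions, not the authors' released implementation; only k=5 and l=10/20 come from the paper's text.

```python
from typing import Callable, List

# Hedged sketch of the DISAGREEMENTS summarization loop. The env/agent
# interface (reset, step, act) and all helper names are assumptions.

def run_episode(env, agent) -> List:
    """Roll out one episode under `agent`'s policy; return visited states.

    Assumes a simplified interface where env.step(action) returns
    (next_state, done) and states are hashable (e.g. tuples).
    """
    states, state, done = [], env.reset(), False
    while not done:
        states.append(state)
        state, done = env.step(agent.act(state))
    return states

def disagreement_summary(env, agent_a, agent_b,
                         importance: Callable,  # impMeth: scores a disagreement state
                         k: int = 5,            # summary budget (five trajectories, per the paper)
                         l: int = 10,           # trajectory length (10 Frogger / 20 Highway)
                         h: int = 5,            # states following the disagreement state (assumed l/2)
                         num_sim: int = 100,    # numSim: simulations to run (assumed value)
                         overlap_lim: int = 3   # overlapLim: max shared states (assumed value)
                         ) -> List[List]:
    """Collect up to k trajectories centered on important policy disagreements."""
    candidates = []
    for _ in range(num_sim):
        states = run_episode(env, agent_a)
        for t, s in enumerate(states):
            if agent_a.act(s) != agent_b.act(s):  # the two policies disagree at s
                # Window of l states: l - h - 1 preceding s, s itself, h following it.
                traj = states[max(0, t - (l - h - 1)): t + h + 1]
                candidates.append((importance(s, agent_a, agent_b), traj))
    # Greedily keep the most important trajectories, capping pairwise overlap.
    summary: List[List] = []
    for _, traj in sorted(candidates, key=lambda c: -c[0]):
        if all(len(set(traj) & set(kept)) <= overlap_lim for kept in summary):
            summary.append(traj)
        if len(summary) == k:
            break
    return summary
```

The greedy filter mirrors the overlapLim constraint described in Table 1: a candidate trajectory is admitted only if it shares at most `overlap_lim` states with every trajectory already selected, so the k-trajectory summary highlights distinct disagreements rather than repeated views of the same one.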