“I Don’t Think So”: Summarizing Policy Disagreements for Agent Comparison
Authors: Yotam Amitai, Ofra Amir
AAAI 2022, pp. 5269–5276 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted user studies to assess the usefulness of disagreement-based summaries for identifying superior agents and conveying agent differences. Results show disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS, a strategy summarization algorithm which generates summaries for each agent independently. |
| Researcher Affiliation | Academia | Yotam Amitai, Ofra Amir, Faculty of Industrial Engineering & Management, Technion - Israel Institute of Technology, yotama@campus.technion.ac.il, oamir@technion.ac.il |
| Pseudocode | Yes | Algorithm 1: The DISAGREEMENTS algorithm. |
| Open Source Code | No | The paper releases no code of its own; it cites only third-party environment implementations: Frogger python implementation, https://github.com/pedrodbs/frogger (accessed 2021-02-01); An Environment for Autonomous Driving Decision-Making, https://github.com/eleurent/highwayenv (accessed 2021-02-01). |
| Open Datasets | Yes | To evaluate our algorithm we generated summaries of agents playing the game of Frogger (Sequeira and Gubert 2020) and controlling a vehicle in a highway environment (Leurent 2018). Frogger python implementation, https://github.com/pedrodbs/frogger (accessed 2021-02-01); An Environment for Autonomous Driving Decision-Making, https://github.com/eleurent/highwayenv (accessed 2021-02-01). |
| Dataset Splits | No | Frogger Agents. We made use of the framework developed by Sequeira and Gervasio (2020) to test the DISAGREEMENTS algorithm on multiple configurable agents of varying capabilities. Three different agents were trained using standard Q-learning (Watkins and Dayan 1992)... All highway agents were trained for 2000 episodes using double DQN architecture (Hasselt 2010) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or specific computing environments used for running the experiments. |
| Software Dependencies | No | Frogger python implementation... All highway agents were trained for 2000 episodes using double DQN architecture (Hasselt 2010) |
| Experiment Setup | Yes | All summaries were composed of five trajectories made up of sequential states, ten for Frogger and twenty for Highway. These contained the important state at the center of the trajectory, with half the states preceding and the rest succeeding it. Table 1 (parameters for the Frogger and Highway domains) defines: k, the summary budget, i.e. the number of trajectories; l, the length of each trajectory (Frogger: 10, Highway: 20); h, the number of states following s to include in the trajectory; numSim, the number of simulations (episodes) run by the DISAGREEMENTS algorithm; overlapLim, the maximal number of shared states allowed between two trajectories in the summary; and impMeth, the importance method used for evaluating disagreements. A sketch of how these parameters fit together appears below the table. |
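The quoted setup maps naturally onto a single selection loop. Below is a minimal, hedged Python sketch of how Algorithm 1 (DISAGREEMENTS) plausibly ties the Table 1 parameters together, reconstructed only from the descriptions quoted above. The environment/agent interface, the `run_episode` helper, the `importance` callable (standing in for impMeth), and the default values for `num_sim` and `overlap_lim` are illustrative assumptions, not the authors' released implementation; only k=5 and l=10/20 come from the paper's text.

```python
from typing import Callable, List

# Hedged sketch of the DISAGREEMENTS summarization loop. The env/agent
# interface (reset, step, act) and all helper names are assumptions.

def run_episode(env, agent) -> List:
    """Roll out one episode under `agent`'s policy; return visited states.

    Assumes a simplified interface where env.step(action) returns
    (next_state, done) and states are hashable (e.g. tuples).
    """
    states, state, done = [], env.reset(), False
    while not done:
        states.append(state)
        state, done = env.step(agent.act(state))
    return states

def disagreement_summary(env, agent_a, agent_b,
                         importance: Callable,  # impMeth: scores a disagreement state
                         k: int = 5,            # summary budget (five trajectories, per the paper)
                         l: int = 10,           # trajectory length (10 Frogger / 20 Highway)
                         h: int = 5,            # states following the disagreement state (assumed l/2)
                         num_sim: int = 100,    # numSim: simulations to run (assumed value)
                         overlap_lim: int = 3   # overlapLim: max shared states (assumed value)
                         ) -> List[List]:
    """Collect up to k trajectories centered on important policy disagreements."""
    candidates = []
    for _ in range(num_sim):
        states = run_episode(env, agent_a)
        for t, s in enumerate(states):
            if agent_a.act(s) != agent_b.act(s):  # the two policies disagree at s
                # Window of l states: l - h - 1 preceding s, s itself, h following it.
                traj = states[max(0, t - (l - h - 1)): t + h + 1]
                candidates.append((importance(s, agent_a, agent_b), traj))
    # Greedily keep the most important trajectories, capping pairwise overlap.
    summary: List[List] = []
    for _, traj in sorted(candidates, key=lambda c: -c[0]):
        if all(len(set(traj) & set(kept)) <= overlap_lim for kept in summary):
            summary.append(traj)
        if len(summary) == k:
            break
    return summary
```

The greedy filter mirrors the overlapLim constraint described in Table 1: a candidate trajectory is admitted only if it shares at most `overlap_lim` states with every trajectory already selected, so the k-trajectory summary highlights distinct disagreements rather than repeated views of the same one.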