State Abstraction as Compression in Apprenticeship Learning

Authors: David Abel, Dilip Arumugam, Kavosh Asadi, Yuu Jinnai, Michael L. Littman, Lawson L.S. Wong

AAAI 2019, pp. 3134-3142

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments to showcase the relationship between compression and performance captured by the algorithm in a traditional grid world and present an extension to high-dimensional observations via experiments with the Atari game Breakout." From Section 4 (Experiments): "We next describe two experiments that illustrate the power of DIBS for constructing abstractions that trade-off compression and value."
Researcher Affiliation | Academia | David Abel (1), Dilip Arumugam (2), Kavosh Asadi (1), Yuu Jinnai (1), Michael L. Littman (1), Lawson L.S. Wong (3); (1) Department of Computer Science, Brown University; (2) Department of Computer Science, Stanford University; (3) College of Computer and Information Science, Northeastern University
Pseudocode | Yes | Algorithm 1 DIBS. INPUT: πE, ρE, M, β, , iters. OUTPUT: φ, πφ. (An illustrative sketch of an alternating loop with this input/output signature appears after the table.)
Open Source Code | Yes | "Our code is made freely available for reproduction and extension." (Footnote 1: github.com/david-abel/rl_info_theory)
Open Datasets | Yes | "First, we study the traditional Four Rooms domain introduced by Sutton, Precup, and Singh (1999). Second, we present a simple extension to SIBS that scales to high-dimensional observation spaces and evaluate this extension in the Atari game Breakout (Bellemare et al. 2013)."
Dataset Splits | No | The paper does not provide specific percentages or sample counts for training, validation, or test dataset splits. For Breakout, it mentions "100 evaluation episodes" but not a formal data split.
Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions using VAEs, A2C, and the Adam optimizer but does not provide specific version numbers for these or any other software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "The agent interacts with an 11 × 11 grid with walls dividing the world into four connected rooms. The agent has four actions, up, left, down, and right. Each action moves the agent in the specified direction with probability 0.9 (unless it hits a wall), and orthogonally with probability 0.05. The agent starts in the bottom left corner, and receives +1 reward for transitioning into the top right state, which is terminal. All other transitions receive 0 reward. We set γ to 0.99. The model is trained via Equation 19 for 2000 episodes using the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.0001." (A minimal sketch of these grid-world dynamics, under stated assumptions, appears after the table.)
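
The pseudocode row above gives only the input/output signature of Algorithm 1 (DIBS). Below is a minimal Python sketch of an alternating-optimization loop with that shape, written against tabular inputs. The specific update rules here (a deterministic-information-bottleneck-style hard assignment of ground states to abstract states, then a ρE-weighted re-estimation of the abstract policy) are illustrative assumptions, not the paper's exact updates; the MDP M and the convergence threshold from the algorithm's input list are omitted for brevity.

```python
import numpy as np

def dibs_sketch(pi_E, rho_E, num_abstract, beta, iters, seed=None):
    """Illustrative alternating-optimization sketch (NOT the paper's exact DIBS updates).

    pi_E         : (num_states, num_actions) expert policy; each row sums to 1
    rho_E        : (num_states,) expert state-visitation distribution
    num_abstract : number of abstract states |S_phi|
    beta         : compression/value trade-off coefficient (larger favours imitation fidelity)
    iters        : number of alternating sweeps
    Returns phi, a (num_states,) hard ground-to-abstract mapping, and
    pi_phi, a (num_abstract, num_actions) abstract policy.
    """
    rng = np.random.default_rng(seed)
    num_states, num_actions = pi_E.shape

    # Start from a random deterministic abstraction.
    phi = rng.integers(num_abstract, size=num_states)
    pi_phi = np.full((num_abstract, num_actions), 1.0 / num_actions)

    for _ in range(iters):
        # (1) Abstract-policy update: rho_E-weighted average of the expert policy
        #     over the ground states currently mapped to each abstract state.
        pi_phi = np.full((num_abstract, num_actions), 1.0 / num_actions)
        for k in range(num_abstract):
            members = phi == k
            mass = rho_E[members].sum()
            if mass > 0:
                pi_phi[k] = (rho_E[members][:, None] * pi_E[members]).sum(axis=0) / mass

        # (2) Abstraction update (DIB-style hard assignment): each ground state goes to
        #     the abstract state minimising a cluster-mass term plus beta times the KL
        #     divergence from the expert's local action distribution.
        marginal = np.array([rho_E[phi == k].sum() for k in range(num_abstract)])
        marginal = np.maximum(marginal, 1e-12)  # held fixed during the sweep
        for s in range(num_states):
            kl = np.sum(pi_E[s] * np.log((pi_E[s] + 1e-12) / (pi_phi + 1e-12)), axis=1)
            phi[s] = int(np.argmin(-np.log(marginal) + beta * kl))

    return phi, pi_phi
```

With πE and ρE estimated from expert trajectories (e.g., as empirical frequencies), a call such as dibs_sketch(pi_E, rho_E, num_abstract=4, beta=10.0, iters=50) would return a hard state aggregation and the matching abstract policy; those argument values are placeholders, not settings from the paper.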
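
The experiment-setup row specifies the grid-world dynamics: the intended move with probability 0.9, orthogonal slips with probability 0.05 (read here as 0.05 for each of the two orthogonal directions, so the probabilities sum to 1), walls block movement, +1 for transitioning into the terminal top-right state, and γ = 0.99. A minimal sketch of those dynamics follows. The wall layout and coordinate convention are assumptions, since the paper only states that walls divide the 11 × 11 grid into four connected rooms, and the training details (Equation 19, Adam, learning rate 0.0001, 2000 episodes) are not reproduced here.

```python
import numpy as np

GRID = 11                    # 11 x 11 grid; (0, 0) bottom-left, (10, 10) top-right (assumed coordinates)
GAMMA = 0.99
START = (0, 0)               # agent starts in the bottom-left corner
GOAL = (GRID - 1, GRID - 1)  # terminal +1 state in the top-right corner

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# Each action slips to one of its two orthogonal directions with probability 0.05 each.
ORTHOGONAL = {"up": ("left", "right"), "down": ("left", "right"),
              "left": ("up", "down"), "right": ("up", "down")}

# Example wall layout with one doorway per wall segment; the paper's exact layout may differ.
WALLS = {(5, y) for y in range(GRID) if y not in (2, 8)} | \
        {(x, 5) for x in range(GRID) if x not in (2, 8)}

def step(state, action, rng):
    """One stochastic transition matching the description quoted in the table."""
    x, y = state
    r = rng.random()
    if r < 0.9:
        executed = action
    elif r < 0.95:
        executed = ORTHOGONAL[action][0]
    else:
        executed = ORTHOGONAL[action][1]
    dx, dy = ACTIONS[executed]
    nx, ny = x + dx, y + dy
    # The agent stays in place if the move would leave the grid or hit a wall.
    if not (0 <= nx < GRID and 0 <= ny < GRID) or (nx, ny) in WALLS:
        nx, ny = x, y
    reward = 1.0 if (nx, ny) == GOAL else 0.0  # +1 for entering the goal, 0 otherwise
    return (nx, ny), reward, (nx, ny) == GOAL

# Example: roll out one episode under a uniformly random policy and accumulate the discounted return.
rng = np.random.default_rng(0)
state, discounted_return, discount = START, 0.0, 1.0
for _ in range(10_000):
    state, reward, done = step(state, rng.choice(list(ACTIONS)), rng)
    discounted_return += discount * reward
    discount *= GAMMA
    if done:
        break
```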