Structured World Belief for Reinforcement Learning in POMDP

Authors: Gautam Singh, Skand Peri, Junghyun Kim, Hyunseok Kim, Sungjin Ahn

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In experiments, we show that object-centric belief provides a more accurate and robust performance for filtering and generation. Furthermore, we show the efficacy of structured world belief in improving the performance of reinforcement learning, planning and supervised reasoning." |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Rutgers University; ²Electronics and Telecommunications Research Institute; ³Rutgers Center for Cognitive Science |
| Pseudocode | Yes | Algorithm 1: File-Slot Matching and Glimpse Proposal (an illustrative matching sketch follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for its methodology. |
| Open Datasets | Yes | 2D Maze Game: MattChanTK. gym-maze. https://github.com/MattChanTK/gym-maze, 2020. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly describe a validation split. For example, it states "we trained our model on 2D Branching Sprites with up to 2 objects but we test in a setting in which up to 4 objects can spawn", indicating a train/test split but no separate validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions models and algorithms such as AESMC, A2C, and SPACE, but does not give version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | "We evaluate our model SWB with K = 10 particles which provides both object-centric representation and belief states. [...] The SWB world models for 2D Branching Sprites, 3D Food Chase game and 2D Maze Game were pre-trained using frames collected through 200K, 200K and 500K interactions with the environment, respectively, using a random policy." (sketches of a particle-belief update and the random-policy data collection follow the table) |
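The paper's Algorithm 1 (File-Slot Matching and Glimpse Proposal) is given only as pseudocode. As a rough illustration of what a file-slot matching step can look like, the sketch below assigns object files to image slots by maximizing cosine similarity with the Hungarian algorithm. The function name, the similarity criterion, and the use of scipy's `linear_sum_assignment` are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_files_to_slots(file_states, slot_states):
    """Assign each object file to at most one slot by maximizing
    cosine similarity, solved with the Hungarian algorithm.

    Illustrative stand-in for a file-slot matching step; not the
    paper's Algorithm 1.
    """
    f = file_states / (np.linalg.norm(file_states, axis=1, keepdims=True) + 1e-8)
    s = slot_states / (np.linalg.norm(slot_states, axis=1, keepdims=True) + 1e-8)
    sim = f @ s.T                              # (num_files, num_slots) similarities
    rows, cols = linear_sum_assignment(-sim)   # negate to maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))


# Toy usage: 3 object files matched against 4 candidate slots (8-dim states).
rng = np.random.default_rng(0)
pairs = match_files_to_slots(rng.normal(size=(3, 8)), rng.normal(size=(4, 8)))
```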
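The experiment setup maintains a belief with K = 10 particles. To make the particle-belief idea concrete, here is a minimal bootstrap-particle-filter step written against generic `transition` and `likelihood` callables; these placeholders stand in for a learned world model, and none of the names come from the paper.

```python
import numpy as np


def bootstrap_filter_step(particles, weights, transition, likelihood, obs, rng):
    """One bootstrap-particle-filter step over a K-particle belief state."""
    K = len(particles)
    idx = rng.choice(K, size=K, p=weights)       # resample in proportion to weight
    particles = transition(particles[idx], rng)  # propagate through the model
    w = likelihood(particles, obs)               # score against the observation
    return particles, w / w.sum()                # renormalized belief


# Toy usage: 1-D random-walk dynamics, Gaussian observation noise, K = 10.
rng = np.random.default_rng(0)
K = 10
particles = rng.normal(size=K)
weights = np.full(K, 1.0 / K)
transition = lambda p, rng: p + 0.1 * rng.normal(size=p.shape)
likelihood = lambda p, obs: np.exp(-0.5 * (p - obs) ** 2) + 1e-12
particles, weights = bootstrap_filter_step(
    particles, weights, transition, likelihood, obs=0.3, rng=rng
)
```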
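The world models are pre-trained on frames gathered by a random policy (200K-500K environment interactions). Below is a minimal sketch of such a collection loop, assuming the classic `gym` step/reset API and using `CartPole-v1` as a stand-in environment id; the paper's own 2D/3D environments and collection code are not public.

```python
import gym
import numpy as np


def collect_random_frames(env_id, num_steps):
    """Roll out a uniformly random policy and return the observed frames."""
    env = gym.make(env_id)
    frames = []
    obs = env.reset()
    for _ in range(num_steps):
        action = env.action_space.sample()   # uniformly random policy
        obs, reward, done, _ = env.step(action)
        frames.append(np.asarray(obs))
        if done:
            obs = env.reset()
    env.close()
    return frames


# Stand-in run; the paper collects 200K-500K interactions per environment.
frames = collect_random_frames("CartPole-v1", num_steps=1_000)
```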