Structured World Belief for Reinforcement Learning in POMDP
Authors: Gautam Singh, Skand Peri, Junghyun Kim, Hyunseok Kim, Sungjin Ahn
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that object-centric belief provides a more accurate and robust performance for filtering and generation. Furthermore, we show the efficacy of structured world belief in improving the performance of reinforcement learning, planning and supervised reasoning. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, Rutgers University; (2) Electronics and Telecommunications Research Institute; (3) Rutgers Center for Cognitive Science. |
| Pseudocode | Yes | Algorithm 1 File-Slot Matching and Glimpse Proposal |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of its methodology. |
| Open Datasets | Yes | 2D Maze Game. Matt Chan TK. gym-maze. https://github.com/MattChanTK/gym-maze, 2020. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly describe validation dataset splits. For example, it states 'we trained our model on 2D Branching Sprites with upto 2 objects but we test in a setting in which up to 4 objects can spawn', indicating a train/test split, but no specific validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions various models and algorithms like AESMC, A2C, and SPACE, but does not provide specific version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | We evaluate our model SWB with K = 10 particles which provides both object-centric representation and belief states. [...] The SWB world models for 2D Branching Sprites, 3D Food Chase game and 2D Maze Game were pre-trained using frames collected through 200K, 200K and 500K interactions with the environment, respectively, using a random policy. |
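The Pseudocode row cites Algorithm 1 (File-Slot Matching and Glimpse Proposal). The paper's algorithm is not reproduced here; the sketch below is only a generic illustration of the kind of assignment step a file-slot matching procedure performs, using `scipy.optimize.linear_sum_assignment`. The function name, feature shapes, and squared-distance cost are assumptions, not the paper's method.

```python
# Generic sketch of matching persistent object files to newly detected object
# slots via an assignment problem. This is NOT the paper's Algorithm 1; it only
# illustrates the matching idea. All names and the cost function are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_files_to_slots(file_feats: np.ndarray, slot_feats: np.ndarray):
    """Return (file_idx, slot_idx) pairs minimizing total feature distance.

    file_feats: (num_files, d) features of currently tracked object files.
    slot_feats: (num_slots, d) features of objects detected in the new frame.
    """
    # Pairwise squared-distance cost between every file and every slot.
    cost = ((file_feats[:, None, :] - slot_feats[None, :, :]) ** 2).sum(-1)
    file_idx, slot_idx = linear_sum_assignment(cost)  # handles rectangular costs
    return list(zip(file_idx.tolist(), slot_idx.tolist()))

# Example: 3 tracked files, 2 detected slots, 4-dimensional features.
files = np.random.rand(3, 4)
slots = np.random.rand(2, 4)
print(match_files_to_slots(files, slots))  # e.g. [(0, 1), (2, 0)]
```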
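The Experiment Setup row states that the SWB world models were pre-trained on frames collected with a random policy (200K interactions for 2D Branching Sprites and 3D Food Chase, 500K for the 2D Maze Game). Below is a minimal, hedged sketch of such a random-policy collection loop using the classic `gym` step API; the environment id, in-memory storage, and episode handling are assumptions, and only the interaction budgets come from the paper.

```python
# Minimal sketch of random-policy frame collection for world-model pre-training.
# Only the total interaction budgets (200K / 200K / 500K) come from the paper;
# everything else (env id, storage format) is illustrative.
import gym
import numpy as np

def collect_random_frames(env_id: str, num_interactions: int):
    """Roll out a uniformly random policy and store the observed frames."""
    env = gym.make(env_id)
    frames = []  # a real setup would likely stream frames to disk instead
    obs = env.reset()
    for _ in range(num_interactions):
        action = env.action_space.sample()          # random policy
        obs, reward, done, info = env.step(action)  # classic gym 4-tuple API
        frames.append(np.asarray(obs))
        if done:
            obs = env.reset()
    env.close()
    return frames

# Example: the 2D Maze Game world model was pre-trained on 500K interactions.
# "maze-sample-10x10-v0" is a hypothetical gym-maze environment id.
# frames = collect_random_frames("maze-sample-10x10-v0", num_interactions=500_000)
```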