Structured World Belief for Reinforcement Learning in POMDP

Authors: Gautam Singh, Skand Peri, Junghyun Kim, Hyunseok Kim, Sungjin Ahn

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In experiments, we show that object-centric belief provides a more accurate and robust performance for filtering and generation. Furthermore, we show the efficacy of structured world belief in improving the performance of reinforcement learning, planning and supervised reasoning." |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Rutgers University; ²Electronics and Telecommunications Research Institute; ³Rutgers Center for Cognitive Science |
| Pseudocode | Yes | Algorithm 1: File-Slot Matching and Glimpse Proposal (an illustrative matching sketch follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for its methodology. |
| Open Datasets | Yes | 2D Maze Game: MattChanTK. gym-maze. https://github.com/MattChanTK/gym-maze, 2020. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly describe a validation split. For example, it states "we trained our model on 2D Branching Sprites with up to 2 objects but we test in a setting in which up to 4 objects can spawn", indicating a train/test split but no separate validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions models and algorithms such as AESMC, A2C, and SPACE, but does not give version numbers for any software libraries or dependencies used in the implementation. |
| Experiment Setup | Yes | "We evaluate our model SWB with K = 10 particles which provides both object-centric representation and belief states. [...] The SWB world models for 2D Branching Sprites, 3D Food Chase game and 2D Maze Game were pre-trained using frames collected through 200K, 200K and 500K interactions with the environment, respectively, using a random policy." (sketches of a particle-belief update and the random-policy data collection follow the table) |
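The paper's Algorithm 1 (File-Slot Matching and Glimpse Proposal) is given only as pseudocode. As a rough illustration of what a file-slot matching step can look like, the sketch below assigns object files to image slots by maximizing cosine similarity with the Hungarian algorithm. The function name, the similarity criterion, and the use of scipy's `linear_sum_assignment` are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_files_to_slots(file_states, slot_states):
    """Assign each object file to at most one slot by maximizing
    cosine similarity, solved with the Hungarian algorithm.

    Illustrative stand-in for a file-slot matching step; not the
    paper's Algorithm 1.
    """
    f = file_states / (np.linalg.norm(file_states, axis=1, keepdims=True) + 1e-8)
    s = slot_states / (np.linalg.norm(slot_states, axis=1, keepdims=True) + 1e-8)
    sim = f @ s.T                              # (num_files, num_slots) similarities
    rows, cols = linear_sum_assignment(-sim)   # negate to maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))


# Toy usage: 3 object files matched against 4 candidate slots (8-dim states).
rng = np.random.default_rng(0)
pairs = match_files_to_slots(rng.normal(size=(3, 8)), rng.normal(size=(4, 8)))
```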
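The experiment setup maintains a belief with K = 10 particles. To make the particle-belief idea concrete, here is a minimal bootstrap-particle-filter step written against generic `transition` and `likelihood` callables; these placeholders stand in for a learned world model, and none of the names come from the paper.

```python
import numpy as np


def bootstrap_filter_step(particles, weights, transition, likelihood, obs, rng):
    """One bootstrap-particle-filter step over a K-particle belief state."""
    K = len(particles)
    idx = rng.choice(K, size=K, p=weights)       # resample in proportion to weight
    particles = transition(particles[idx], rng)  # propagate through the model
    w = likelihood(particles, obs)               # score against the observation
    return particles, w / w.sum()                # renormalized belief


# Toy usage: 1-D random-walk dynamics, Gaussian observation noise, K = 10.
rng = np.random.default_rng(0)
K = 10
particles = rng.normal(size=K)
weights = np.full(K, 1.0 / K)
transition = lambda p, rng: p + 0.1 * rng.normal(size=p.shape)
likelihood = lambda p, obs: np.exp(-0.5 * (p - obs) ** 2) + 1e-12
particles, weights = bootstrap_filter_step(
    particles, weights, transition, likelihood, obs=0.3, rng=rng
)
```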
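The world models are pre-trained on frames gathered by a random policy (200K-500K environment interactions). Below is a minimal sketch of such a collection loop, assuming the classic `gym` step/reset API and using `CartPole-v1` as a stand-in environment id; the paper's own 2D/3D environments and collection code are not public.

```python
import gym
import numpy as np


def collect_random_frames(env_id, num_steps):
    """Roll out a uniformly random policy and return the observed frames."""
    env = gym.make(env_id)
    frames = []
    obs = env.reset()
    for _ in range(num_steps):
        action = env.action_space.sample()   # uniformly random policy
        obs, reward, done, _ = env.step(action)
        frames.append(np.asarray(obs))
        if done:
            obs = env.reset()
    env.close()
    return frames


# Stand-in run; the paper collects 200K-500K interactions per environment.
frames = collect_random_frames("CartPole-v1", num_steps=1_000)
```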