Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
Authors: Zizhao Wang, Caroline Wang, Xuesu Xiao, Yuke Zhu, Peter Stone
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical validation on manipulation environments and the DeepMind Control Suite reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. |
| Researcher Affiliation | Collaboration | 1The University of Texas at Austin, 2George Mason University, 3Sony AI |
| Pseudocode | Yes | Algorithm 1: Causal Bisimulation Modeling (CBM) |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | To test CBM, we use two manipulation environments implemented with Robosuite (Zhu et al. 2020), shown in Fig. 5 left, and two tasks from the DeepMind Control Suite (DMC, Tunyasuvunakool et al. (2020)). |
| Dataset Splits | No | The paper mentions running experiments with random seeds and evaluating on test episodes, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for the data used in the experiments. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU/CPU models, memory details). |
| Software Dependencies | No | The paper mentions 'Robosuite (Zhu et al. 2020)' and 'Deep Mind Control Suite (DMC, Tunyasuvunakool et al. (2020))' as environments and 'Soft Actor Critic (SAC, Haarnoja et al. (2018))' as an algorithm, but it does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | All methods are trained and evaluated with 3 random seeds. ... All methods are trained and evaluated with 5 random seeds. ... The dynamics model is pretrained in Pick and Stack tasks, and it is learned jointly with the policy from scratch in all other tasks. ... The regularization is applied to both the label and all negative samples (i.e., $s^i_{t+1} \in \{s^i_{t+1}, s^{i,n}_{t+1}\}$), and $\lambda_1$, $\lambda_2$ are the weights of the regularization terms. |