Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
Authors: Zizhao Wang, Caroline Wang, Xuesu Xiao, Yuke Zhu, Peter Stone
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical validation on manipulation environments and the DeepMind Control Suite reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. |
| Researcher Affiliation | Collaboration | 1The University of Texas at Austin, 2George Mason University, 3Sony AI |
| Pseudocode | Yes | Algorithm 1: Causal Bisimulation Modeling (CBM) |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | To test CBM, we use two manipulation environments implemented with Robosuite (Zhu et al. 2020), shown in Fig. 5 left, and two tasks from the DeepMind Control Suite (DMC, Tunyasuvunakool et al. (2020)). |
| Dataset Splits | No | The paper mentions running experiments with random seeds and evaluating on test episodes, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for the data used in the experiments. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU/CPU models, memory details). |
| Software Dependencies | No | The paper mentions 'Robosuite (Zhu et al. 2020)' and 'Deep Mind Control Suite (DMC, Tunyasuvunakool et al. (2020))' as environments and 'Soft Actor Critic (SAC, Haarnoja et al. (2018))' as an algorithm, but it does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | All methods are trained and evaluated with 3 random seeds. ... All methods are trained and evaluated with 5 random seeds. ... The dynamics model is pretrained in Pick and Stack tasks, and it is learned jointly with the policy from scratch in all other tasks. ... The regularization is applied to both the label and all negative samples (i.e., $s^i_{t+1} \in \{s^i_{t+1}, s^{i,n}_{t+1}\}$), and $\lambda_1$, $\lambda_2$ are the weights of the regularization terms. |