BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
Authors: Haohong Lin, Wenhao Ding, Jian Chen, Laixi Shi, Jiacheng Zhu, Bo Li, Ding Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE under fewer samples or larger numbers of confounders. Additionally, we offer theoretical analysis of BECAUSE to prove its error bound and sample efficiency when integrating causal representation into offline MBRL. |
| Researcher Affiliation | Academia | Haohong Lin1, Wenhao Ding1, Jian Chen1, Laixi Shi2, Jiacheng Zhu3, Bo Li4, Ding Zhao1 1CMU, 2Caltech, 3MIT, 4UChicago & UIUC |
| Pseudocode | Yes | Algorithm 1: BECAUSE Training and Planning |
| Open Source Code | Yes | Our code is available at the anonymous repo: https://anonymous.4open.science/r/BECAUSE-NeurIPS |
| Open Datasets | Yes | Lift: Object manipulation environment in Robosuite [34]... Unlock: We designed this environment for the agent to collect a key to open doors in Minigrid [35]... Crash: Safety is critical in autonomous driving... based on highway-env [37] |
| Dataset Splits | No | The paper does not explicitly specify validation dataset splits (e.g., percentages or sample counts) or the methodology used for validation during model training or hyperparameter tuning. |
| Hardware Specification | Yes | The experiments are run on a server with 2 AMD EPYC 7542 32-Core Processor CPUs, 2 NVIDIA RTX 3090 GPUs and 2 NVIDIA RTX A6000 GPUs, and 252 GB memory. |
| Software Dependencies | No | The paper lists various models and their hyperparameters (e.g., 'EBM', 'MLP', 'GNN') and references baseline implementations, some with licenses, but it does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Table 12: Hyper-parameters of models used in experiments of BECAUSE and baselines (Part I) [lists Learning rate, Size of data D, Epoch per iteration, Batch size, Planning horizon H, Planning population, Reward discount, Spectral norm regularizer (λϕ, λµ), Causal discovery pthres, Encoder hiddens, EBM hidden, EBM negative buffer, EBM training steps, EBM regularizer, MLP hiddens, MLP layers, Ensemble number, Initialized mask coef, Sparsity regularizer]. Table 13: Hyper-parameters of models used in experiments of baselines (Continued). |