Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics
Authors: Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation across six benchmark MuJoCo tasks with variable parameters demonstrates that DORA not only achieves a more precise dynamics encoding but also significantly outperforms existing baselines in terms of performance. Section 5 (Experiments) then states: In this section, we conduct the experiments to answer the following questions. |
| Researcher Affiliation | Collaboration | National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China; Polixir Technologies. |
| Pseudocode | Yes | To sum up, the pseudocodes of training and testing are illustrated in Algorithm 1 and Appendix B, respectively. |
| Open Source Code | Yes | We release the code at GitHub: https://github.com/Xinyuz26/DORA |
| Open Datasets | Yes | We choose MuJoCo tasks for experiments, including HalfCheetah-v3, Walker2d-v3, Hopper-v3, and InvertedDoublePendulum-v2, which are common benchmarks in offline RL (Todorov et al., 2012). |
| Dataset Splits | No | Each environment contains 10 tasks for training and 10 tasks for testing under each of the IID, OOD, and non-stationary dynamics settings. There is no explicit mention of a separate validation split or dataset. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions using a GRU network and a linear layer for parameterization in Section C.5, but it does not name specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, or the Python version), which are crucial for reproducibility. A hedged sketch of such an encoder is given after the table. |
| Experiment Setup | Yes | The paper includes Table 4, 'Configurations and hyper-parameters used in offline encoder training', which details per-environment values such as 'Debias loss weight', 'Distortion loss weight', 'History length', 'Latent space dim', 'Batch size', 'Learning rate', 'Training steps', and 'Radius of radial basis function'. A placeholder configuration skeleton using these names follows the encoder sketch below. |
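
To make the architecture referenced above more concrete, the following is a minimal PyTorch sketch of a history encoder built from a GRU followed by a linear layer, matching the description in Section C.5. The transition layout, dimensions, and default values are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of a GRU + linear history encoder (Section C.5 of the paper).
# The (s, a, r, s') transition layout and all dimensions below are assumptions.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 8, hidden_dim: int = 128):
        super().__init__()
        # Each history step is a flattened (state, action, reward, next state) tuple.
        step_dim = 2 * obs_dim + act_dim + 1
        self.gru = nn.GRU(input_size=step_dim, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, history_length, step_dim) -> latent: (batch, latent_dim)
        _, h_n = self.gru(history)
        return self.head(h_n.squeeze(0))

# Example usage with hypothetical HalfCheetah-like dimensions.
encoder = ContextEncoder(obs_dim=17, act_dim=6, latent_dim=8)
dummy_history = torch.randn(32, 20, 2 * 17 + 6 + 1)  # 32 histories of length 20
z = encoder(dummy_history)
print(z.shape)  # torch.Size([32, 8])
```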
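
Likewise, the snippet below sketches a configuration skeleton using the hyperparameter names reported in Table 4. All values are placeholders chosen for illustration; the paper reports different settings per environment.

```python
# Hypothetical configuration skeleton mirroring the hyperparameter names in
# Table 4. The values are placeholders, not the paper's reported settings.
encoder_training_config = {
    "env_name": "HalfCheetah-v3",
    "debias_loss_weight": 1.0,       # weight of the debiasing term
    "distortion_loss_weight": 1.0,   # weight of the distortion term
    "history_length": 20,            # number of transitions fed to the GRU
    "latent_space_dim": 8,           # dimensionality of the dynamics embedding
    "batch_size": 256,
    "learning_rate": 3e-4,
    "training_steps": 100_000,
    "rbf_radius": 1.0,               # radius of the radial basis function
}
```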