Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning
Authors: Ke Jiang, Jia-Yu Yao, Xiaoyang Tan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and feasibility of the proposed method are demonstrated by its state-of-the-art performance on general offline RL benchmarks. Experimental results are presented in Section 5 to evaluate the effectiveness of both methods under various settings. In the experiments we aim to answer: 1) Does the proposed OSR help to achieve state-of-the-art performance in offline RL? |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; (2) MIIT Key Laboratory of Pattern Analysis and Machine Intelligence; (3) School of Electronic and Computer Engineering, Peking University |
| Pseudocode | Yes | The whole process of OSR is summarized in Algorithm 1 in Appendix C while the whole process of OSR-v is summarized as Algorithm 2 in Appendix C. |
| Open Source Code | Yes | Our code is available at https://github.com/Jack10843/OSR |
| Open Datasets | Yes | We conduct a comparative study on the MuJoCo and AntMaze benchmarks in the D4RL datasets for different versions of our method. (A hedged dataset-loading sketch follows this table.) |
| Dataset Splits | No | The paper uses datasets from D4RL but does not explicitly provide specific details on how the datasets are split into training, validation, and test sets (e.g., percentages, sample counts, or citations to specific split methodologies). |
| Hardware Specification | Yes | All experiments are run on a single NVIDIA 3090 GPU. |
| Software Dependencies | No | The paper mentions using a network structure similar to CQL and activation functions like ReLU, implying common deep learning frameworks, but it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | We use 2 hidden layers for both networks and set the dimension of the hidden layer to 256. The activation functions are ReLU for both networks. We set the learning rate for the actor and critic networks to 3e-4. The batch size is 256. We train the actor and critic networks for 1M iterations. The discount factor γ is set to 0.99. The noise magnitude β and the SR weighting λ are set to 0.5 and 0.1 for all tasks. (A hedged configuration sketch follows this table.) |
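The MuJoCo and AntMaze tasks referenced above come from the D4RL suite. As a point of reference, the snippet below shows one common way to load such a dataset with the publicly available `d4rl` package; this is not code from the paper, and the task name `halfcheetah-medium-v2` is only an illustrative choice.

```python
# Hedged example: loading a D4RL MuJoCo dataset with the d4rl package.
# Illustrative only; the paper's own data-loading code may differ.
import gym
import d4rl  # registers the D4RL environments with gym on import

# 'halfcheetah-medium-v2' is an example task name, not one singled out by this report.
env = gym.make('halfcheetah-medium-v2')

# qlearning_dataset returns transitions keyed by
# 'observations', 'actions', 'next_observations', 'rewards', 'terminals'.
dataset = d4rl.qlearning_dataset(env)
print(dataset['observations'].shape, dataset['actions'].shape)
```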
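For concreteness, the reported architecture and hyperparameters can be collected into a small configuration sketch. The paper does not name its framework, so the PyTorch usage, the optimizer choice, the helper names, and the example state/action dimensions below are assumptions rather than the authors' implementation.

```python
# Minimal sketch of the reported setup (assumed PyTorch; names are hypothetical).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 17, 6      # example MuJoCo-like sizes, not specified per task here
HIDDEN = 256                       # hidden-layer width reported in the paper

def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    # Two hidden layers of width 256 with ReLU activations, as reported.
    return nn.Sequential(
        nn.Linear(in_dim, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, out_dim),
    )

actor = mlp(STATE_DIM, ACTION_DIM)
critic = mlp(STATE_DIM + ACTION_DIM, 1)

# Learning rate 3e-4 for both networks; Adam is an assumption (the optimizer is not quoted above).
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

BATCH_SIZE = 256         # batch size
GAMMA = 0.99             # discount factor
BETA = 0.5               # noise magnitude β
LAMBDA_SR = 0.1          # SR weighting λ
TRAIN_ITERS = 1_000_000  # 1M training iterations
```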