Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning
Authors: Ke Jiang, Jia-Yu Yao, Xiaoyang Tan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and feasibility of the proposed method are demonstrated by its state-of-the-art performance on general offline RL benchmarks. Experimental results are presented in Section 5 to evaluate the effectiveness of both methods under various settings. In the experiments we aim to answer: 1) Does the proposed OSR help to achieve state-of-the-art performance in offline RL? |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; (2) MIIT Key Laboratory of Pattern Analysis and Machine Intelligence; (3) School of Electronic and Computer Engineering, Peking University |
| Pseudocode | Yes | The whole process of OSR is summarized in Algorithm 1 in Appendix C while the whole process of OSR-v is summarized as Algorithm 2 in Appendix C. |
| Open Source Code | Yes | Our code is available at https://github.com/Jack10843/OSR |
| Open Datasets | Yes | We conduct a comparative study on the MuJoCo and AntMaze benchmarks in the D4RL datasets for different versions of our method. (A hedged dataset-loading sketch follows this table.) |
| Dataset Splits | No | The paper uses datasets from D4RL but does not explicitly provide specific details on how the datasets are split into training, validation, and test sets (e.g., percentages, sample counts, or citations to specific split methodologies). |
| Hardware Specification | Yes | All experiments are run on a single NVIDIA 3090 GPU. |
| Software Dependencies | No | The paper mentions using a network structure similar to CQL and activation functions like ReLU, implying common deep learning frameworks, but it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | We use 2 hidden layers for both networks and set the dimension of the hidden layer to 256. The activation functions are ReLU for both networks. We set the learning rate for the actor and critic networks to 3e-4. The batch size is 256. We train the actor and critic networks for 1M iterations. The discount factor γ is set to 0.99. The noise magnitude β and the SR weighting λ are set to 0.5 and 0.1 for all tasks. (A hedged configuration sketch follows this table.) |
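The MuJoCo and AntMaze tasks referenced above come from the D4RL suite. As a point of reference, the snippet below shows one common way to load such a dataset with the publicly available `d4rl` package; this is not code from the paper, and the task name `halfcheetah-medium-v2` is only an illustrative choice.

```python
# Hedged example: loading a D4RL MuJoCo dataset with the d4rl package.
# Illustrative only; the paper's own data-loading code may differ.
import gym
import d4rl  # registers the D4RL environments with gym on import

# 'halfcheetah-medium-v2' is an example task name, not one singled out by this report.
env = gym.make('halfcheetah-medium-v2')

# qlearning_dataset returns transitions keyed by
# 'observations', 'actions', 'next_observations', 'rewards', 'terminals'.
dataset = d4rl.qlearning_dataset(env)
print(dataset['observations'].shape, dataset['actions'].shape)
```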
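For concreteness, the reported architecture and hyperparameters can be collected into a small configuration sketch. The paper does not name its framework, so the PyTorch usage, the optimizer choice, the helper names, and the example state/action dimensions below are assumptions rather than the authors' implementation.

```python
# Minimal sketch of the reported setup (assumed PyTorch; names are hypothetical).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 17, 6      # example MuJoCo-like sizes, not specified per task here
HIDDEN = 256                       # hidden-layer width reported in the paper

def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    # Two hidden layers of width 256 with ReLU activations, as reported.
    return nn.Sequential(
        nn.Linear(in_dim, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, out_dim),
    )

actor = mlp(STATE_DIM, ACTION_DIM)
critic = mlp(STATE_DIM + ACTION_DIM, 1)

# Learning rate 3e-4 for both networks; Adam is an assumption (the optimizer is not quoted above).
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

BATCH_SIZE = 256         # batch size
GAMMA = 0.99             # discount factor
BETA = 0.5               # noise magnitude β
LAMBDA_SR = 0.1          # SR weighting λ
TRAIN_ITERS = 1_000_000  # 1M training iterations
```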