reproducibilityindex.ai

Robot Policy Learning with Temporal Optimal Transport Reward

Authors: Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method.
Researcher Affiliation	Collaboration	Yuwei Fu¹, Haichao Zhang², Di Wu¹, Wei Xu², Benoit Boulet¹; ¹McGill University, ²Horizon Robotics; yuwei.fu@mail.mcgill.ca, haichao.zhang@horizon.cc
Pseudocode	Yes	The pseudo-code of TemporalOT is summarized in Algorithm 1.
Open Source Code	Yes	Our code is available at: https://github.com/fuyw/TemporalOT.
Open Datasets	Yes	"Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method." and "We implement TemporalOT-RL in PyTorch [42] based on the official ADS implementation." and "Meta-world [57]."
Dataset Splits	No	The paper does not explicitly provide specific dataset split information for training, validation, and testing. It describes evaluation procedures (evaluating RL agent performance) but not data partitioning for dataset-based validation.
Hardware Specification	Yes	We run our experiments on a workstation with an NVIDIA GeForce RTX 3090 GPU and a 12th Gen Intel(R) Core(TM) i9-12900KF CPU.
Software Dependencies	Yes	For the other main software, we use the following package versions: Python 3.9.19; numpy 1.26.4; torch 2.2.2; torchvision 0.17.2; pot 0.9.3; dm-control 1.0.17; dm_env 1.6; mujoco-py 2.1.2.14; cython 3.0.0a10; gym 0.22.0.
Experiment Setup	Yes	Table 4: Summarization of hyper-parameters (parameter: value). Total environment step: 1e6; Adam learning rate: 1e-4; Batch size: 512; Target network τ: 0.005; Discount factor γ: 0.9; Expert demo number N_E: 2; Context length k_c: 3; Mask window size k_m: 10; DrQ buffer size: 1.5e5; DrQ action repeat: 2; DrQ frame stack: 3; DrQ image size: (84, 84, 3); DrQ embedding dimension: 50; DrQ CNN features: (32, 32, 32, 32); DrQ CNN kernels: (3, 3, 3, 3); DrQ CNN strides: (2, 1, 1, 1); DrQ CNN padding: VALID; DrQ actor network: (1024, 1024, 1024); DrQ critic network: (1024, 1024, 1024)
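
For anyone attempting a re-run, the Software Dependencies and Experiment Setup rows above can be captured in a small Python sketch. This is not taken from the authors' repository: the names `PINNED_VERSIONS`, `find_version_mismatches`, `TemporalOTConfig`, and `valid_conv_output_size` are hypothetical, the field names are our own labels for the Table 4 entries, and the distribution names used for the version lookup are assumed to match what is installed locally.

```python
from dataclasses import dataclass
from importlib import metadata

# Package versions as listed in the Software Dependencies row.
# Assumption: these keys match the installed distribution names.
PINNED_VERSIONS = {
    "numpy": "1.26.4",
    "torch": "2.2.2",
    "torchvision": "0.17.2",
    "pot": "0.9.3",
    "dm-control": "1.0.17",
    "dm_env": "1.6",
    "mujoco-py": "2.1.2.14",
    "cython": "3.0.0a10",
    "gym": "0.22.0",
}


def find_version_mismatches(pinned=PINNED_VERSIONS):
    """Return {package: (expected, installed_or_None)} for each mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


@dataclass(frozen=True)
class TemporalOTConfig:
    """Hypothetical container for the Table 4 values (field names are ours)."""
    total_env_steps: int = 1_000_000
    learning_rate: float = 1e-4        # Adam
    batch_size: int = 512
    target_tau: float = 0.005          # target network update rate
    discount: float = 0.9              # gamma
    num_expert_demos: int = 2          # N_E
    context_length: int = 3            # k_c
    mask_window_size: int = 10         # k_m
    buffer_size: int = 150_000         # DrQ replay buffer
    action_repeat: int = 2
    frame_stack: int = 3
    image_size: tuple = (84, 84, 3)
    embedding_dim: int = 50
    cnn_features: tuple = (32, 32, 32, 32)
    cnn_kernels: tuple = (3, 3, 3, 3)
    cnn_strides: tuple = (2, 1, 1, 1)
    cnn_padding: str = "VALID"
    actor_hidden: tuple = (1024, 1024, 1024)
    critic_hidden: tuple = (1024, 1024, 1024)


def valid_conv_output_size(size, kernels, strides):
    """Spatial size after stacked VALID (unpadded) convolutions.

    Assumes square inputs and no dilation.
    """
    for kernel, stride in zip(kernels, strides):
        size = (size - kernel) // stride + 1
    return size
```

With the listed kernels and strides, `valid_conv_output_size(84, cfg.cnn_kernels, cfg.cnn_strides)` for `cfg = TemporalOTConfig()` gives 35, so under these assumptions the encoder would see a 35 x 35 feature map before the 50-dimensional embedding. Calling `find_version_mismatches()` in a fresh environment lists any package whose installed version differs from the reported one.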