Robot Policy Learning with Temporal Optimal Transport Reward
Authors: Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method. |
| Researcher Affiliation | Collaboration | Yuwei Fu¹, Haichao Zhang², Di Wu¹, Wei Xu², Benoit Boulet¹ (¹McGill University, ²Horizon Robotics); yuwei.fu@mail.mcgill.ca, haichao.zhang@horizon.cc |
| Pseudocode | Yes | The pseudo-code of TemporalOT is summarized in Algorithm 1. |
| Open Source Code | Yes | Our code is available at: https://github.com/fuyw/TemporalOT. |
| Open Datasets | Yes | "Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method."; "We implement TemporalOT-RL in PyTorch [42] based on the official ADS implementation."; "Meta-world [57]." |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information for training, validation, and testing. It describes evaluation procedures (evaluating RL agent performance) but not data partitioning for dataset-based validation. |
| Hardware Specification | Yes | We run our experiments on a workstation with an NVIDIA GeForce RTX 3090 GPU and a 12th Gen Intel(R) Core(TM) i9-12900KF CPU. |
| Software Dependencies | Yes | For the other main software, we use the following package versions: (1) Python 3.9.19; (2) numpy 1.26.4; (3) torch 2.2.2; (4) torchvision 0.17.2; (5) pot 0.9.3; (6) dm-control 1.0.17; (7) dm_env 1.6; (8) mujoco-py 2.1.2.14; (9) cython 3.0.0a10; (10) gym 0.22.0. |
| Experiment Setup | Yes | Table 4: Summary of hyper-parameters. Total environment steps: 1e6; Adam learning rate: 1e-4; Batch size: 512; Target network τ: 0.005; Discount factor γ: 0.9; Expert demo number N_E: 2; Context length k_c: 3; Mask window size k_m: 10; DrQ buffer size: 1.5e5; DrQ action repeat: 2; DrQ frame stack: 3; DrQ image size: (84, 84, 3); DrQ embedding dimension: 50; DrQ CNN features: (32, 32, 32, 32); DrQ CNN kernels: (3, 3, 3, 3); DrQ CNN strides: (2, 1, 1, 1); DrQ CNN padding: VALID; DrQ actor network: (1024, 1024, 1024); DrQ critic network: (1024, 1024, 1024). |
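To confirm that a local environment matches the versions listed under Software Dependencies, a small check along these lines may help. It is a sketch, not part of the authors' code: it uses only the standard-library `importlib.metadata`, and the PyPI distribution names (for example `POT` for the `pot` package and `dm-env` for `dm_env`) are our assumptions and may need adjusting for a given installation.

```python
import sys
from importlib import metadata

# Pinned versions from the software-dependency list. Keys are assumed PyPI
# distribution names: "POT" provides the `ot` module; dm_env is published as "dm-env".
PINNED = {
    "numpy": "1.26.4",
    "torch": "2.2.2",
    "torchvision": "0.17.2",
    "POT": "0.9.3",
    "dm-control": "1.0.17",
    "dm-env": "1.6",
    "mujoco-py": "2.1.2.14",
    "Cython": "3.0.0a10",
    "gym": "0.22.0",
}
PINNED_PYTHON = "3.9.19"


def check_versions(pins, lookup=metadata.version):
    """Return {name: (expected, found)} for every package whose version differs.

    `found` is None when the distribution is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def python_version():
    """Running interpreter version as a 'major.minor.micro' string."""
    return ".".join(str(part) for part in sys.version_info[:3])


if __name__ == "__main__":
    if python_version() != PINNED_PYTHON:
        print(f"python: expected {PINNED_PYTHON}, found {python_version()}")
    for name, (expected, found) in sorted(check_versions(PINNED).items()):
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without the real packages installed; in normal use the default `metadata.version` is used.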
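The hyper-parameters in Table 4 can be gathered into one configuration object, which also makes the DrQ encoder arithmetic easy to check. With VALID padding, a convolution with kernel k and stride s maps a side length n to (n − k) // s + 1, so the listed kernels and strides take an 84-pixel side to 41, 39, 37 and finally 35, i.e. a 35 × 35 × 32 feature map (39,200 values once flattened). The class and field names below are hypothetical, not taken from the authors' repository, and the sketch assumes the usual DrQ convention that these flattened features are then projected to the 50-dimensional embedding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalOTConfig:
    """Hypothetical container for the Table 4 hyper-parameters."""
    total_env_steps: int = 1_000_000
    learning_rate: float = 1e-4          # Adam
    batch_size: int = 512
    tau: float = 0.005                   # target-network soft-update rate
    discount: float = 0.9                # gamma
    num_expert_demos: int = 2            # N_E
    context_length: int = 3              # k_c
    mask_window: int = 10                # k_m
    buffer_size: int = 150_000           # DrQ replay buffer
    action_repeat: int = 2
    frame_stack: int = 3
    image_size: tuple = (84, 84, 3)      # (height, width, channels) per frame
    embedding_dim: int = 50
    cnn_features: tuple = (32, 32, 32, 32)
    cnn_kernels: tuple = (3, 3, 3, 3)
    cnn_strides: tuple = (2, 1, 1, 1)
    actor_hidden: tuple = (1024, 1024, 1024)
    critic_hidden: tuple = (1024, 1024, 1024)

    def conv_output_shape(self):
        """(height, width, channels) after the VALID-padded conv stack."""
        h, w, _ = self.image_size
        for k, s in zip(self.cnn_kernels, self.cnn_strides):
            # VALID padding: no zero-padding, so each layer shrinks the map.
            h = (h - k) // s + 1
            w = (w - k) // s + 1
        return h, w, self.cnn_features[-1]

    def flat_feature_dim(self):
        """Number of features after flattening the final conv map."""
        h, w, c = self.conv_output_shape()
        return h * w * c


cfg = TemporalOTConfig()
print(cfg.conv_output_shape(), cfg.flat_feature_dim())  # (35, 35, 32) 39200
```

A frozen dataclass keeps the values immutable during a run, which helps when logging the exact configuration alongside results.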
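The checklist names a mask window size (k_m = 10 in Table 4) but does not say how the mask enters the optimal-transport computation. As a rough illustration only, the sketch below assumes the mask is a diagonal band |i − j| ≤ k_m applied to the Gibbs kernel of entropic OT, solved with plain Sinkhorn iterations between uniform marginals, with the per-step reward taken as the negative transport cost of each agent step. This is our assumption, not the authors' Algorithm 1: it takes the cost matrix as given (omitting any context-length k_c cost construction), and it does not use the POT library listed in the dependencies.

```python
import math


def banded_mask(n, m, window):
    """Hypothetical temporal mask: agent step i may only be matched to
    expert steps j with |i - j| <= window."""
    return [[1.0 if abs(i - j) <= window else 0.0 for j in range(m)]
            for i in range(n)]


def masked_sinkhorn(cost, mask, eps=0.05, iters=200):
    """Entropic OT between uniform marginals; masked entries stay exactly zero.

    Standard Sinkhorn scaling on the kernel K = mask * exp(-cost / eps).
    """
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    kernel = [[mask[i][j] * math.exp(-cost[i][j] / eps) for j in range(m)]
              for i in range(n)]
    # Every row and column needs at least one admissible match.
    if any(sum(row) == 0.0 for row in kernel) or any(
            sum(kernel[i][j] for i in range(n)) == 0.0 for j in range(m)):
        raise ValueError("mask leaves a row or column with no admissible match")
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_rewards(cost, plan):
    """Per-step reward r_i = -sum_j P_ij * C_ij for each agent step i."""
    return [-sum(p * c for p, c in zip(p_row, c_row))
            for p_row, c_row in zip(plan, cost)]
```

With a band mask, frames far apart in time can never be matched, so an agent step is only rewarded for resembling the expert around the same point in the trajectory.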