Robot Policy Learning with Temporal Optimal Transport Reward

Authors: Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method."
Researcher Affiliation | Collaboration | Yuwei Fu¹, Haichao Zhang², Di Wu¹, Wei Xu², Benoit Boulet¹; ¹McGill University, ²Horizon Robotics; yuwei.fu@mail.mcgill.ca, haichao.zhang@horizon.cc
Pseudocode | Yes | "The pseudo-code of TemporalOT is summarized in Algorithm 1."
Open Source Code | Yes | "Our code is available at: https://github.com/fuyw/TemporalOT."
Open Datasets | Yes | "Extensive experiments on the Meta-world benchmark tasks validate the efficacy of the proposed method."; "We implement TemporalOT-RL in PyTorch [42] based on the official ADS implementation."; "Meta-world [57]."
Dataset Splits | No | The paper does not provide dataset split information for training, validation, and testing. It describes how the RL agent's performance is evaluated, but not how any data is partitioned.
Hardware Specification | Yes | "We run our experiments on a workstation with an NVIDIA GeForce RTX 3090 GPU and a 12th Gen Intel(R) Core(TM) i9-12900KF CPU."
Software Dependencies | Yes | "For the other main softwares, we use the following package versions:" (1) Python 3.9.19, (2) numpy 1.26.4, (3) torch 2.2.2, (4) torchvision 0.17.2, (5) pot 0.9.3, (6) dm-control 1.0.17, (7) dm_env 1.6, (8) mujoco-py 2.1.2.14, (9) cython 3.0.0a10, (10) gym 0.22.0
Experiment Setup | Yes | "Table 4: Summarization of hyper-parameters." (reproduced below)

Table 4: Summarization of hyper-parameters.
Parameter | Value
Total environment step | 1e6
Adam learning rate | 1e-4
Batch size | 512
Target network τ | 0.005
Discount factor γ | 0.9
Expert demo number N_E | 2
Context length k_c | 3
Mask window size k_m | 10
DrQ buffer size | 1.5e5
DrQ action repeat | 2
DrQ frame stack | 3
DrQ image size | (84, 84, 3)
DrQ embedding dimension | 50
DrQ CNN features | (32, 32, 32, 32)
DrQ CNN kernels | (3, 3, 3, 3)
DrQ CNN strides | (2, 1, 1, 1)
DrQ CNN padding | VALID
DrQ actor network | (1024, 1024, 1024)
DrQ critic network | (1024, 1024, 1024)
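
As a reading aid, the Table 4 values can be collected in a small configuration object and used to check the encoder's output size. This is a minimal sketch, not the authors' code: the class name, field names, and helper functions are hypothetical. It assumes that VALID padding means no padding, so each layer maps a spatial size n to floor((n - kernel) / stride) + 1, and that stacked frames are concatenated along the channel axis.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Table4Config:
    """Values copied from Table 4. Field names are illustrative assumptions."""
    total_env_steps: int = 1_000_000
    adam_lr: float = 1e-4
    batch_size: int = 512
    target_tau: float = 0.005
    discount: float = 0.9
    num_expert_demos: int = 2      # N_E
    context_length: int = 3        # k_c
    mask_window: int = 10          # k_m
    buffer_size: int = 150_000
    action_repeat: int = 2
    frame_stack: int = 3
    image_size: Tuple[int, int, int] = (84, 84, 3)   # (height, width, channels)
    embedding_dim: int = 50
    cnn_features: Tuple[int, ...] = (32, 32, 32, 32)
    cnn_kernels: Tuple[int, ...] = (3, 3, 3, 3)
    cnn_strides: Tuple[int, ...] = (2, 1, 1, 1)
    actor_hidden: Tuple[int, ...] = (1024, 1024, 1024)
    critic_hidden: Tuple[int, ...] = (1024, 1024, 1024)


def valid_conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial output size of one unpadded (VALID) convolution."""
    return (size - kernel) // stride + 1


def stacked_input_channels(cfg: Table4Config) -> int:
    """Input channels, assuming frames are stacked along the channel axis."""
    return cfg.image_size[2] * cfg.frame_stack


def encoder_output_shape(cfg: Table4Config) -> Tuple[int, int, int]:
    """(height, width, channels) after the four convolution layers."""
    height, width, _ = cfg.image_size
    for kernel, stride in zip(cfg.cnn_kernels, cfg.cnn_strides):
        height = valid_conv_out(height, kernel, stride)
        width = valid_conv_out(width, kernel, stride)
    return height, width, cfg.cnn_features[-1]


def flat_features(cfg: Table4Config) -> int:
    """Number of features after flattening the encoder output."""
    height, width, channels = encoder_output_shape(cfg)
    return height * width * channels


cfg = Table4Config()
print(stacked_input_channels(cfg))  # 9
print(encoder_output_shape(cfg))    # (35, 35, 32)
print(flat_features(cfg))           # 39200
```

Under these assumptions the spatial size goes 84 → 41 → 39 → 37 → 35, giving 35 × 35 × 32 = 39200 flattened features, which would then be projected down to the 50-dimensional embedding listed in the table.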