Rethinking Optimal Transport in Offline Reinforcement Learning
Authors: Arip Asadulaev, Rostislav Korst, Aleksandr Korotin, Vage Egiazarian, Andrey Filchenkov, Evgeny Burnaev
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our algorithm on continuous control problems from the D4RL suite and demonstrate improvements over existing state-of-the-art model-free offline RL methods. |
| Researcher Affiliation | Collaboration | 1AIRI 2ITMO 3MIPT 4Skoltech 5Yandex 6HSE University |
| Pseudocode | Yes | Algorithm 1 Partial Policy Learning |
| Open Source Code | Yes | To reproduce our experiment we provide source code at https://github.com/machinestein/PPL/. The code is also available in the supplementary materials. |
| Open Datasets | Yes | We evaluate our proposed method using the Datasets for Deep Data-Driven Reinforcement Learning (D4RL) [13] benchmark suite |
| Dataset Splits | No | The paper mentions using a dataset for training and evaluation, but it does not explicitly specify the training, validation, and test dataset splits with percentages or sample counts within the main text. |
| Hardware Specification | Yes | Our method converges within 2-3 hours on an Nvidia 1080 (12 GB) GPU. |
| Software Dependencies | No | The code is implemented in the PyTorch [41] and JAX frameworks and will be publicly available along with the trained networks. We used WandB [5] for monitoring the training process. (Specific version numbers for PyTorch, JAX, or WandB are not provided.) |
| Experiment Setup | Yes | For these experiments, a two-layer feed-forward network with a hidden layer size of 1024 and a learning rate of 0.001 was used with the Adam [23] optimizer. We trained the algorithm for 1M steps, with w set to 8 for all experiments. The parameters can be seen in the Table 5. |
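The reported setup (a two-layer feed-forward network with a hidden size of 1024, trained with Adam at a learning rate of 0.001) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: the observation and action dimensions (`obs_dim`, `act_dim`) and the ReLU activation are assumptions, since the paper's table row does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration; the setup row does not fix them.
obs_dim, act_dim = 17, 6

# Two-layer feed-forward network with a hidden layer of size 1024,
# matching the architecture stated in the experiment setup.
policy = nn.Sequential(
    nn.Linear(obs_dim, 1024),
    nn.ReLU(),  # activation is an assumption
    nn.Linear(1024, act_dim),
)

# Adam optimizer with learning rate 0.001, as reported.
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One illustrative forward pass on a batch of 4 observations.
x = torch.randn(4, obs_dim)
actions = policy(x)
```

The 1M-step training loop and the weight `w = 8` belong to the paper's algorithm (Table 5) and are not reproduced here.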