Rethinking Optimal Transport in Offline Reinforcement Learning

Authors: Arip Asadulaev, Rostislav Korst, Aleksandr Korotin, Vage Egiazarian, Andrey Filchenkov, Evgeny Burnaev

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the performance of our algorithm on continuous control problems from the D4RL suite and demonstrate improvements over existing methods." The method is evaluated across various environments from the D4RL benchmark suite and reported to outperform state-of-the-art model-free offline RL techniques.
Researcher Affiliation | Collaboration | AIRI, ITMO, MIPT, Skoltech, Yandex, HSE University
Pseudocode | Yes | Algorithm 1: Partial Policy Learning
Open Source Code | Yes | "To reproduce our experiment we provide source code in https://github.com/machinestein/PPL/." The code is also available in the supplementary materials.
Open Datasets | Yes | "We evaluate our proposed method using the Datasets for Deep Data-Driven Reinforcement Learning (D4RL) [13] benchmark suite." (A loading sketch follows the table.)
Dataset Splits | No | The paper mentions using a dataset for training and evaluation, but it does not explicitly specify the training, validation, and test splits (percentages or sample counts) in the main text.
Hardware Specification | Yes | "Our method converges within 2-3 hours on Nvidia 1080 (12 GB) GPU."
Software Dependencies | No | "The code is implemented in the PyTorch [41] and JAX frameworks and will be publicly available along with the trained networks. We used WandB [5] for babysitting the training process." Specific version numbers for PyTorch, JAX, or WandB are not provided.
Experiment Setup | Yes | "For these experiments, a two-layer feed-forward network with a hidden layer size of 1024 and a learning rate of 0.001 was used with the Adam [23] optimizer. We trained the algorithm for 1M steps, with w set to 8 for all experiments. The parameters can be seen in Table 5." (A minimal configuration sketch follows the table.)
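
For context on the Open Datasets row, the snippet below is a minimal sketch of how a D4RL dataset is typically loaded. The environment name `halfcheetah-medium-v2` is illustrative and not taken from the paper, and the standard `gym` and `d4rl` packages are assumed.

```python
# Minimal sketch of loading a D4RL dataset (environment name is illustrative,
# not taken from the paper; assumes the standard gym + d4rl packages).
import gym
import d4rl  # importing d4rl registers its environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict with observations, actions, rewards, terminals, ...

print(dataset["observations"].shape, dataset["actions"].shape)
```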
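
The Experiment Setup row quotes a two-layer feed-forward network with a hidden size of 1024, trained with Adam at a learning rate of 0.001. The sketch below is one possible PyTorch reading of that description; the input/output dimensions and the reading of "two-layer" as two hidden layers are assumptions, and the paper's actual architecture may differ (see its Table 5).

```python
# Hedged PyTorch sketch of the reported setup: feed-forward network with hidden
# size 1024, Adam optimizer with lr = 0.001. Dimensions below are placeholders.
import torch.nn as nn
import torch.optim as optim

obs_dim, act_dim = 17, 6      # illustrative dimensions, not from the paper
hidden = 1024                 # hidden layer size reported in the quote

network = nn.Sequential(      # "two-layer" is read here as two hidden layers
    nn.Linear(obs_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, act_dim),
)
optimizer = optim.Adam(network.parameters(), lr=1e-3)  # Adam [23], lr = 0.001
```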