Rethinking Optimal Transport in Offline Reinforcement Learning
Authors: Arip Asadulaev, Rostislav Korst, Aleksandr Korotin, Vage Egiazarian, Andrey Filchenkov, Evgeny Burnaev
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our algorithm on continuous control problems from the D4RL suite and demonstrate improvements over existing state-of-the-art model-free offline RL methods. |
| Researcher Affiliation | Collaboration | 1AIRI 2ITMO 3MIPT 4Skoltech 5Yandex 6HSE University |
| Pseudocode | Yes | Algorithm 1 Partial Policy Learning |
| Open Source Code | Yes | To reproduce our experiment we provide source code at https://github.com/machinestein/PPL/. The code is also available in the supplementary materials. |
| Open Datasets | Yes | We evaluate our proposed method using the Datasets for Deep Data-Driven Reinforcement Learning (D4RL) [13] benchmark suite |
| Dataset Splits | No | The paper mentions using a dataset for training and evaluation, but it does not explicitly specify the training, validation, and test dataset splits with percentages or sample counts within the main text. |
| Hardware Specification | Yes | Our method converges within 2-3 hours on an Nvidia 1080 (12 GB) GPU. |
| Software Dependencies | No | The code is implemented in the PyTorch [41] and JAX frameworks and will be publicly available along with the trained networks. We used WandB [5] for monitoring the training process. (Specific version numbers for PyTorch, JAX, or WandB are not provided.) |
| Experiment Setup | Yes | For these experiments, a two-layer feed-forward network with a hidden layer size of 1024 and a learning rate of 0.001 was used with the Adam [23] optimizer. We trained the algorithm for 1M steps, with w set to 8 for all experiments. The parameters can be seen in the Table 5. |
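The reported setup (a two-layer feed-forward network with a hidden size of 1024, trained with Adam at a learning rate of 0.001) can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: the observation and action dimensions (`obs_dim`, `act_dim`) and the ReLU activation are assumptions, since the paper's table row does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration; the setup row does not fix them.
obs_dim, act_dim = 17, 6

# Two-layer feed-forward network with a hidden layer of size 1024,
# matching the architecture stated in the experiment setup.
policy = nn.Sequential(
    nn.Linear(obs_dim, 1024),
    nn.ReLU(),  # activation is an assumption
    nn.Linear(1024, act_dim),
)

# Adam optimizer with learning rate 0.001, as reported.
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One illustrative forward pass on a batch of 4 observations.
x = torch.randn(4, obs_dim)
actions = policy(x)
```

The 1M-step training loop and the weight `w = 8` belong to the paper's algorithm (Table 5) and are not reproduced here.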