reproducibilityindex.ai

Reining Generalization in Offline Reinforcement Learning via Representation Distinction

Authors: Yi Ma, Hongyao Tang, Dong Li, Zhaopeng Meng

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the efficacy of our approach by applying RD to designed backbone algorithms and widely-used offline RL algorithms. The proposed RD method significantly improves their performance across various continuous control tasks on D4RL datasets, surpassing several state-of-the-art offline RL algorithms.
Researcher Affiliation	Collaboration	Yi Ma (College of Intelligence and Computing, Tianjin University, mayi@tju.edu.cn); Hongyao Tang (Université de Montréal, Mila, tang.hongyao@mila.quebec); Dong Li (Noah's Ark Lab, Huawei Technology, dongleecsu@gmail.com); Zhaopeng Meng (College of Intelligence and Computing, Tianjin University, mengzp@tju.edu.cn)
Pseudocode	Yes	Algorithm 1: Offline RL with RD (Pseudocode in a PyTorch-like style)
Open Source Code	No	The paper does not provide an explicit statement about releasing open-source code or a link to a code repository.
Open Datasets	Yes	We evaluate RD by applying it on TD3BC [7], CQL [8], SAC-N-Unc and TD3-N-Unc through a series of experiments on D4RL [47] gym MuJoCo-v2 and Adroit-v1 datasets
Dataset Splits	No	The paper mentions running experiments with five random seeds but does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies	No	The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	Yes	We report the performance at 1M gradient step for TD3BC and CQL and that at 3M gradient step for SAC-N-Unc and TD3-N-Unc on MuJoCo tasks. For Adroit tasks, we report results at 500K gradient step.
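
The reporting schedule quoted in the Experiment Setup row can be written down as a small lookup, which may help when re-running the evaluation. This is only a sketch: the names `MUJOCO_REPORT_STEP`, `ADROIT_REPORT_STEP`, `NUM_SEEDS`, and `report_step` are hypothetical and do not come from the paper, and it assumes the 500K Adroit checkpoint applies to every algorithm, which the quoted text suggests but does not state per algorithm.

```python
# Reporting checkpoints (in gradient steps), taken from the Experiment Setup row.
MUJOCO_REPORT_STEP = {
    "TD3BC": 1_000_000,
    "CQL": 1_000_000,
    "SAC-N-Unc": 3_000_000,
    "TD3-N-Unc": 3_000_000,
}

# Assumption: the 500K checkpoint is used for all algorithms on Adroit tasks.
ADROIT_REPORT_STEP = 500_000

# The Dataset Splits row mentions five random seeds.
NUM_SEEDS = 5


def report_step(algorithm: str, suite: str) -> int:
    """Return the gradient step at which results are reported."""
    if suite == "adroit":
        return ADROIT_REPORT_STEP
    if suite == "mujoco":
        # Raises KeyError for algorithms not listed in the table.
        return MUJOCO_REPORT_STEP[algorithm]
    raise ValueError(f"unknown suite: {suite!r}")


if __name__ == "__main__":
    for algo in MUJOCO_REPORT_STEP:
        print(algo, report_step(algo, "mujoco"), report_step(algo, "adroit"))
```

A lookup like this keeps the per-algorithm checkpoints in one place, so a comparison does not accidentally mix 1M-step and 3M-step results.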