Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reining Generalization in Offline Reinforcement Learning via Representation Distinction

Authors: Yi Ma, Hongyao Tang, Dong Li, Zhaopeng Meng

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the efficacy of our approach by applying RD to designed backbone algorithms and widely-used offline RL algorithms. The proposed RD method significantly improves their performance across various continuous control tasks on D4RL datasets, surpassing several state-of-the-art offline RL algorithms."
Researcher Affiliation | Collaboration | Yi Ma (College of Intelligence and Computing, Tianjin University); Hongyao Tang (Université de Montréal; Mila); Dong Li (Noah's Ark Lab, Huawei Technology); Zhaopeng Meng (College of Intelligence and Computing, Tianjin University)
Pseudocode | Yes | "Algorithm 1: Offline RL with RD (Pseudocode in a PyTorch-like style)"
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository.
Open Datasets | Yes | "We evaluate RD by applying it on TD3BC [7], CQL [8], SAC-N-Unc and TD3-N-Unc through a series of experiments on D4RL [47] gym MuJoCo-v2 and Adroit-v1 datasets"
Dataset Splits | No | The paper mentions running experiments with five random seeds but does not explicitly provide train/validation/test split details (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "We report the performance at 1M gradient step for TD3BC and CQL and that at 3M gradient step for SAC-N-Unc and TD3-N-Unc on MuJoCo tasks. For Adroit tasks, we report results at 500K gradient step."