Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Reining Generalization in Offline Reinforcement Learning via Representation Distinction
Authors: Yi Ma, Hongyao Tang, Dong Li, Zhaopeng Meng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach by applying RD to designed backbone algorithms and widely-used offline RL algorithms. The proposed RD method significantly improves their performance across various continuous control tasks on D4RL datasets, surpassing several state-of-the-art offline RL algorithms. |
| Researcher Affiliation | Collaboration | Yi Ma, College of Intelligence and Computing, Tianjin University; Hongyao Tang, Université de Montréal, Mila; Dong Li, Noah's Ark Lab, Huawei Technology; Zhaopeng Meng, College of Intelligence and Computing, Tianjin University |
| Pseudocode | Yes | Algorithm 1: Offline RL with RD (pseudocode in a PyTorch-like style) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate RD by applying it on TD3BC [7], CQL [8], SAC-N-Unc and TD3-N-Unc through a series of experiments on D4RL [47] Gym MuJoCo-v2 and Adroit-v1 datasets. |
| Dataset Splits | No | The paper mentions running experiments with five random seeds but does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We report the performance at 1M gradient steps for TD3BC and CQL and at 3M gradient steps for SAC-N-Unc and TD3-N-Unc on MuJoCo tasks. For Adroit tasks, we report results at 500K gradient steps. |
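The reporting schedule quoted in the Experiment Setup row can be summarized as a small configuration. The sketch below is a hypothetical encoding, not code from the paper (which, per the table, has no public code release). The names `MUJOCO_REPORT_STEPS`, `report_step`, and `run_plan` are illustrative. It assumes that the 500K Adroit budget applies to every backbone evaluated on Adroit and that each configuration is run with the five random seeds mentioned in the Dataset Splits row.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the reported gradient-step budgets (not the authors' code).
ALGORITHMS: Tuple[str, ...] = ("TD3BC", "CQL", "SAC-N-Unc", "TD3-N-Unc")

# MuJoCo: 1M steps for TD3BC/CQL, 3M steps for the ensemble-based backbones.
MUJOCO_REPORT_STEPS: Dict[str, int] = {
    "TD3BC": 1_000_000,
    "CQL": 1_000_000,
    "SAC-N-Unc": 3_000_000,
    "TD3-N-Unc": 3_000_000,
}

# Assumption: the 500K Adroit budget is shared by every backbone run on Adroit.
ADROIT_REPORT_STEPS = 500_000

# Five random seeds, as noted in the Dataset Splits row.
NUM_SEEDS = 5


def report_step(algorithm: str, domain: str) -> int:
    """Return the gradient step at which results are reported."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if domain == "MuJoCo":
        return MUJOCO_REPORT_STEPS[algorithm]
    if domain == "Adroit":
        return ADROIT_REPORT_STEPS
    raise ValueError(f"unknown domain: {domain}")


def run_plan(domain: str) -> List[Tuple[str, int, int]]:
    """List (algorithm, seed, report_step) for every run in a domain."""
    return [
        (alg, seed, report_step(alg, domain))
        for alg in ALGORITHMS
        for seed in range(NUM_SEEDS)
    ]
```

For example, `run_plan("MuJoCo")` yields 20 entries (four backbones times five seeds). Keeping the budgets in a single table makes it easy to check that a reproduction compares methods at the same step counts as the paper.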