Reining Generalization in Offline Reinforcement Learning via Representation Distinction

Authors: Yi Ma, Hongyao Tang, Dong Li, Zhaopeng Meng

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the efficacy of our approach by applying RD to designed backbone algorithms and widely-used offline RL algorithms. The proposed RD method significantly improves their performance across various continuous control tasks on D4RL datasets, surpassing several state-of-the-art offline RL algorithms."
Researcher Affiliation | Collaboration | Yi Ma (College of Intelligence and Computing, Tianjin University, mayi@tju.edu.cn); Hongyao Tang (Université de Montréal & Mila, tang.hongyao@mila.quebec); Dong Li (Noah's Ark Lab, Huawei Technologies, dongleecsu@gmail.com); Zhaopeng Meng (College of Intelligence and Computing, Tianjin University, mengzp@tju.edu.cn)
Pseudocode | Yes | Algorithm 1: Offline RL with RD (pseudocode in a PyTorch-like style)
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository.
Open Datasets | Yes | "We evaluate RD by applying it on TD3BC [7], CQL [8], SAC-N-Unc and TD3-N-Unc through a series of experiments on D4RL [47] gym MuJoCo-v2 and Adroit-v1 datasets"
Dataset Splits | No | The paper mentions running experiments with five random seeds but does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "We report the performance at 1M gradient steps for TD3BC and CQL and that at 3M gradient steps for SAC-N-Unc and TD3-N-Unc on MuJoCo tasks. For Adroit tasks, we report results at 500K gradient steps."