Reining Generalization in Offline Reinforcement Learning via Representation Distinction
Authors: Yi Ma, Hongyao Tang, Dong Li, Zhaopeng Meng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our approach by applying RD to designed backbone algorithms and widely-used offline RL algorithms. The proposed RD method significantly improves their performance across various continuous control tasks on D4RL datasets, surpassing several state-of-the-art offline RL algorithms. |
| Researcher Affiliation | Collaboration | Yi Ma (College of Intelligence and Computing, Tianjin University, mayi@tju.edu.cn); Hongyao Tang (Université de Montréal, Mila, tang.hongyao@mila.quebec); Dong Li (Noah's Ark Lab, Huawei Technology, dongleecsu@gmail.com); Zhaopeng Meng (College of Intelligence and Computing, Tianjin University, mengzp@tju.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Offline RL with RD (Pseudocode in a PyTorch-like style) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate RD by applying it on TD3BC [7], CQL [8], SAC-N-Unc and TD3-N-Unc through a series of experiments on D4RL [47] gym MuJoCo-v2 and Adroit-v1 datasets. |
| Dataset Splits | No | The paper mentions running experiments with five random seeds but does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We report the performance at 1M gradient steps for TD3BC and CQL and at 3M gradient steps for SAC-N-Unc and TD3-N-Unc on MuJoCo tasks. For Adroit tasks, we report results at 500K gradient steps. |
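The reporting schedule in the Experiment Setup row can be restated as a small lookup table. This is a minimal sketch, not the authors' code (none is released): the names `REPORT_STEPS`, `ADROIT_STEP`, `NUM_SEEDS`, and `report_step` are hypothetical, and it assumes the 500K Adroit step applies to every backbone evaluated on Adroit, which the row does not state per algorithm.

```python
# Hypothetical lookup restating the reported evaluation schedule.
# Names are illustrative; the paper does not release code.

# Gradient step at which performance is reported on D4RL MuJoCo tasks.
REPORT_STEPS = {
    ("TD3BC", "mujoco"): 1_000_000,
    ("CQL", "mujoco"): 1_000_000,
    ("SAC-N-Unc", "mujoco"): 3_000_000,
    ("TD3-N-Unc", "mujoco"): 3_000_000,
}

# Assumption: the single Adroit step applies to every backbone run on Adroit.
ADROIT_STEP = 500_000

# Number of random seeds mentioned in the Dataset Splits row.
NUM_SEEDS = 5


def report_step(algorithm: str, domain: str) -> int:
    """Return the gradient step at which results are reported."""
    if domain == "adroit":
        return ADROIT_STEP
    try:
        return REPORT_STEPS[(algorithm, domain)]
    except KeyError:
        raise ValueError(
            f"no reported step for {algorithm!r} on {domain!r}"
        ) from None
```

For example, `report_step("SAC-N-Unc", "mujoco")` returns `3000000`, reflecting the longer training budget the ensemble-based backbones receive on MuJoCo relative to TD3BC and CQL.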