Mildly Conservative Q-Learning for Offline Reinforcement Learning
Authors: Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the D4RL benchmarks demonstrate that MCQ achieves remarkable performance compared with prior work. |
| Researcher Affiliation | Academia | Jiafei Lyu¹, Xiaoteng Ma², Xiu Li¹, Zongqing Lu³; ¹Tsinghua Shenzhen International Graduate School, Tsinghua University; ²Department of Automation, Tsinghua University; ³School of Computer Science, Peking University |
| Pseudocode | Yes | Algorithm 1 Mildly Conservative Q-learning (MCQ) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/dmksjfl/MCQ. |
| Open Datasets | Yes | Experimental results on the D4RL MuJoCo locomotion tasks demonstrate that MCQ surpasses recent strong baseline methods on most of the tasks, especially on non-expert datasets. [...] We conduct experiments on MuJoCo locomotion tasks, which are made up of five types of datasets (random, medium, medium-replay, medium-expert, and expert), yielding a total of 15 datasets. We use the most recently released '-v2' datasets for performance evaluation. |
| Dataset Splits | No | The paper states that it uses the D4RL benchmarks but does not give percentages, sample counts, or any other description of how the datasets were split into training, validation, and test subsets. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) are provided in the paper's text. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers (e.g., Python version, PyTorch version, CUDA version). |
| Experiment Setup | Yes | In our experiments, we set the number of sampled actions N = 10 by default and tune the weighting coefficient λ. We report the λ used for all tasks in Appendix C, along with details on the experiments and implementation. We conduct a detailed parameter study on MCQ. MCQ generally contains two hyperparameters, weighting coefficient λ and number of sampled actions N. |
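
The "Open Datasets" row describes five dataset types and a total of 15 datasets with the '-v2' suffix. As a minimal sketch of how that count arises, the snippet below enumerates the dataset identifiers. It assumes the three standard D4RL MuJoCo locomotion environments (halfcheetah, hopper, walker2d) and the usual `<env>-<type>-v2` naming pattern, neither of which is spelled out in the quoted text; the helper name `d4rl_dataset_names` is hypothetical. The default `N = 10` is taken from the "Experiment Setup" row, while λ is left unset because its per-task values are reported only in Appendix C of the paper.

```python
from itertools import product

# Assumption: the three D4RL MuJoCo locomotion environments (not named in the quoted text).
ENVS = ["halfcheetah", "hopper", "walker2d"]

# The five dataset types listed in the "Open Datasets" row.
DATASET_TYPES = ["random", "medium", "medium-replay", "medium-expert", "expert"]

# The '-v2' release mentioned in the "Open Datasets" row.
VERSION = "v2"

# Default number of sampled actions from the "Experiment Setup" row.
# The weighting coefficient lambda is tuned per task (Appendix C), so no value is given here.
NUM_SAMPLED_ACTIONS = 10


def d4rl_dataset_names(envs=ENVS, types=DATASET_TYPES, version=VERSION):
    """Return identifiers of the form '<env>-<type>-<version>' (hypothetical helper)."""
    return [f"{env}-{kind}-{version}" for env, kind in product(envs, types)]


if __name__ == "__main__":
    names = d4rl_dataset_names()
    print(len(names))  # 3 environments x 5 dataset types
    for name in names:
        print(name)
```

Each identifier could then be passed to whatever dataset loader the experiment uses; the loading step is omitted here because the summary does not specify software dependencies or versions.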