reproducibilityindex.ai

Mildly Conservative Q-Learning for Offline Reinforcement Learning

Authors: Jiafei Lyu, Xiaoteng Ma, Xiu Li, Zongqing Lu

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on the D4RL benchmarks demonstrate that MCQ achieves remarkable performance compared with prior work.
Researcher Affiliation	Academia	Jiafei Lyu¹, Xiaoteng Ma², Xiu Li¹, Zongqing Lu³; ¹Tsinghua Shenzhen International Graduate School, Tsinghua University; ²Department of Automation, Tsinghua University; ³School of Computer Science, Peking University
Pseudocode	Yes	Algorithm 1 Mildly Conservative Q-learning (MCQ)
Open Source Code	Yes	Our code is publicly available at https://github.com/dmksjfl/MCQ.
Open Datasets	Yes	Experimental results on the D4RL MuJoCo locomotion tasks demonstrate that MCQ surpasses recent strong baseline methods on most of the tasks, especially on non-expert datasets. [...] We conduct experiments on MuJoCo locomotion tasks, which are made up of five types of datasets (random, medium, medium-replay, medium-expert, and expert), yielding a total of 15 datasets. We use the most recently released '-v2' datasets for performance evaluation.
Dataset Splits	No	The paper states that it uses the D4RL benchmarks but does not give percentages, sample counts, or an explicit description of how the datasets were split into training, validation, and test subsets.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) are provided in the paper's text.
Software Dependencies	No	The paper does not list specific software dependencies with their version numbers (e.g., Python version, PyTorch version, CUDA version).
Experiment Setup	Yes	In our experiments, we set the number of sampled actions N = 10 by default and tune the weighting coefficient λ. We report the λ used for all tasks in Appendix C, along with details on the experiments and implementation. We conduct a detailed parameter study on MCQ. MCQ generally contains two hyperparameters, weighting coefficient λ and number of sampled actions N.
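The Pseudocode row cites Algorithm 1, but this page does not reproduce it. The following NumPy sketch is a hypothetical illustration of how the two hyperparameters named in the Experiment Setup row might enter a critic update. N sets how many actions are drawn from a learned behavior model to form a pseudo target (max over the sampled actions for each critic, then min across critics), and λ weights the in-distribution Bellman error against the error on actions proposed by the current policy. This structure is an assumption based on a reading of the paper, not something stated on this page. The helper names `pseudo_target` and `mcq_critic_loss`, the toy critics, and the behavior sampler are all hypothetical, and the exact update should be checked against Algorithm 1 and the released code.

```python
import numpy as np


def pseudo_target(q_funcs, behavior_sampler, state, n_actions=10, rng=None):
    """Hypothetical pseudo target for an out-of-distribution action at `state`.

    Draws `n_actions` candidates from a (hypothetical) learned behavior model,
    takes the max Q over them for each critic, then the min across critics.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = behavior_sampler(state, n_actions, rng)  # shape (N, action_dim)
    per_critic_max = [max(q(state, a) for a in actions) for q in q_funcs]
    return min(per_critic_max)


def mcq_critic_loss(q, states, data_actions, bellman_targets,
                    policy_actions, pseudo_targets, lam):
    """lam * in-distribution Bellman error + (1 - lam) * error on policy actions."""
    in_dist = np.mean([(q(s, a) - y) ** 2
                       for s, a, y in zip(states, data_actions, bellman_targets)])
    ood = np.mean([(q(s, a) - y) ** 2
                   for s, a, y in zip(states, policy_actions, pseudo_targets)])
    return lam * in_dist + (1.0 - lam) * ood


# Toy critics that score an action by the sum of its components.
def q1(state, action):
    return float(np.sum(action))


def q2(state, action):
    return float(np.sum(action)) + 1.0


def toy_sampler(state, n, rng):
    # Deterministic stand-in for a behavior model: actions 0.0, 1.0, ..., n - 1.
    return np.arange(n, dtype=float).reshape(n, 1)


target = pseudo_target([q1, q2], toy_sampler, state=None, n_actions=10)  # min(9.0, 10.0)
loss = mcq_critic_loss(
    q1,
    states=[None, None],
    data_actions=[np.array([1.0]), np.array([2.0])],
    bellman_targets=[1.0, 0.0],           # squared errors 0 and 4, mean 2
    policy_actions=[np.array([0.0]), np.array([0.0])],
    pseudo_targets=[1.0, 3.0],            # squared errors 1 and 9, mean 5
    lam=0.7,                              # 0.7 * 2 + 0.3 * 5, about 2.9
)
```

In this toy setup, raising N can only keep or raise the per-critic maximum, so a larger N makes the pseudo target less conservative. A λ closer to 1 puts more weight on fitting the dataset transitions.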
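The Open Datasets row counts 15 datasets built from five dataset types, which implies three MuJoCo environments, but the page does not name them. The sketch assumes the three D4RL locomotion environments most commonly paired with these five types (halfcheetah, hopper, walker2d) and builds D4RL-style '-v2' dataset IDs. `d4rl_dataset_names` is a hypothetical helper, not part of D4RL.

```python
from itertools import product

# Assumption: the page does not name the environments; these three are the
# ones commonly used with the five dataset types in D4RL MuJoCo evaluations.
ENVS = ["halfcheetah", "hopper", "walker2d"]
DATASET_TYPES = ["random", "medium", "medium-replay", "medium-expert", "expert"]


def d4rl_dataset_names(envs=ENVS, types=DATASET_TYPES, version="v2"):
    """Build D4RL-style dataset IDs such as 'hopper-medium-replay-v2'."""
    return [f"{env}-{kind}-{version}" for env, kind in product(envs, types)]


names = d4rl_dataset_names()
print(len(names))  # 15
```

With the `d4rl` package installed, importing `d4rl` should register these IDs with Gym, after which `gym.make(name).get_dataset()` returns the raw transitions for each dataset.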
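The Experiment Setup row names two hyperparameters: the number of sampled actions N (10 by default) and the weighting coefficient λ (tuned per task, with the values reported in Appendix C). A parameter study of this kind can be organized as a one-at-a-time sweep. The sketch is hypothetical: `MCQConfig`, `one_at_a_time_sweep`, and the λ and N grids in the usage line are illustrative placeholders, not values taken from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MCQConfig:
    lam: float  # weighting coefficient λ; tuned per task (values in the paper's Appendix C)
    num_sampled_actions: int = 10  # N = 10 by default, as stated in the paper


def one_at_a_time_sweep(base, lam_values, n_values):
    """Vary λ with N fixed, then N with λ fixed, dropping duplicate configs."""
    configs = [replace(base, lam=lam) for lam in lam_values]
    configs += [replace(base, num_sampled_actions=n) for n in n_values]
    # Frozen dataclasses are hashable, so dict.fromkeys dedupes while keeping order.
    return list(dict.fromkeys(configs))


# Placeholder grids for illustration only.
sweep = one_at_a_time_sweep(MCQConfig(lam=0.7), [0.3, 0.5, 0.7, 0.9], [5, 10, 20])
for cfg in sweep:
    print(cfg)
```

The base configuration appears in both grids, so deduplication leaves six runs rather than seven. Varying one hyperparameter at a time keeps the study cheap but does not reveal interactions between λ and N.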