reproducibilityindex.ai

Residual Q-Learning: Offline and Online Policy Customization without Value

Authors: Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the proposed algorithms on four environments selected from different domains: Cart Pole and Continuous Mountain Car environments from the Open AI gym classic control suite [5], and Highway and Parking from the highway-env environments [32]. In our experiments, we implemented our algorithms upon Stable-Baselines3 [43] and its imitation library [16]. In Sec. 5.1, we provide the configurations of our experiments, including the settings of policy customization tasks in different environments, baselines, and evaluation metrics. In Sec. 5.2, we present and analyze the experimental results of RL offline policy customization.
Researcher Affiliation	Collaboration	Chenran Li¹, Chen Tang¹, Haruki Nishimura², Jean Mercat², Masayoshi Tomizuka¹, Wei Zhan¹. ¹University of California, Berkeley; ²Toyota Research Institute, USA
Pseudocode	No	The paper describes algorithms using mathematical equations and textual descriptions of update rules and loss functions, but it does not contain a formally structured pseudocode block or an algorithm labeled as such.
Open Source Code	Yes	Demo videos and code are available on our website: https://sites.google.com/view/residualq-learning.
Open Datasets	Yes	Cart Pole and Continuous Mountain Car environments from the Open AI gym classic control suite [5], and Highway and Parking from the highway-env environments [32].
Dataset Splits	No	The paper mentions 'The detailed configurations of the environments can be found in Appendix D.', but the main text does not explicitly provide specific train/validation/test dataset split percentages or counts for reproducibility.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. It only mentions implementing algorithms upon Stable-Baselines3.
Software Dependencies	No	The paper mentions 'Stable-Baselines3 [43] and its imitation library [16]' but does not provide version numbers for these software dependencies, which a reproducible description requires.
Experiment Setup	No	While Section 5.1 is titled 'Experiment Setup' and describes task settings and metrics, it defers detailed configurations to Appendix D and does not provide specific hyperparameter values, training configurations, or system-level settings in the main text.
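
The Hardware Specification and Software Dependencies rows above are marked "No" because the paper gives no versions or machine details. As a minimal sketch of how such details could be captured alongside an experiment run, the snippet below collects interpreter, platform, and package version information using only the Python standard library. The function name `collect_environment_report` and the package list are hypothetical: the distribution names are assumed to correspond to the libraries the paper cites, and nothing here is taken from the authors' code.

```python
import json
import platform
from importlib import metadata

# Assumed PyPI distribution names for the libraries cited in the paper;
# adjust to whatever is actually installed in the experiment environment.
PACKAGES = ["stable-baselines3", "imitation", "gym", "highway-env"]


def collect_environment_report(packages=PACKAGES):
    """Return a dict with interpreter, platform, and package version info."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing the whole report.
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Print the report as JSON so it can be saved next to experiment logs.
    print(json.dumps(collect_environment_report(), indent=2))
```

Saving this report with each run would record exact library versions and basic machine details; GPU models and memory are not covered by the standard library and would need a separate tool.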