Residual Q-Learning: Offline and Online Policy Customization without Value
Authors: Chenran Li, Chen Tang, Haruki Nishimura, Jean Mercat, Masayoshi Tomizuka, Wei Zhan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed algorithms on four environments selected from different domains: Cart Pole and Continuous Mountain Car environments from the OpenAI Gym classic control suite [5], and Highway and Parking from the highway-env environments [32]. In our experiments, we implemented our algorithms upon Stable-Baselines3 [43] and its imitation library [16]. In Sec. 5.1, we provide the configurations of our experiments, including the settings of policy customization tasks in different environments, baselines, and evaluation metrics. In Sec. 5.2, we present and analyze the experimental results of RL offline policy customization. |
| Researcher Affiliation | Collaboration | Chenran Li¹, Chen Tang¹, Haruki Nishimura², Jean Mercat², Masayoshi Tomizuka¹, Wei Zhan¹. ¹University of California, Berkeley; ²Toyota Research Institute, USA |
| Pseudocode | No | The paper describes algorithms using mathematical equations and textual descriptions of update rules and loss functions, but it does not contain a formally structured pseudocode block or an algorithm labeled as such. |
| Open Source Code | Yes | Demo videos and code are available on our website: https://sites.google.com/view/residualq-learning. |
| Open Datasets | Yes | Cart Pole and Continuous Mountain Car environments from the OpenAI Gym classic control suite [5], and Highway and Parking from the highway-env environments [32]. |
| Dataset Splits | No | The paper mentions 'The detailed configurations of the environments can be found in Appendix D.', but the main text does not explicitly provide specific train/validation/test dataset split percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. It mentions only that the algorithms were implemented on top of Stable-Baselines3. |
| Software Dependencies | No | The paper mentions 'Stable-Baselines3 [43] and its imitation library [16]' but does not provide specific version numbers for these software dependencies, which a reproducible description of the software environment requires. |
| Experiment Setup | No | While Section 5.1 is titled 'Experiment Setup' and describes task settings and metrics, it defers detailed configurations to Appendix D and does not provide specific hyperparameter values, training configurations, or system-level settings in the main text. |
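
The Software Dependencies row notes that no version numbers are reported. As a minimal sketch of how such versions could be recorded, the snippet below queries the local Python environment with the standard-library `importlib.metadata` module. The distribution names in `PACKAGES` are assumptions based on the libraries the paper cites (in particular, whether the authors used `gym` or a successor package is not stated), and the helper functions `installed_versions` and `to_requirements` are hypothetical names introduced here for illustration; nothing in it comes from the authors' code.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the libraries cited in the paper;
# the paper itself does not state package names or versions.
PACKAGES = ["stable-baselines3", "imitation", "highway-env", "gym"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


def to_requirements(versions):
    """Render pinned `name==version` lines, skipping packages that are not installed."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    # Print one pinned requirement per installed package.
    for line in to_requirements(installed_versions(PACKAGES)):
        print(line)
```

Run inside the environment used for an experiment, the printed lines can be saved to a requirements file and reported alongside the results; packages that are not installed are simply omitted.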