Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery
Authors: Yiqin Yang, Hao Hu, Wenzhe Li, Siyuan Li, Jun Yang, Qianchuan Zhao, Chongjie Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results and extensive ablation studies on the standard D4RL benchmark show that our method has a good representation ability for policies and achieves superior performance in most tasks. |
| Researcher Affiliation | Academia | 1 Department of Automation, Tsinghua University; 2 Institute for Interdisciplinary Information Sciences, Tsinghua University; 3 Harbin Institute of Technology |
| Pseudocode | Yes | Algorithm 1: IQL+LPD algorithm |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. While it states 'We reproduce OPAL with authors providing code via email', this refers to a baseline, not their own method's code release. |
| Open Datasets | Yes | We evaluate our method on a suite of standard and challenging offline tasks (e.g., D4RL (Fu et al. 2020)) including Franka kitchen, Antmaze, and Adroit. |
| Dataset Splits | No | The paper mentions using D4RL tasks but does not specify how the data within these tasks are split into train/validation/test sets in the provided text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4). |
| Experiment Setup | Yes | Each experiment result is averaged over five random seeds with a standard deviation. We test the performance of IQL+OPAL with the most suitable steps c ∈ {1, 10} and fine-tune the expectile ratio λ and temperature parameter β in IQL. We ran IQL+LPD on kitchen-partial-v0 with various parameters, such as the expectile ratio λ ∈ [0.45, 0.9] and the temperature β ∈ [0.35, 0.8]. |
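
The experiment setup quoted above can be illustrated with a minimal sketch. It shows how five per-seed scores could be reduced to a mean and standard deviation, and how the expectile ratio λ enters the asymmetric squared loss used in IQL's value update. This is not the authors' code: the function names are hypothetical, the use of the sample standard deviation is an assumption, and the sweep uses only the range endpoints quoted in the table because the intermediate values tested are not stated.

```python
import statistics
from itertools import product


def summarize_over_seeds(scores):
    """Return (mean, std) of per-seed scores.

    Assumption: sample standard deviation (n - 1 denominator); the source
    does not say which estimator was used.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def expectile_loss(diff, lam):
    """Asymmetric squared loss |lam - 1(diff < 0)| * diff**2.

    `diff` stands for Q(s, a) - V(s) and `lam` for the expectile ratio.
    With lam > 0.5, positive differences are penalized more heavily
    than negative ones.
    """
    weight = (1.0 - lam) if diff < 0 else lam
    return weight * diff ** 2


def sweep_grid(lams, betas):
    """Enumerate every (lam, beta) pair for a hyperparameter sweep."""
    return list(product(lams, betas))


# Hypothetical sweep: only the endpoints of the quoted ranges are used.
grid = sweep_grid(lams=(0.45, 0.9), betas=(0.35, 0.8))
```

Calling `summarize_over_seeds` on the five normalized scores of one configuration gives the "mean ± std" entry typically reported per task, and `grid` lists the configurations such a sweep would visit.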