Boosting Offline Reinforcement Learning with Action Preference Query

Authors: Qisen Yang, Shenzhi Wang, Matthieu Gaetan Lin, Shiji Song, Gao Huang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Moreover, comprehensive experiments on the D4RL benchmark and state-of-the-art algorithms demonstrate that OAP yields higher (29% on average) scores, especially on challenging Ant Maze tasks (98% higher)." "Empirically, we instantiate OAP with state-of-the-art offline RL algorithms and perform proof-of-concept investigations on the D4RL benchmark (Fu et al., 2020)." |
| Researcher Affiliation | Academia | 1 Department of Automation, BNRist, Tsinghua University, Beijing, China; 2 Department of Computer Science, BNRist, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1: Offline-with-Action-Preferences |
| Open Source Code | No | The paper links to the `rlkit` repository (https://github.com/rail-berkeley/rlkit), a library used for pre-training, but does not state that the authors' OAP implementation or the code for their described methodology is publicly available. |
| Open Datasets | Yes | "We consider three different domains of tasks in D4RL (Fu et al., 2020) benchmark: Gym, Ant Maze, and Adroit." (A loading sketch follows the table.) |
| Dataset Splits | No | The paper does not specify how training, validation, and test splits were defined or used, e.g., split percentages, sample counts, or a reference to standard D4RL splits. |
| Hardware Specification | No | The Acknowledgments section mentions a "generous donation of computing resources by High-Flyer AI" but gives no hardware details such as CPU or GPU models. |
| Software Dependencies | No | Table 5 lists the optimizer (Adam) and activation function (ReLU) with citations to their original papers, but gives no version numbers for the programming language (e.g., Python) or libraries (e.g., PyTorch, TensorFlow) needed for replication. |
| Experiment Setup | Yes | The hyperparameters of OAP instantiated on TD3+BC (Fujimoto & Gu, 2021) and IQL (Kostrikov et al., 2022) are presented in Table 5, including a critic learning rate of 3e-4, a mini-batch size of 256, and a discount factor of 0.99. (A configuration sketch follows the table.) |
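The paper evaluates on three D4RL domains (Gym, Ant Maze, Adroit). The following is a minimal sketch, not the authors' code, showing how such datasets are typically loaded with the public `d4rl` package; the specific task identifiers below (`halfcheetah-medium-v2`, `antmaze-medium-play-v2`, `pen-human-v1`) are assumed standard D4RL names chosen for illustration, not ones quoted from the paper.

```python
# Sketch: loading one representative D4RL task per domain mentioned in the paper.
# Assumes the open-source `d4rl` package; task names are illustrative.
import gym
import d4rl  # importing d4rl registers the D4RL environments with gym

TASKS = [
    "halfcheetah-medium-v2",   # Gym locomotion domain
    "antmaze-medium-play-v2",  # Ant Maze domain
    "pen-human-v1",            # Adroit domain
]

for task in TASKS:
    env = gym.make(task)
    # qlearning_dataset returns a dict of numpy arrays:
    # observations, actions, next_observations, rewards, terminals.
    dataset = d4rl.qlearning_dataset(env)
    print(task, dataset["observations"].shape, dataset["actions"].shape)
```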
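To make the reported experiment settings concrete, here is a minimal configuration sketch, not the authors' released configuration. Only the critic learning rate, mini-batch size, discount factor, optimizer, and activation come from the quoted Table 5 values; the dataclass name and any remaining fields would have to be filled in from Table 5 itself.

```python
# Sketch: hyperparameters quoted from Table 5 of the paper, collected in one place.
# Class and field names are hypothetical; only the values marked "Table 5" are from the source.
from dataclasses import dataclass

@dataclass
class OAPBaseConfig:
    base_algorithm: str          # "TD3+BC" or "IQL", the two base algorithms named in the paper
    critic_lr: float = 3e-4      # Table 5: critic learning rate
    batch_size: int = 256        # Table 5: mini-batch size
    discount: float = 0.99       # Table 5: discount factor
    optimizer: str = "Adam"      # Table 5 cites the Adam optimizer
    activation: str = "ReLU"     # Table 5 cites the ReLU activation

td3bc_cfg = OAPBaseConfig(base_algorithm="TD3+BC")
iql_cfg = OAPBaseConfig(base_algorithm="IQL")
```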