Diversification of Adaptive Policy for Effective Offline Reinforcement Learning
Authors: Yunseon Choi, Li Zhao, Chuheng Zhang, Lei Song, Jiang Bian, Kee-Eung Kim
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MoDAP through experiments on the D4RL and NeoRL benchmarks, showcasing its performance superiority over state-of-the-art algorithms. |
| Researcher Affiliation | Collaboration | 1KAIST AI 2Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1: MoDAP |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of its source code. |
| Open Datasets | Yes | We evaluate MoDAP through experiments on the D4RL [Fu et al., 2020] and NeoRL [Qin et al., 2022] benchmarks |
| Dataset Splits | Yes | In the initial phase of pre-training the dynamics models, we divide the offline dataset into a training set and a validation set using an 8:2 ratio. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using SAC and GRU but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In the initial phase of pre-training the dynamics models, we divide the offline dataset into a training set and a validation set using an 8:2 ratio. For each task, we construct a set of estimated models by training either 7 (for D4RL) or 15 (for NeoRL) models. After this training, we proceed to select the top 5 (for D4RL) or 10 (for NeoRL) models based on their predictive accuracy, which is evaluated on the validation set. |
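
The experiment-setup row describes a split-then-select protocol: split the offline dataset 8:2 into training and validation sets, train an ensemble of candidate dynamics models, and keep only the top-k models by validation predictive accuracy. The sketch below illustrates that protocol under stated assumptions; it is not MoDAP's actual code. The toy transition data, the linear ridge-regression "dynamics model", and the names `fit_dynamics_model`, `n_candidates`, and `n_keep` are all hypothetical placeholders, with only the 8:2 split and the 7-train / 5-keep counts (the D4RL setting) taken from the quoted setup.

```python
# Minimal sketch of the 8:2 split and top-k dynamics-model selection described
# in the Experiment Setup row. Everything except the split ratio and the
# 7-train / 5-keep counts is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset of (state, action, next_state) transitions.
n, state_dim, action_dim = 10_000, 11, 3
states = rng.normal(size=(n, state_dim))
actions = rng.normal(size=(n, action_dim))
next_states = states + 0.1 * rng.normal(size=(n, state_dim))

# 8:2 train/validation split, as stated in the paper.
perm = rng.permutation(n)
split = int(0.8 * n)
train_idx, val_idx = perm[:split], perm[split:]

def fit_dynamics_model(X, Y, seed):
    """Fit one stand-in 'dynamics model': ridge regression on a bootstrap sample."""
    boot_rng = np.random.default_rng(seed)
    idx = boot_rng.integers(0, len(X), size=len(X))  # bootstrap resampling for ensemble diversity
    Xb, Yb = X[idx], Y[idx]
    lam = 1e-3
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(X.shape[1]), Xb.T @ Yb)

X = np.concatenate([states, actions], axis=1)
Y = next_states
X_train, Y_train = X[train_idx], Y[train_idx]
X_val, Y_val = X[val_idx], Y[val_idx]

# Train 7 candidate models (D4RL setting) and keep the 5 with the lowest
# validation prediction error, mirroring the selection-by-accuracy step.
n_candidates, n_keep = 7, 5
models = [fit_dynamics_model(X_train, Y_train, seed=s) for s in range(n_candidates)]
val_mse = [np.mean((X_val @ W - Y_val) ** 2) for W in models]
selected = [models[i] for i in np.argsort(val_mse)[:n_keep]]
print(f"kept {len(selected)} of {n_candidates} models; best val MSE = {min(val_mse):.4f}")
```

For the NeoRL setting, the quoted setup uses the same procedure with 15 trained candidates and the top 10 retained, which in this sketch corresponds to setting `n_candidates, n_keep = 15, 10`.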