Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning
Authors: Haoqi Yuan, Zhancun Mu, Feiyang Xie, Zongqing Lu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results in a robotic simulation environment and the challenging open-world environment of Minecraft demonstrate PTGM's superiority in sample efficiency and task performance compared to baselines. |
| Researcher Affiliation | Academia | Haoqi Yuan¹, Zhancun Mu², Feiyang Xie², Zongqing Lu¹³ — ¹School of Computer Science, Peking University; ²Yuanpei College, Peking University; ³Beijing Academy of Artificial Intelligence |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | Project page: https://sites.google.com/view/ptgm-iclr/. The paper links only to a high-level project overview page; it neither links to a source code repository nor explicitly states that the code for the methodology is released. |
| Open Datasets | Yes | The dataset is provided in the D4RL benchmark (Fu et al., 2020), consisting of 150K transition samples. |
| Dataset Splits | Yes | We partition 10K frames from the dataset as the validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the SAC and PPO algorithms and refers to the SPiRL implementation, but it does not provide specific software version numbers for libraries, frameworks, or programming languages (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | Table 2 lists the hyperparameters of SAC. Table 4 lists the hyperparameters of PPO. Both tables provide specific values for hyperparameters like discount factor, learning rate, batch size, etc. |