Pre-Training Goal-based Models for Sample-Efficient Reinforcement Learning

Authors: Haoqi Yuan, Zhancun Mu, Feiyang Xie, Zongqing Lu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results in a robotic simulation environment and the challenging open-world environment of Minecraft demonstrate PTGM's superiority in sample efficiency and task performance compared to baselines.
Researcher Affiliation | Academia | Haoqi Yuan (1), Zhancun Mu (2), Feiyang Xie (2), Zongqing Lu (1,3); (1) School of Computer Science, Peking University; (2) Yuanpei College, Peking University; (3) Beijing Academy of Artificial Intelligence
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (i.e., clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | No | Project page: https://sites.google.com/view/ptgm-iclr/. The paper provides a project page link, which is a high-level overview page rather than a direct link to a source code repository, and it does not explicitly state that the code for the methodology is released.
Open Datasets | Yes | The dataset is provided in the D4RL benchmark (Fu et al., 2020), consisting of 150K transition samples.
Dataset Splits | Yes | We partition 10K frames from the dataset as the validation set.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used to run its experiments.
Software Dependencies | No | The paper mentions using the SAC and PPO algorithms and refers to the SPiRL implementation, but it does not provide specific software version numbers for libraries, frameworks, or programming languages (e.g., PyTorch, Python, or CUDA versions).
Experiment Setup | Yes | Table 2 lists the hyperparameters of SAC and Table 4 lists the hyperparameters of PPO; both tables provide specific values for hyperparameters such as the discount factor, learning rate, and batch size.
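As a reading aid only, the sketch below shows how hyperparameter tables of the kind cited above might be transcribed into a configuration for a reproduction attempt. Every field name and value here is an illustrative placeholder, not an entry from the paper's Table 2 or Table 4.

```python
# Illustrative sketch only: the field names and values below are generic
# placeholders, NOT the hyperparameters reported in the paper's
# Table 2 (SAC) or Table 4 (PPO).
sac_hyperparameters = {
    "discount_factor": 0.99,  # placeholder value
    "learning_rate": 3e-4,    # placeholder value
    "batch_size": 256,        # placeholder value
}

ppo_hyperparameters = {
    "discount_factor": 0.99,  # placeholder value
    "learning_rate": 3e-4,    # placeholder value
    "batch_size": 64,         # placeholder value
}

if __name__ == "__main__":
    # A reproduction script would read these dictionaries when constructing
    # the SAC or PPO learner.
    print(sac_hyperparameters)
    print(ppo_hyperparameters)
```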