PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training
Authors: Kimin Lee, Laura M Smith, Pieter Abbeel
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design our experiments to investigate the following: 1. How does PEBBLE compare to existing methods in terms of sample and feedback efficiency? 2. What is the contribution of each of the proposed techniques in PEBBLE? 3. Can PEBBLE learn novel behaviors for which a typical reward function is difficult to engineer? 4. Can PEBBLE mitigate the effects of reward exploitation? |
| Researcher Affiliation | Academia | University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 EXPLORE: Unsupervised exploration; Algorithm 2 PEBBLE |
| Open Source Code | Yes | Source code and videos are available at https://sites.google.com/view/icml21pebble. |
| Open Datasets | Yes | We evaluate PEBBLE on several continuous control tasks involving locomotion and robotic manipulation from DeepMind Control Suite (DMControl; Tassa et al., 2018; 2020) and Meta-world (Yu et al., 2020). |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test dataset splits. It mentions training agents and using replay buffers, but without numerical details on data partitioning for validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like SAC and PPO, and an optimizer like Adam, but does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | No | The paper mentions some aspects of the experimental setup, such as pre-training an agent for 10K timesteps and using an ensemble of three reward models. However, it does not explicitly provide specific numerical hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or a clearly labeled table or paragraph detailing training settings in the main text, deferring those details to the supplementary material. (Illustrative sketches of these two components appear after this table.) |
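
For context on the pre-training step cited in the table (Algorithm 1, EXPLORE, run for 10K timesteps before any human feedback is collected), here is a minimal sketch, not the authors' released code, of a particle-based state-entropy intrinsic reward: each state is scored by the log-distance to its k-th nearest neighbor within a batch of recent states. The function name `knn_state_entropy_reward` and the default `k=5` are illustrative assumptions.

```python
# Minimal sketch (assumed names/values) of a k-NN state-entropy intrinsic reward,
# as used for unsupervised pre-training in preference-based RL pipelines like PEBBLE.
import numpy as np


def knn_state_entropy_reward(states: np.ndarray, k: int = 5) -> np.ndarray:
    """Intrinsic reward per state: log-distance to its k-th nearest neighbor.

    states: array of shape (n, state_dim) drawn from the agent's recent experience.
    """
    # Pairwise Euclidean distances between all states in the batch.
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)        # shape (n, n)
    np.fill_diagonal(dists, np.inf)               # exclude each state's distance to itself
    # Distance from each state to its k-th nearest neighbor.
    kth_dist = np.sort(dists, axis=1)[:, k - 1]
    # log(1 + distance) keeps the reward finite and non-negative.
    return np.log(1.0 + kth_dist)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(256, 4))             # 256 states of dimension 4
    print(knn_state_entropy_reward(batch)[:5])
```

Maximizing this reward pushes the agent toward states far from those it has already visited, which is the purpose of the unsupervised exploration phase.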
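
The same row mentions an ensemble of three reward models, and the paper's title highlights relabeling experience. The sketch below, again not the released implementation, shows that step under the assumption of a PyTorch-style replay buffer: after the learned reward model is updated from new preference feedback, the reward of every stored transition is recomputed as the mean prediction of the ensemble. `RewardEnsemble`, `ReplayBuffer`, and all sizes here are illustrative.

```python
# Minimal sketch (assumed classes/sizes) of relabeling replay-buffer rewards with
# the mean prediction of an ensemble of learned reward models.
import torch
import torch.nn as nn


class RewardEnsemble(nn.Module):
    """Ensemble of small MLPs mapping (state, action) -> scalar reward."""

    def __init__(self, obs_dim, act_dim, n_members=3, hidden=64):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_members)
        ])

    @torch.no_grad()
    def predict(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(x).squeeze(-1) for m in self.members])
        return preds.mean(dim=0)  # average over ensemble members


class ReplayBuffer:
    """Stores transitions whose rewards can be overwritten after reward-model updates."""

    def __init__(self, obs, act, rew):
        self.obs, self.act, self.rew = obs, act, rew

    def relabel(self, reward_model, batch_size=1024):
        # Recompute every stored reward with the current (updated) reward model.
        for start in range(0, len(self.rew), batch_size):
            sl = slice(start, start + batch_size)
            self.rew[sl] = reward_model.predict(self.obs[sl], self.act[sl])


if __name__ == "__main__":
    obs_dim, act_dim, n = 4, 2, 10_000
    buffer = ReplayBuffer(torch.randn(n, obs_dim), torch.randn(n, act_dim), torch.zeros(n))
    buffer.relabel(RewardEnsemble(obs_dim, act_dim))
    print(buffer.rew[:5])
```

Relabeling keeps old off-policy experience usable by the SAC learner even though the reward function keeps changing as more human preferences arrive.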