PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

Authors: Kimin Lee, Laura M Smith, Pieter Abbeel

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design our experiments to investigate the following: 1. How does PEBBLE compare to existing methods in terms of sample and feedback efficiency? 2. What is the contribution of each of the proposed techniques in PEBBLE? 3. Can PEBBLE learn novel behaviors for which a typical reward function is difficult to engineer? 4. Can PEBBLE mitigate the effects of reward exploitation?
Researcher Affiliation | Academia | University of California, Berkeley.
Pseudocode | Yes | Algorithm 1 EXPLORE: Unsupervised exploration; Algorithm 2 PEBBLE (an illustrative sketch of the exploration step appears after this table).
Open Source Code | Yes | Source code and videos are available at https://sites.google.com/view/icml21pebble.
Open Datasets | Yes | We evaluate PEBBLE on several continuous control tasks involving locomotion and robotic manipulation from DeepMind Control Suite (DMControl; Tassa et al., 2018; 2020) and Meta-world (Yu et al., 2020).
Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test dataset splits. It mentions training agents and using replay buffers, but without numerical details on data partitioning for validation.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions algorithms such as SAC and PPO and the Adam optimizer, but does not provide version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | No | The paper mentions some aspects of the experimental setup, such as pre-training an agent for 10K timesteps and using an ensemble of three reward models (sketched below), but it does not give specific numerical hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or a clearly labeled table or paragraph detailing training settings in the main text, deferring those details to the supplementary material.
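
The Pseudocode row above lists Algorithm 1 (EXPLORE), PEBBLE's unsupervised pre-training phase, which drives early exploration by maximizing state entropy before any human feedback is collected. Below is a minimal sketch of a particle-based (k-nearest-neighbor) intrinsic reward for that phase, assuming PyTorch; the function name knn_intrinsic_reward and the default k = 12 are illustrative choices, not values taken from the released code.

```python
# A minimal sketch of a particle-based state-entropy intrinsic reward for the
# unsupervised pre-training phase (Algorithm 1, EXPLORE). The function name
# and the default k are illustrative assumptions, not from the released code.
import torch


def knn_intrinsic_reward(states: torch.Tensor, k: int = 12) -> torch.Tensor:
    """Return the log distance to each state's k-th nearest neighbor.

    `states` has shape (batch, state_dim). A large distance means the state
    lies in a sparsely visited region, so visiting it is rewarded, pushing
    the policy toward high state entropy before any preference feedback.
    """
    dists = torch.cdist(states, states)                    # pairwise Euclidean distances
    knn = dists.topk(k + 1, largest=False).values[:, -1]   # k+1 smallest; index 0 is self
    return torch.log(knn + 1e-6)                           # epsilon avoids log(0)
```

Per the Experiment Setup row, this pre-training phase runs for the first 10K timesteps; afterwards the agent is trained on the learned reward model rather than the intrinsic reward.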
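
The Experiment Setup row also notes an ensemble of three reward models, and the other technique named in the title is relabeling: whenever the learned reward is updated from new preference feedback, the rewards stored in the replay buffer are recomputed so the off-policy learner never trains on stale values. The sketch below shows both pieces under stated assumptions: PyTorch, a NumPy-backed replay buffer, and hypothetical names (RewardMLP, RewardEnsemble, buffer.obses/actions/rewards) that are not the released API.

```python
# A minimal sketch of a reward-model ensemble plus replay-buffer relabeling,
# assuming PyTorch and a NumPy-backed buffer; all names here are hypothetical.
import torch
import torch.nn as nn


class RewardMLP(nn.Module):
    """Small MLP mapping a (state, action) pair to a scalar reward estimate."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


class RewardEnsemble:
    """Ensemble of reward models; the sketch averages member predictions."""

    def __init__(self, obs_dim: int, act_dim: int, n_models: int = 3):
        self.members = [RewardMLP(obs_dim, act_dim) for _ in range(n_models)]

    @torch.no_grad()
    def predict(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        preds = torch.stack([m(obs, act) for m in self.members], dim=0)
        return preds.mean(dim=0).squeeze(-1)


def relabel_rewards(buffer, ensemble: RewardEnsemble, batch_size: int = 1024) -> None:
    """Overwrite every stored reward with the current ensemble's prediction.

    `buffer` is assumed to expose 1-D-compatible NumPy arrays `obses`,
    `actions`, and `rewards` of equal length; this mirrors the relabeling
    step in spirit, not the released replay-buffer interface.
    """
    n = len(buffer.rewards)
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        obs = torch.as_tensor(buffer.obses[start:end], dtype=torch.float32)
        act = torch.as_tensor(buffer.actions[start:end], dtype=torch.float32)
        buffer.rewards[start:end] = ensemble.predict(obs, act).numpy()
```

Relabeling after each reward-model update is what lets an off-policy algorithm such as SAC keep reusing all past experience even though the reward function is a moving target.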