PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training
Authors: Kimin Lee, Laura M Smith, Pieter Abbeel
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design our experiments to investigate the following: 1. How does PEBBLE compare to existing methods in terms of sample and feedback efficiency? 2. What is the contribution of each of the proposed techniques in PEBBLE? 3. Can PEBBLE learn novel behaviors for which a typical reward function is difficult to engineer? 4. Can PEBBLE mitigate the effects of reward exploitation? |
| Researcher Affiliation | Academia | University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 EXPLORE: Unsupervised exploration; Algorithm 2 PEBBLE |
| Open Source Code | Yes | Source code and videos are available at https://sites.google.com/view/icml21pebble. |
| Open Datasets | Yes | We evaluate PEBBLE on several continuous control tasks involving locomotion and robotic manipulation from DeepMind Control Suite (DMControl; Tassa et al., 2018; 2020) and Meta-world (Yu et al., 2020). |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test dataset splits. It mentions training agents and using replay buffers, but without numerical details on data partitioning for validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like SAC and PPO, and an optimizer like Adam, but does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | No | The paper mentions some aspects of the experimental setup, such as pre-training an agent for 10K timesteps and using an ensemble of three reward models. However, it does not explicitly provide specific numerical hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or a clearly labeled table or paragraph detailing training settings in the main text, deferring those details to the supplementary material. (Illustrative sketches of these two components appear after this table.) |
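
For context on the pre-training step cited in the table (Algorithm 1, EXPLORE, run for 10K timesteps before any human feedback is collected), here is a minimal sketch, not the authors' released code, of a particle-based state-entropy intrinsic reward: each state is scored by the log-distance to its k-th nearest neighbor within a batch of recent states. The function name `knn_state_entropy_reward` and the default `k=5` are illustrative assumptions.

```python
# Minimal sketch (assumed names/values) of a k-NN state-entropy intrinsic reward,
# as used for unsupervised pre-training in preference-based RL pipelines like PEBBLE.
import numpy as np


def knn_state_entropy_reward(states: np.ndarray, k: int = 5) -> np.ndarray:
    """Intrinsic reward per state: log-distance to its k-th nearest neighbor.

    states: array of shape (n, state_dim) drawn from the agent's recent experience.
    """
    # Pairwise Euclidean distances between all states in the batch.
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)        # shape (n, n)
    np.fill_diagonal(dists, np.inf)               # exclude each state's distance to itself
    # Distance from each state to its k-th nearest neighbor.
    kth_dist = np.sort(dists, axis=1)[:, k - 1]
    # log(1 + distance) keeps the reward finite and non-negative.
    return np.log(1.0 + kth_dist)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(256, 4))             # 256 states of dimension 4
    print(knn_state_entropy_reward(batch)[:5])
```

Maximizing this reward pushes the agent toward states far from those it has already visited, which is the purpose of the unsupervised exploration phase.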
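
The same row mentions an ensemble of three reward models, and the paper's title highlights relabeling experience. The sketch below, again not the released implementation, shows that step under the assumption of a PyTorch-style replay buffer: after the learned reward model is updated from new preference feedback, the reward of every stored transition is recomputed as the mean prediction of the ensemble. `RewardEnsemble`, `ReplayBuffer`, and all sizes here are illustrative.

```python
# Minimal sketch (assumed classes/sizes) of relabeling replay-buffer rewards with
# the mean prediction of an ensemble of learned reward models.
import torch
import torch.nn as nn


class RewardEnsemble(nn.Module):
    """Ensemble of small MLPs mapping (state, action) -> scalar reward."""

    def __init__(self, obs_dim, act_dim, n_members=3, hidden=64):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_members)
        ])

    @torch.no_grad()
    def predict(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(x).squeeze(-1) for m in self.members])
        return preds.mean(dim=0)  # average over ensemble members


class ReplayBuffer:
    """Stores transitions whose rewards can be overwritten after reward-model updates."""

    def __init__(self, obs, act, rew):
        self.obs, self.act, self.rew = obs, act, rew

    def relabel(self, reward_model, batch_size=1024):
        # Recompute every stored reward with the current (updated) reward model.
        for start in range(0, len(self.rew), batch_size):
            sl = slice(start, start + batch_size)
            self.rew[sl] = reward_model.predict(self.obs[sl], self.act[sl])


if __name__ == "__main__":
    obs_dim, act_dim, n = 4, 2, 10_000
    buffer = ReplayBuffer(torch.randn(n, obs_dim), torch.randn(n, act_dim), torch.zeros(n))
    buffer.relabel(RewardEnsemble(obs_dim, act_dim))
    print(buffer.rew[:5])
```

Relabeling keeps old off-policy experience usable by the SAC learner even though the reward function keeps changing as more human preferences arrive.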