Scaling Goal-based Exploration via Pruning Proto-goals

Authors: Akhil Bagaria, Tom Schaul

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments.
Researcher Affiliation | Collaboration | (1) Brown University, Providence, RI, USA; (2) DeepMind, London, UK
Pseudocode | Yes | See Algorithm 2 in the appendix for details. Implementation details about creating and managing combination proto-goals can be found in the appendix (Algorithm 5). More details about the agent, as well as pseudo-code, can be found in Section C of the appendix.
Open Source Code | No | The paper does not include an explicit statement about releasing code for the described method or a link to a code repository.
Open Datasets | No | While the paper uses established environments like TAXI, MINIHACK, and BABA IS YOU, it introduces modifications (e.g., SPARSETAXI) and custom levels (e.g., U-MAZE). It does not provide concrete access information (links, DOIs, or specific citations for the modified versions) for a publicly available dataset or environment setup that fully reproduces its data generation or usage.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with specific percentages or sample counts, as is typical for static datasets. The experiments are conducted in dynamic reinforcement learning environments.
Hardware Specification | No | The paper mentions using a 'distributed RL agent, namely R2D2' and '128 actors asynchronously interacting with 128 environments,' implying a substantial computational setup. However, it does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions integrating with 'R2D2 [Kapturowski et al., 2018]' and using a 'Q-learning-based' approach, but it does not list specific software libraries or frameworks with their version numbers (e.g., Python version, TensorFlow/PyTorch version).
Experiment Setup | Yes | In practice, when queried, the PGE does not output the full distribution, but a (small) discrete set of K plausible and desirable goals, by sampling from PG with replacement (K = 100 in all our experiments). A goal is considered mastered when its success rate is above a pre-specified threshold κ (= 0.6 in all our experiments). When that probability is 0, the agent never reaches the goal during evaluation; acting according to the task reward function 10% of the time during training performed the best in this setting.
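
Since no code is released, the following is a minimal Python sketch of the reported experiment settings: sampling K = 100 candidate proto-goals with replacement from a goal distribution, the mastery threshold κ = 0.6 on empirical success rates, and acting on the task reward 10% of the time during training. All names (sample_candidate_goals, is_mastered, goal_distribution, etc.) are illustrative assumptions, not the authors' implementation.

```python
import random

# Reported hyperparameters from the paper; the surrounding code is a sketch.
K = 100                  # proto-goals sampled per query (with replacement)
KAPPA = 0.6              # success-rate threshold for a goal to count as mastered
TASK_REWARD_PROB = 0.1   # fraction of training time acting on the task reward


def sample_candidate_goals(goal_distribution, k=K, rng=random):
    """Sample k plausible/desirable goals with replacement from a
    dict mapping goal -> probability (hypothetical representation)."""
    goals = list(goal_distribution.keys())
    weights = list(goal_distribution.values())
    return rng.choices(goals, weights=weights, k=k)


def is_mastered(successes, attempts, kappa=KAPPA):
    """A goal counts as mastered once its empirical success rate exceeds kappa."""
    return attempts > 0 and successes / attempts > kappa


def pursue_task_reward(rng=random, p=TASK_REWARD_PROB):
    """With probability p, act according to the task reward rather than
    a sampled proto-goal during training."""
    return rng.random() < p


if __name__ == "__main__":
    # Toy goal distribution for illustration only.
    dist = {"goal_a": 0.5, "goal_b": 0.3, "goal_c": 0.2}
    candidates = sample_candidate_goals(dist)
    print(len(candidates), is_mastered(successes=7, attempts=10))
```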