Scaling Goal-based Exploration via Pruning Proto-goals
Authors: Akhil Bagaria, Tom Schaul
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of goal-conditioned exploration with the latter is then demonstrated in three challenging environments. |
| Researcher Affiliation | Collaboration | ¹Brown University, Providence, RI, USA ²DeepMind, London, UK |
| Pseudocode | Yes | Algorithm 2 in the appendix for details. Implementation details about creating and managing combination proto-goals can be found in the appendix (Algorithm 5). More details about the agent, as well as pseudo-code, can be found in Section C of the appendix. |
| Open Source Code | No | The paper does not include an explicit statement about releasing code for the described method or a link to a code repository. |
| Open Datasets | No | While the paper uses established environments like TAXI, MINIHACK, and BABA IS YOU, it introduces modifications (e.g., SPARSETAXI) or custom levels (e.g., U-MAZE). However, it does not provide concrete access information (links, DOIs, specific citations for the modified versions) for a publicly available dataset or environment setup that fully reproduces their specific data generation or usage. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with specific percentages or sample counts, as is typical for static datasets. The experiments are conducted in dynamic reinforcement learning environments. |
| Hardware Specification | No | The paper mentions using a 'distributed RL agent, namely R2D2' and '128 actors asynchronously interacting with 128 environments,' implying a substantial computational setup. However, it does not provide specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions integrating with 'R2D2 [Kapturowski et al., 2018]' and using a 'Q-learning-based' approach, but it does not list specific software libraries or frameworks with their version numbers (e.g., Python version, TensorFlow/PyTorch version). |
| Experiment Setup | Yes | In practice, when queried, the PGE does not output the full distribution, but a (small) discrete set of K plausible and desirable goals, by sampling from PG with replacement (K = 100 in all our experiments). A goal is considered mastered when its success rate is above a pre-specified threshold κ (= 0.6 in all our experiments). When that probability is 0, the agent never reaches the goal during evaluation. Acting according to the task reward function 10% of the time during training performed the best in this setting. |
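
The quoted experiment setup describes two mechanisms: sampling a discrete candidate set of K proto-goals with replacement, and declaring a goal mastered once its success rate exceeds a threshold κ. The sketch below illustrates that logic under stated assumptions; the function names, the dictionary representation of the proto-goal distribution, and the toy values are illustrative and not taken from the paper's code (only K = 100 and κ = 0.6 come from the quoted text).

```python
import random

# Minimal sketch of the quoted goal-sampling and mastery criteria.
# Names and data structures here are hypothetical, not the authors' implementation.

K = 100      # number of candidate goals sampled per query (paper: K = 100)
KAPPA = 0.6  # mastery threshold on success rate (paper: kappa = 0.6)

def sample_candidate_goals(proto_goal_distribution, k=K):
    """Draw k proto-goals from the distribution, with replacement."""
    goals, weights = zip(*proto_goal_distribution.items())
    return random.choices(goals, weights=weights, k=k)

def is_mastered(success_history, kappa=KAPPA):
    """A goal counts as mastered once its empirical success rate exceeds kappa."""
    if not success_history:
        return False
    return sum(success_history) / len(success_history) > kappa

# Toy usage with a made-up distribution over three proto-goals.
proto_goals = {"goal_a": 0.5, "goal_b": 0.3, "goal_c": 0.2}
candidates = sample_candidate_goals(proto_goals)
print(is_mastered([1, 1, 0, 1, 1]))  # success rate 0.8 > 0.6 -> True
```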