Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills
Authors: Victor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-I-Nieto, Jordi Torres
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive evaluation of skill discovery methods on controlled environments and show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned. |
| Researcher Affiliation | Collaboration | ¹Barcelona Supercomputing Center, ²Salesforce Research, ³Universitat Politècnica de Catalunya. Correspondence to: Víctor Campos <victor.campos@bsc.es>. |
| Pseudocode | No | The paper describes algorithms and methods conceptually and mathematically but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/victorcampos7/edl. |
| Open Datasets | No | The paper uses 'controlled synthetic environments' which are custom-built 2D mazes. It states: 'In the controlled environments considered in this work, this can be achieved by sampling states from an oracle.' No specific access information (link, DOI, citation to a public dataset) for these environments or data generated from them is provided. |
| Dataset Splits | No | The paper mentions 'controlled synthetic environments' and discusses experimental results, but it does not specify exact train/validation/test splits by percentage or sample count. It refers to 'a detailed description of the experimental setup and the hyperparameters' in the Supplementary Material, but this information is not in the main text. |
| Hardware Specification | No | The paper mentions 'hardware accelerators (NVIDIA, 2017; Jouppi et al., 2017)' in the introduction in a general context of RL advancements, but does not provide any specific hardware details (e.g., GPU models, CPU types, memory) used for their own experiments. |
| Software Dependencies | No | The paper mentions using specific models like VQ-VAE and Sibling Rivalry, but it does not provide specific version numbers for any software, libraries, or dependencies used in their experiments (e.g., 'PyTorch 1.9' or 'Python 3.8'). |
| Experiment Setup | No | The paper states: 'We refer the reader to the SM for a detailed description of the experimental setup and the hyperparameters.' This indicates that specific experimental setup details, including concrete hyperparameter values or training configurations, are deferred to the supplementary material and are not present in the main text. |