CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
Authors: Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging Car Racing and navigation environments: achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for Car Racing, while requiring 500X fewer environment interactions. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley 2Google Research. |
| Pseudocode | Yes | Algorithm 1 CLUTR |
| Open Source Code | Yes | We open source our code at https://github.com/clutr/clutr. |
| Open Datasets | No | To train our VAEs, we generate random tasks by uniformly sampling from ΘT , the set of possible tasks. Thus, we do not require any interaction with the environment to learn the task manifold. ... For Car Racing, ... We train the VAE on 1 million randomly generated tracks... For navigation tasks ... we generated 10 million random grids... The paper describes how the training data was generated but does not provide a link or specific citation for publicly accessing these generated datasets. |
| Dataset Splits | No | The paper mentions testing on specific benchmarks but does not specify how the data was split into training, validation, and test sets with exact percentages, sample counts, or a detailed splitting methodology. |
| Hardware Specification | Yes | We used a single NVIDIA T4 GPU for our experiments, with machines having 8 (16) and 16 (32) physical (virtual) cores and 64 GB and 128 GB memory for the Car Racing and Minigrid experiments, respectively. |
| Software Dependencies | No | The paper mentions using PPO (Schulman et al., 2017) and Adam for training, but does not provide specific version numbers for these libraries or other software dependencies (e.g., Python, PyTorch, TensorFlow versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | All our agents are trained with PPO (Schulman et al., 2017). We did not perform any hyperparameter search for our experiments. The Car Racing experiments used the same parameters used in Jiang et al. (2021a) and the Minigrid experiments used the parameters from Dennis et al. (2020). The VAE used for Car Racing and Minigrid standard objective experiments (Section E.2) were trained using the default parameters from Bowman et al. (2015). The detailed parameters are listed in Table 2 and Table 3. |
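The Open Datasets row notes that CLUTR's VAE training data is produced by uniformly sampling random tasks from ΘT rather than downloading a fixed dataset (e.g., 10 million random grids for navigation). A minimal sketch of that kind of generator is below; the grid size, wall-density range, and flattened encoding are illustrative assumptions, not the paper's exact task parameterization:

```python
import numpy as np

def sample_random_grid(size=15, wall_prob_max=0.5, rng=None):
    """Uniformly sample a navigation-style grid task.

    Illustrative only: the size, wall density, and 0/1 encoding are
    assumptions, not the paper's exact definition of Theta_T.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Sample a wall density, then place walls independently per cell.
    p = rng.uniform(0.0, wall_prob_max)
    grid = (rng.random((size, size)) < p).astype(np.int8)
    # Place the start and goal on two distinct free cells.
    free = np.argwhere(grid == 0)
    start, goal = free[rng.choice(len(free), size=2, replace=False)]
    return grid, tuple(start), tuple(goal)

# Build a (small) dataset of flattened grids, as one might for VAE training.
dataset = np.stack([
    sample_random_grid(rng=np.random.default_rng(i))[0].ravel()
    for i in range(1000)
])
print(dataset.shape)  # (1000, 225)
```

Because the generator only samples task parameters, no environment interaction is needed to build the dataset, which is the property the paper highlights.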