CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Authors: Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging Car Racing and navigation environments: achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for Car Racing, while requiring 500X fewer environment interactions."
Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²Google Research.
Pseudocode | Yes | Algorithm 1 CLUTR. (A hedged sketch of this loop appears after the table.)
Open Source Code | Yes | "We open source our code at https://github.com/clutr/clutr."
Open Datasets | No | "To train our VAEs, we generate random tasks by uniformly sampling from Θ_T, the set of possible tasks. Thus, we do not require any interaction with the environment to learn the task manifold. ... For Car Racing, ... We train the VAE on 1 million randomly generated tracks... For navigation tasks ... we generated 10 million random grids..." The paper describes how the training data was generated but does not provide a link or specific citation for publicly accessing these generated datasets. (See the sampling sketch after the table.)
Dataset Splits | No | The paper mentions testing on specific benchmarks but does not specify how the data was split into training, validation, and test sets with exact percentages, sample counts, or a detailed splitting methodology.
Hardware Specification | Yes | "We used a single NVIDIA T4 GPU for our experiments, with machines having 8 (16) and 16 (32) physical (virtual) cores and 64 GB and 128 GB of memory for the Car Racing and Minigrid experiments, respectively."
Software Dependencies | No | The paper mentions using PPO (Schulman et al., 2017) and Adam for training, but does not provide specific version numbers for these libraries or other software dependencies (e.g., Python, PyTorch, TensorFlow versions) that would be needed for reproducibility.
Experiment Setup | Yes | "All our agents are trained with PPO (Schulman et al., 2017). We did not perform any hyperparameter search for our experiments. The Car Racing experiments used the same parameters used in Jiang et al. (2021a) and the Minigrid experiments used the parameters from Dennis et al. (2020). The VAEs used for the Car Racing and Minigrid standard-objective experiments (Section E.2) were trained using the default parameters from Bowman et al. (2015). The detailed parameters are listed in Table 2 and Table 3."
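The sketch below, referenced from the Pseudocode row, is a minimal runnable rendering of the CLUTR loop (Algorithm 1): a teacher proposes a latent task vector, a frozen pretrained VAE decoder maps it to a concrete task, and the protagonist/antagonist students are rolled out on that task while the teacher maximizes the PAIRED-style regret. All stand-in functions, the LATENT_DIM value, and the toy random returns are illustrative assumptions, not the paper's released implementation.

import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 64  # hypothetical latent size; the paper's values are in its appendix tables

def teacher_sample_latent():
    # Stand-in for the PPO teacher, which acts in the VAE latent task space.
    return rng.normal(size=LATENT_DIM)

def decode_task(z):
    # Stand-in for the frozen, pretrained VAE decoder (latent -> concrete task).
    return np.tanh(z)

def rollout_return(task):
    # Stand-in for a PPO rollout of a student agent on the decoded task.
    return float(rng.random())

for step in range(5):
    z = teacher_sample_latent()                       # teacher proposes a latent task
    task = decode_task(z)                             # decoder grounds it as a concrete task
    protagonist_return = rollout_return(task)
    antagonist_return = rollout_return(task)
    regret = antagonist_return - protagonist_return   # PAIRED-style regret estimate
    # In CLUTR the teacher is then updated (e.g., with PPO) to maximize this regret.
    print(f"step={step} regret={regret:+.3f}")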
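The second sketch, referenced from the Open Datasets row, illustrates the unsupervised dataset generation the paper describes: tasks are drawn uniformly from the task space Θ_T with no environment interaction, then used to train the VAE offline. The encoding of a task as a fixed-length token sequence follows the paper's sequence-VAE setup (Bowman et al., 2015), but TASK_LEN, VOCAB_SIZE, and sample_random_tasks are hypothetical names and sizes, not from the released code.

import numpy as np

TASK_LEN = 50       # hypothetical token-sequence length for one task
VOCAB_SIZE = 226    # hypothetical token vocabulary (e.g., grid-cell indices)

def sample_random_tasks(n_tasks, rng):
    # Sample task token sequences uniformly from Θ_T; no environment
    # interaction is needed, so the VAE can be trained fully offline.
    return rng.integers(0, VOCAB_SIZE, size=(n_tasks, TASK_LEN))

rng = np.random.default_rng(0)
vae_dataset = sample_random_tasks(10_000, rng)  # paper scale: 1M tracks / 10M grids
print(vae_dataset.shape)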