Language-guided Skill Learning with Temporal Variational Inference
Authors: Haotian Fu, Pratyusha Sharma, Elias Stengel-Eskin, George Konidaris, Nicolas Le Roux, Marc-Alexandre Côté, Xingdi Yuan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that agents equipped with our method are able to discover skills that help accelerate learning and outperform baseline skill learning approaches on new long-horizon tasks in BabyAI, a grid-world navigation environment, as well as ALFRED, a household simulation environment. |
| Researcher Affiliation | Collaboration | Brown University, MIT, University of North Carolina at Chapel Hill, Mila, and Microsoft Research. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and videos can be found at https://language-skill-discovery.github.io/. |
| Open Datasets | Yes | BabyAI (Chevalier-Boisvert et al., 2019) is an environment where an agent navigates and interacts in a grid world to achieve a goal described in language... The BabyAI dataset contains expert demonstrations collected from 40 different task types of varying levels of difficulty. ALFRED (Shridhar et al., 2020a) is a complex environment based on the AI2-THOR (Kolve et al., 2017) simulator... For the ALFRED dataset, we follow the settings in Pashevich et al. (2021), where the training dataset consists of more than 20,000 trajectories. |
| Dataset Splits | Yes | For ALFRED, we follow the settings in Pashevich et al. (2021). We train our algorithm on the training dataset with cross-validation and test on the validation set. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like GPT-4, T5 encoder, and Faster R-CNN encoder but does not provide specific version numbers for these or other software dependencies required for replication. |
| Experiment Setup | Yes | Table 2 (Hyperparameters of LAST): learning rate 3e-4; batch size 16; size of skill library 100; weight of KL loss 0.0001; λ ∈ {1, 0.1, 0.01}; γ 0.99; temperature (SAC) 1; α1 0.01; α2 1; training epochs 80 / 140. Also mentions: 'The transformers have 3 layers and 8 attention heads.' and 'we take the weighted sum of them as the total loss (weight 1 for the action type and weight 0.1 for the object type).' |
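
A minimal sketch of how the reported hyperparameters and the weighted action-type / object-type loss could be wired together, assuming a PyTorch setup. The names `LASTConfig` and `combined_loss` are illustrative placeholders and are not taken from the authors' released code.

```python
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class LASTConfig:
    # Values transcribed from Table 2 of the paper; field names are assumed.
    learning_rate: float = 3e-4
    batch_size: int = 16
    skill_library_size: int = 100
    kl_loss_weight: float = 1e-4
    lambda_candidates: tuple = (1.0, 0.1, 0.01)  # λ values listed in the table
    gamma: float = 0.99
    sac_temperature: float = 1.0
    alpha1: float = 0.01
    alpha2: float = 1.0
    transformer_layers: int = 3
    attention_heads: int = 8
    action_loss_weight: float = 1.0   # weight for the action-type head
    object_loss_weight: float = 0.1   # weight for the object-type head


def combined_loss(action_logits: torch.Tensor, action_targets: torch.Tensor,
                  object_logits: torch.Tensor, object_targets: torch.Tensor,
                  cfg: LASTConfig) -> torch.Tensor:
    """Weighted sum of the two prediction losses (weight 1 for the action
    type and weight 0.1 for the object type), as quoted above."""
    action_loss = F.cross_entropy(action_logits, action_targets)
    object_loss = F.cross_entropy(object_logits, object_targets)
    return cfg.action_loss_weight * action_loss + cfg.object_loss_weight * object_loss
```

The table lists three λ values and two training-epoch counts (80 and 140); these presumably correspond to different experiments or environments, though the mapping is not spelled out in the quoted text.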