LISA: Learning Interpretable Skill Abstractions from Language
Authors: Divyansh Garg, Skanda Vaidyanath, Kuno Kim, Jiaming Song, Stefano Ermon
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate LISA on grid-world navigation and robotic manipulation tasks. We compare the performance of LISA with a strong non-hierarchical baseline in the low-data regime. We then analyse our learnt skill abstractions in detail: what they represent, how we can interpret them, and how they improve performance on downstream composition tasks. |
| Researcher Affiliation | Academia | Divyansh Garg, Skanda Vaidyanath, Kuno Kim, Jiaming Song, Stefano Ermon; Stanford University; {divgarg, svaidyan, khkim, tsong, ermon}@stanford.edu |
| Pseudocode | Yes | Algorithm 1 Training LISA<br>Input: Dataset D of language-paired trajectories<br>Input: Num skills K and horizon H<br>1: Initialize skill predictor fϕ, policy πθ<br>2: Vector Quantization op q(·)<br>3: while not converged do<br>4: Sample τ = (l, {s0, s1, s2, ..., sT}, {a0, a1, a2, ..., aT})<br>5: Initialize S = {s0} ▷ List of seen states<br>6: for k = 0..T/H do ▷ Sample a skill every H steps<br>7: z ← q(fϕ(l, S))<br>8: for step t = 1..H do ▷ Predict actions using a fixed skill and context length H<br>9: a_{kH+t} ← πθ(z, S[−H:])<br>10: S ← S ∪ {s_{kH+t}} ▷ Append seen state<br>11: end for<br>12: Train fϕ, πθ using objective L_LISA<br>13: end for<br>14: end while<br>(A hedged PyTorch sketch of this loop appears below the table.) |
| Open Source Code | No | No, the paper does not contain an explicit statement about releasing the code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | BabyAI Dataset. The BabyAI dataset [13] contains 19 levels of increasing difficulty, where each level is set in a grid world and an agent sees a partially observed, ego-centric view in a square of size 7×7. LOReL Sawyer Dataset. This dataset [38] consists of pseudo-expert trajectories, or play data, collected from the replay buffer of a random RL policy and labeled with post-hoc crowdsourced language instructions. (A snippet illustrating the BabyAI observation format appears below the table.) |
| Dataset Splits | No | No, the paper mentions using 1k, 10k, and 100k trajectories for training and evaluating on a set of 100 different instructions, but it does not specify a separate validation split or its details (percentages, counts, or explicit purpose). |
| Hardware Specification | No | No, the paper mentions 'compute requirement' only in general terms and compares model parameter counts, but it does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | No, the paper mentions using a pre-trained DistilBERT [44] encoder but does not specify version numbers for its software dependencies, such as programming languages, libraries, or frameworks (e.g., a PyTorch version). |
| Experiment Setup | No | No, while the paper discusses architectural choices (e.g., the number of transformer layers) and parameters such as the skill horizon H and the number of skills K, it does not provide concrete numerical values for standard hyperparameters such as learning rate, batch size, or optimizer settings in the main text; it notes that some details appear in the appendices, but they are not given in the main body. |
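
The extracted pseudocode in the table maps onto a short training loop. Below is a minimal PyTorch sketch of Algorithm 1, assuming hypothetical `f_phi` (skill predictor) and `pi_theta` (policy) callables, a learnable codebook tensor, and a continuous-action behavioural-cloning loss; the quantization op q(·) is realised here as a standard VQ-VAE nearest-neighbour lookup with a straight-through estimator. This is an illustrative reconstruction, not the authors' released code.

```python
# Sketch of Algorithm 1 (Training LISA). Module names, tensor shapes, and
# the MSE action loss are assumptions for illustration.
import torch
import torch.nn.functional as F


def quantize(z_e, codebook, beta=0.25):
    """Map a continuous skill embedding (d,) to its nearest codebook entry (K, d).

    Returns the straight-through quantized code, the chosen skill index, and
    the usual VQ-VAE codebook + commitment losses.
    """
    dists = torch.cdist(z_e.unsqueeze(0), codebook)        # (1, K) distances
    idx = dists.argmin(dim=-1)                             # chosen skill code
    z_q = codebook[idx].squeeze(0)                         # (d,)
    vq_loss = F.mse_loss(z_q, z_e.detach()) + beta * F.mse_loss(z_e, z_q.detach())
    z = z_e + (z_q - z_e).detach()   # forward uses z_q, gradients flow to z_e
    return z, idx, vq_loss


def lisa_loss(f_phi, pi_theta, codebook, instruction, states, actions, H):
    """One-trajectory loss following Algorithm 1: re-sample a discrete skill
    every H steps, then predict actions under that fixed skill using a
    context of the last H seen states."""
    T = actions.shape[0]             # states holds s_0 .. s_T (T + 1 entries)
    seen = [states[0]]               # S = {s_0}
    total = torch.tensor(0.0)
    for k in range(T // H):
        z_e = f_phi(instruction, torch.stack(seen))    # continuous skill embedding
        z, _, vq_loss = quantize(z_e, codebook)        # z <- q(f_phi(l, S))
        for t in range(H):
            ctx = torch.stack(seen[-H:])               # last H seen states
            a_hat = pi_theta(z, ctx)
            total = total + F.mse_loss(a_hat, actions[k * H + t])
            seen.append(states[k * H + t + 1])         # append seen state
        total = total + vq_loss
    return total
```

In practice `f_phi` and `pi_theta` would be the paper's transformer modules (with the language instruction embedded, e.g., by the DistilBERT encoder mentioned above), and a discrete-action environment like BabyAI would use a cross-entropy action loss instead of MSE; the sketch also elides batching.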
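
For the Open Datasets row, the following snippet shows how BabyAI's partially observed 7×7 ego-centric view and per-episode language mission can be inspected. It assumes the `babyai` package (built on gym-minigrid) and the pre-0.26 `gym` API it targets; the level name is illustrative, not necessarily one used in the paper.

```python
# Illustrative only: inspecting a BabyAI observation.
import gym
import babyai  # noqa: F401 -- importing registers the BabyAI-* levels with gym

env = gym.make("BabyAI-GoToRedBall-v0")   # example level
obs = env.reset()
print(obs["mission"])       # natural-language instruction for this episode
print(obs["image"].shape)   # (7, 7, 3): the agent's partial 7x7 ego-centric view
```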