Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

Authors: Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement. SSO outperforms baselines by 40% in our custom NetHack task and outperforms the previous state-of-the-art in ScienceWorld by 35%.
Researcher Affiliation | Collaboration | 1Department of Computer Science, University of California Irvine, Irvine, CA, United States; 2Allen Institute for AI, Seattle, Washington, United States. *Equal contribution. Correspondence to: Kolby Nottingham <knotting@uci.edu>.
Pseudocode | Yes | Algorithm 1 (Extract); Appendix C, SSO Code: "We include the following python code as a high-level overview of SSO." (A hedged sketch of this high-level loop follows the table.)
Open Source Code | Yes | We include the following python code as a high-level overview of SSO. However, the complete codebase is available at https://github.com/allenai/sso.
Open Datasets | Yes | We evaluate SSO's ability to quickly adapt to and transfer knowledge between tasks in the ScienceWorld domain (Wang et al., 2022). ... We utilize the MiniHack library (Samvelyan et al., 2021) to design a custom level that tests an LLM actor's ability to explore and learn several skills to complete a task.
Dataset Splits | No | The paper describes training and testing procedures but does not explicitly mention distinct validation splits or how they are used for hyperparameter tuning.
Hardware Specification | No | The paper specifies the use of OpenAI's GPT-4 and GPT-4-Turbo models, with text-embedding-ada-002 as the embedding model, but does not provide details about the underlying hardware (e.g., specific GPUs, CPUs, or memory) used to run the experiments.
Software Dependencies | No | The paper names specific LLM versions (gpt-4-0613, gpt-4-1106-preview) and an embedding model (text-embedding-ada-002), but it does not list other software dependencies with version numbers, such as the programming language, libraries, or frameworks used (e.g., Python, PyTorch).
Experiment Setup | Yes | Table 3: Skill Set Optimization hyperparameters. Max skill length: 5; Min skill length: 2; Adaptation training episodes: 5; Transfer training episodes: 30; Sampling temp (train): 0.7; Sampling temp (test): 0.0; Max retrieved skills: 3; Skill refinement threshold: 0; Skill length score weight: 0.01; Reward score weight: 0.1; State similarity score weight: 1.0; Action similarity score weight: 1.0. (These values are echoed in the sketch following this table.)
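
To make the extraction and retrieval machinery concrete, here is a minimal Python sketch. Only the Table 3 values are taken from the paper; Skill, score_skill, retrieve, sso_episode, and the exact form of each score term are hypothetical stand-ins, not the released implementation (see https://github.com/allenai/sso for that).

from dataclasses import dataclass
from itertools import combinations

import numpy as np

# Table 3 hyperparameters, values quoted from the paper.
MAX_SKILL_LENGTH = 5
MIN_SKILL_LENGTH = 2
MAX_RETRIEVED_SKILLS = 3
SKILL_REFINEMENT_THRESHOLD = 0.0
LENGTH_WEIGHT = 0.01          # skill length score weight
REWARD_WEIGHT = 0.1           # reward score weight
STATE_SIM_WEIGHT = 1.0        # state similarity score weight
ACTION_SIM_WEIGHT = 1.0       # action similarity score weight


def embed(text: str) -> np.ndarray:
    """Placeholder embedding; the paper uses OpenAI's text-embedding-ada-002."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(64)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


@dataclass
class Skill:
    """One occurrence of a candidate skill: a short state/action subtrajectory."""
    state_embs: list          # embeddings of the visited states
    action_embs: list         # embeddings of the actions taken
    discounted_return: float  # return observed for this occurrence


def score_skill(occurrences: list) -> float:
    """Weighted sum of length, reward, and similarity terms.

    A hypothetical rendering of the paper's four score components: shorter
    skills, higher returns, and occurrences that agree in state and action
    space all score higher.
    """
    length_score = LENGTH_WEIGHT / len(occurrences[0].action_embs)
    reward_score = REWARD_WEIGHT * float(
        np.mean([o.discounted_return for o in occurrences])
    )
    pairs = list(combinations(occurrences, 2))
    state_sim = float(np.mean(
        [cosine(a.state_embs[0], b.state_embs[0]) for a, b in pairs]
    )) if pairs else 0.0
    action_sim = float(np.mean(
        [cosine(a.action_embs[0], b.action_embs[0]) for a, b in pairs]
    )) if pairs else 0.0
    return (length_score + reward_score
            + STATE_SIM_WEIGHT * state_sim + ACTION_SIM_WEIGHT * action_sim)


def retrieve(skill_set: dict, current_state_emb: np.ndarray) -> list:
    """Return up to MAX_RETRIEVED_SKILLS skill names whose initial state is
    most similar to the current state; these go into the LLM's prompt."""
    scored = {
        name: cosine(occs[0].state_embs[0], current_state_emb)
        for name, occs in skill_set.items()
        if score_skill(occs) > SKILL_REFINEMENT_THRESHOLD
    }
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:MAX_RETRIEVED_SKILLS]


def sso_episode(env, llm, skill_set: dict) -> None:
    """One episode of the SSO loop: retrieve skills into the prompt, act,
    then (not shown) extract and score new skills from the trajectory."""
    trajectory, obs, done = [], env.reset(), False
    while not done:
        skills = retrieve(skill_set, embed(obs))   # obs assumed to be text
        action = llm.act(obs, skills)              # skills given in-context
        next_obs, reward, done, _ = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    # Extraction (Algorithm 1) would mine subtrajectories of length
    # MIN_SKILL_LENGTH..MAX_SKILL_LENGTH from `trajectory` as candidates.

The skill set is stored here as a mapping from skill name to its occurrences, so that scoring can compare repeated occurrences of the same behavior; this is one plausible data layout, not necessarily the paper's.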
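
Both evaluation environments expose Python APIs. Below is a minimal sketch of loading a stock MiniHack task through the pre-gymnasium Gym interface (where step returns a 4-tuple); the paper's custom level and the ScienceWorld setup (pip package scienceworld) differ, so the task id and observation keys here are illustrative only.

import gym
import minihack  # noqa: F401 -- importing registers the MiniHack env ids

# "MiniHack-River-v0" is a stock task used as a stand-in; the paper builds
# a custom level with the MiniHack library.
env = gym.make(
    "MiniHack-River-v0",
    observation_keys=("glyphs", "message"),  # symbolic map + in-game text
)
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()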