Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
Authors: Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method in the classic videogame NetHack and the text environment ScienceWorld to demonstrate SSO's ability to optimize a set of skills and perform in-context policy improvement. SSO outperforms baselines by 40% in our custom NetHack task and outperforms the previous state-of-the-art in ScienceWorld by 35%. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, University of California Irvine, Irvine CA, United States 2Allen Institute for AI, Seattle Washington, United States. * Equal contribution. Correspondence to: Kolby Nottingham <knotting@uci.edu>. |
| Pseudocode | Yes | Algorithm 1 Extract; Appendix C. SSO Code. We include the following python code as a high-level overview of SSO. |
| Open Source Code | Yes | We include the following python code as a high-level overview of SSO. However, the complete codebase is available at https://github.com/allenai/sso. |
| Open Datasets | Yes | We evaluate SSO's ability to quickly adapt to and transfer knowledge between tasks in the ScienceWorld domain (Wang et al., 2022)....We utilize the MiniHack library (Samvelyan et al., 2021) to design a custom level that tests an LLM actor's ability to explore and learn several skills to complete a task. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly mention distinct validation splits or how they are used for hyperparameter tuning. |
| Hardware Specification | No | The paper specifies the use of OpenAI's GPT-4 and GPT-4-Turbo models, and text-embedding-ada-002 as the embedding model, but does not provide details about the underlying hardware (e.g., specific GPUs, CPUs, or memory) used to run these models or experiments. |
| Software Dependencies | No | The paper mentions specific versions of LLMs (e.g., gpt-4-0613, gpt-4-1106-preview) and an embedding model (text-embedding-ada-002), but it does not list any other software dependencies such as programming languages, libraries, or frameworks with specific version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Table 3: Skill Set Optimization hyperparameters. This table lists specific values for parameters such as Max skill length (5), Min skill length (2), Adaptation training episodes (5), Transfer training episodes (30), Sampling temp (train) (0.7), Sampling temp (test) (0.0), Max retrieved skills (3), Skill refinement threshold (0), Skill length score weight (0.01), Reward score weight (0.1), State similarity score weight (1.0), Action similarity score weight (1.0). |
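To make the Table 3 scoring weights concrete, the following is a minimal Python sketch of how a weighted skill score could combine the four score terms the table lists (skill length, reward, state similarity, action similarity). The class name, field names, and the exact combination rule are illustrative assumptions, not the paper's implementation; the actual logic lives in the released codebase at https://github.com/allenai/sso.

```python
# Illustrative sketch only: combines the four scoring weights from Table 3
# into a single skill score. Names and the linear form are assumptions.
from dataclasses import dataclass

@dataclass
class SkillCandidate:
    length: int               # number of steps in the candidate skill
    mean_reward: float        # average reward observed along the skill
    state_similarity: float   # similarity of grouped states, in [0, 1]
    action_similarity: float  # similarity of grouped actions, in [0, 1]

# Weights as reported in Table 3.
W_LENGTH = 0.01   # skill length score weight
W_REWARD = 0.1    # reward score weight
W_STATE = 1.0     # state similarity score weight
W_ACTION = 1.0    # action similarity score weight

def skill_score(skill: SkillCandidate) -> float:
    """Weighted sum of the Table 3 score terms (hypothetical form)."""
    return (W_LENGTH * skill.length
            + W_REWARD * skill.mean_reward
            + W_STATE * skill.state_similarity
            + W_ACTION * skill.action_similarity)

candidate = SkillCandidate(length=3, mean_reward=2.0,
                           state_similarity=0.8, action_similarity=0.9)
print(round(skill_score(candidate), 3))  # 0.03 + 0.2 + 0.8 + 0.9 = 1.93
```

Under this sketch, the small length and reward weights mean the similarity terms dominate, which matches the table's relative magnitudes; skills would then be kept or refined by comparing such scores against the skill refinement threshold (0 in Table 3).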