The Wisdom of Hindsight Makes Language Models Better Instruction Followers
Authors: Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of HIR extensively on 12 challenging Big Bench reasoning tasks and show that HIR outperforms the baseline algorithms and is comparable to or even surpasses supervised finetuning. |
| Researcher Affiliation | Academia | Tianjun Zhang*1, Fangchen Liu*1, Justin Wong1, Pieter Abbeel1, Joseph E. Gonzalez1. 1University of California, Berkeley. Correspondence to: Tianjun Zhang <tianjunz@berkeley.edu>, Fangchen Liu <fangchen_liu@berkeley.edu>. |
| Pseudocode | Yes | Algorithm 1 Two-Stage Hindsight Instruction Relabeling (HIR) |
| Open Source Code | Yes | The implementation of HIR is available at https://github.com/tianjunz/HIR. |
| Open Datasets | Yes | We evaluate our algorithm extensively on 12 Big Bench (Srivastava et al., 2022) language model reasoning tasks. |
| Dataset Splits | Yes | To be specific, we divide the task data into 80% for training and 20% for testing. |
| Hardware Specification | No | We use the FLAN-T5 models (Chung et al., 2022) as the base model... |
| Software Dependencies | No | PPO For this baseline, we adopt the implementation of trlx from Carper AI. We directly use the GitHub repository and load the FLAN-T5-large as the base model. |
| Experiment Setup | Yes | A. Training and Implementation Details A.1. Hyperparameters We provide all the hyperparameters we used in our experiments. This includes all the experiment settings we used for the baselines and our method. |
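The pseudocode cited above (Algorithm 1, Two-Stage Hindsight Instruction Relabeling) alternates an online sampling stage with an offline relabeling stage: outputs that fail the original instruction are not discarded but paired with a rewritten instruction that they do satisfy, so every sample becomes usable supervised data. The toy sketch below illustrates only that relabeling idea; the dummy model, the parity task, and the relabeling rule are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import random

random.seed(0)

def dummy_model(instruction, prompt):
    # Stand-in for a language model (e.g. FLAN-T5): guesses a parity label.
    return random.choice(["even", "odd"])

def is_correct(prompt, output):
    # Task-specific success check for the toy parity task.
    return output == ("even" if prompt % 2 == 0 else "odd")

def hir_round(prompts, instruction="Answer with the parity of the number."):
    dataset = []
    for p in prompts:
        # Stage 1 (online sampling): generate an output for each prompt.
        out = dummy_model(instruction, p)
        # Stage 2 (offline relabeling): a failed sample is kept by
        # rewriting the instruction in hindsight so the sampled output
        # becomes a correct demonstration for the relabeled instruction.
        if is_correct(p, out):
            dataset.append((instruction, p, out))
        else:
            dataset.append(("Answer with the WRONG parity of the number.", p, out))
    return dataset  # every (instruction, prompt, output) triple is now valid

data = hir_round(range(8))
print(len(data))  # 8 examples, none discarded
```

In the paper's setting the relabeled dataset would then be used for a round of supervised finetuning before the next sampling stage; the key design point is that no sample is wasted, which is what lets HIR compete with supervised finetuning without a reward model.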