Amortizing intractable inference in large language models
Authors: Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. |
| Researcher Affiliation | Academia | Edward J. Hu*, Moksh Jain*, Eric Elmoznino, Mila Québec AI Institute, Université de Montréal, {edward.hu,moksh.jain,eric.elmoznino,... Younesse Kaddar, University of Oxford, younesse.kaddar@chch.ox.ac.uk; Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin, Mila Québec AI Institute, Université de Montréal, ...,guillaume.lajoie,yoshua.bengio,nikolay.malkin}@mila.quebec |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code for our experiments is available at https://github.com/GFNOrg/gfn-lm-tuning. |
| Open Datasets | Yes | We consider a dataset of prompts from Open Web Text (Gokaslan et al., 2019) with a 1.5B param GPT-2 XL (Radford et al., 2019) as the base model. We use the ROCStories corpus (Mostafazadeh et al., 2016) and SUBJ (Pang & Lee, 2004). |
| Dataset Splits | Yes | We obtained a dataset of 1000 prompts from Open Web Text (Gokaslan et al., 2019) that were each 1-3 sentences long, 50 of which were used for validation. |
| Hardware Specification | No | The research was enabled in part by computational resources provided by the Digital Research Alliance of Canada (https://alliancecan.ca), Mila (https://mila.quebec), and NVIDIA. |
| Software Dependencies | No | This was done with full fine-tuning using the trl library (von Werra et al., 2020). We use LoRA (Hu et al., 2022) instead of full fine-tuning for hardware efficiency in all experiments. |
| Experiment Setup | Yes | We detail the hyperparameters used for training GFlowNets in our experiments in Table C.2. We run GFlowNet fine-tuning for 1000 steps with a linear warmup over 200 steps, a fixed learning rate of 0.0005, and a batch size of 512 samples; see Table D.3 for all the hyperparameters used. We detail the hyperparameters used for training GFlowNets in our experiments in Table E.2. |
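
To make the quoted Experiment Setup concrete, below is a minimal sketch of how the stated hyperparameters (1000 training steps, 200-step linear warmup, fixed learning rate of 0.0005, batch size of 512, LoRA instead of full fine-tuning, GPT-2 XL base model) could be wired together with the Hugging Face transformers and peft libraries. The LoRA rank, alpha, dropout, target modules, and optimizer choice are illustrative assumptions, not values reported in the table above, and the GFlowNet training objective itself is omitted.

```python
# Hedged sketch only: values marked "from the quote" come from the Experiment
# Setup row above; everything else (LoRA rank/alpha/dropout, target modules,
# AdamW optimizer) is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, get_constant_schedule_with_warmup
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2-xl")      # 1.5B GPT-2 XL base model (from the quote)

lora_config = LoraConfig(                                   # LoRA instead of full fine-tuning (from the quote)
    r=16, lora_alpha=32, lora_dropout=0.05,                 # assumed values, not reported above
    target_modules=["c_attn"], task_type="CAUSAL_LM",       # assumed GPT-2 attention projection
)
model = get_peft_model(base, lora_config)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # fixed learning rate 0.0005 (from the quote)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=200)  # 200-step warmup (from the quote)

TOTAL_STEPS, BATCH_SIZE = 1000, 512                         # training steps and batch size (from the quote)

# A GFlowNet fine-tuning loop over TOTAL_STEPS batches of BATCH_SIZE sampled
# sequences would go here; the objective is not part of this sketch.
```

With a constant-with-warmup schedule, the learning rate ramps linearly to 5e-4 over the first 200 steps and then stays fixed, matching the "linear warmup ... fixed learning rate" description in the quoted setup.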