Amortizing intractable inference in large language models
Authors: Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. |
| Researcher Affiliation | Academia | Edward J. Hu*, Moksh Jain*, Eric Elmoznino, Mila Québec AI Institute, Université de Montréal, {edward.hu,moksh.jain,eric.elmoznino,... Younesse Kaddar, University of Oxford, younesse.kaddar@chch.ox.ac.uk; Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin, Mila Québec AI Institute, Université de Montréal, ...,guillaume.lajoie,yoshua.bengio,nikolay.malkin}@mila.quebec |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Code for our experiments is available at https://github.com/GFNOrg/gfn-lm-tuning. |
| Open Datasets | Yes | We consider a dataset of prompts from Open Web Text (Gokaslan et al., 2019) with a 1.5B param GPT-2 XL (Radford et al., 2019) as the base model. We use the ROCStories corpus (Mostafazadeh et al., 2016) and SUBJ (Pang & Lee, 2004). |
| Dataset Splits | Yes | We obtained a dataset of 1000 prompts from Open Web Text (Gokaslan et al., 2019) that were each 1-3 sentences long, 50 of which were used for validation. |
| Hardware Specification | No | The research was enabled in part by computational resources provided by the Digital Research Alliance of Canada (https://alliancecan.ca), Mila (https://mila.quebec), and NVIDIA. |
| Software Dependencies | No | This was done with full fine-tuning using the trl library (von Werra et al., 2020). We use LoRA (Hu et al., 2022) instead of full fine-tuning for hardware efficiency in all experiments. |
| Experiment Setup | Yes | We detail the hyperparameters used for training GFlowNets in our experiments in Table C.2. We run GFlowNet fine-tuning for 1000 steps with a linear warmup over 200 steps, a fixed learning rate of 0.0005, and a batch size of 512 samples; see Table D.3 for all the hyperparameters used. We detail the hyperparameters used for training GFlowNets in our experiments in Table E.2. |
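
To make the quoted Experiment Setup concrete, below is a minimal sketch of how the stated hyperparameters (1000 training steps, 200-step linear warmup, fixed learning rate of 0.0005, batch size of 512, LoRA instead of full fine-tuning, GPT-2 XL base model) could be wired together with the Hugging Face transformers and peft libraries. The LoRA rank, alpha, dropout, target modules, and optimizer choice are illustrative assumptions, not values reported in the table above, and the GFlowNet training objective itself is omitted.

```python
# Hedged sketch only: values marked "from the quote" come from the Experiment
# Setup row above; everything else (LoRA rank/alpha/dropout, target modules,
# AdamW optimizer) is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, get_constant_schedule_with_warmup
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2-xl")      # 1.5B GPT-2 XL base model (from the quote)

lora_config = LoraConfig(                                   # LoRA instead of full fine-tuning (from the quote)
    r=16, lora_alpha=32, lora_dropout=0.05,                 # assumed values, not reported above
    target_modules=["c_attn"], task_type="CAUSAL_LM",       # assumed GPT-2 attention projection
)
model = get_peft_model(base, lora_config)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # fixed learning rate 0.0005 (from the quote)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=200)  # 200-step warmup (from the quote)

TOTAL_STEPS, BATCH_SIZE = 1000, 512                         # training steps and batch size (from the quote)

# A GFlowNet fine-tuning loop over TOTAL_STEPS batches of BATCH_SIZE sampled
# sequences would go here; the objective is not part of this sketch.
```

With a constant-with-warmup schedule, the learning rate ramps linearly to 5e-4 over the first 200 steps and then stays fixed, matching the "linear warmup ... fixed learning rate" description in the quoted setup.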