reproducibilityindex.ai

Toolformer: Language Models Can Teach Themselves to Use Tools

Authors: Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on a variety of different downstream tasks, demonstrating that after learning to use tools, Toolformer, which is based on a pretrained GPT-J model (Wang and Komatsuzaki, 2021) with 6.7B parameters, achieves much stronger zero-shot results, clearly outperforming a much larger GPT-3 model (Brown et al., 2020) and several other baselines on various tasks.
Researcher Affiliation	Collaboration	FAIR, Meta; Universitat Pompeu Fabra
Pseudocode	No	The paper contains diagrams illustrating steps (e.g., Figure 2) but no formal pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide an explicit statement or link to the open-source code for the Toolformer methodology.
Open Datasets	Yes	We use a subset of CCNet (Wenzek et al., 2020) as our dataset C and GPT-J (Wang and Komatsuzaki, 2021) as our language model M. ... We evaluate our models on two language modeling datasets: WikiText (Merity et al., 2017) and a subset of 10,000 randomly selected documents from CCNet (Wenzek et al., 2020) that were not used during training.
Dataset Splits	Yes	We evaluate our models on two language modeling datasets: WikiText (Merity et al., 2017) and a subset of 10,000 randomly selected documents from CCNet (Wenzek et al., 2020) that were not used during training.
Hardware Specification	No	The paper does not specify the hardware used for running experiments (e.g., specific GPU models, CPU, or memory).
Software Dependencies	No	The paper mentions models like GPT-J and NLLB and tools like BM25, but does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	We finetune M on C using a batch size of 128 and a learning rate of 1 × 10⁻⁵ with linear warmup for the first 10% of training. Finetuning details are given in Appendix B. ... We finetune all models for 100k training steps with a batch size of 128 and a linear learning rate schedule with warmup for the first 10% of training and a maximum learning rate of 1 × 10⁻⁵.
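
The Experiment Setup row gives enough detail to sketch the learning rate schedule it describes. The following is a minimal illustration only, not the authors' code: the function name and parameter names are hypothetical, and the linear decay to zero after warmup is an assumption, since the quoted text says only "linear learning rate schedule with warmup".

```python
def linear_schedule_lr(step, total_steps=100_000, max_lr=1e-5, warmup_frac=0.1):
    """Learning rate at a given step (hypothetical helper, not from the paper).

    Values taken from the quoted setup: 100k steps, maximum learning rate 1e-5,
    warmup over the first 10% of training.
    Assumption: after warmup the rate decays linearly to zero.
    """
    warmup_steps = round(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear warmup from 0 up to max_lr.
        return max_lr * step / warmup_steps
    # Assumed linear decay from max_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return max_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 5_000, 10_000, 55_000, 100_000):
        print(s, linear_schedule_lr(s))
```

Under these assumptions the rate peaks at step 10,000 and falls to half the maximum at step 55,000. The batch size of 128 does not enter the schedule itself.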