Pre-trained Large Language Models Use Fourier Features to Compute Addition

Authors: Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Unless otherwise stated, all experiments focus on the pre-trained GPT-2-XL model that has been fine-tuned on our addition dataset.
Researcher Affiliation | Academia | Department of Computer Science, University of Southern California, Los Angeles, CA 90089; {tzhou029,deqingfu,vsharan,robinjia}@usc.edu
Pseudocode | No | No pseudocode or algorithm block is provided. The paper includes formal definitions and mathematical descriptions of concepts such as the Fourier basis and the DFT, but these are not structured as pseudocode or algorithms. (A DFT sketch follows the table.)
Open Source Code | No | The paper states: "The goal of this paper is to understand how LLMs compute addition. We believe the code is not central to our contribution." It lists the existing open-source models used (GPT-2, GPT-J, Phi-2) but does not provide the authors' own implementation of the described methodology.
Open Datasets | No | The paper states: "We constructed a synthetic addition dataset for fine-tuning and evaluation purposes." The dataset construction and splits (80% training, 10% validation, 10% test) are described, but no link, DOI, or repository is provided for public access to this constructed dataset.
Dataset Splits | Yes | The dataset is shuffled and then split into training (80%), validation (10%), and test (10%) sets. (A split sketch follows the table.)
Hardware Specification | Yes | All experiments involving fine-tuning and training from scratch in this paper were conducted on one NVIDIA A6000 GPU with 48 GB of video memory.
Software Dependencies | No | The paper mentions using Hugging Face for model checkpoints (e.g., GPT-2-XL, GPT-J, Phi-2) but does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries used to run the experiments.
Experiment Setup | Yes | We fine-tune GPT-2-XL on the language-math-dataset for 50 epochs with a batch size of 16. The dataset consists of 27,400 training samples, 3,420 validation samples, and 3,420 test samples. We use the AdamW optimizer, scheduling the learning rate linearly from 1e-5 to 0 without warmup. (A fine-tuning sketch follows the table.)
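
The paper's central claim concerns Fourier structure in how the model represents numbers, and the Pseudocode row notes that the Fourier-basis and DFT definitions are given mathematically rather than as pseudocode. The sketch below is an illustration only, not the authors' code: it applies a DFT to the token embeddings of small integers, using the base `gpt2` checkpoint as a lightweight stand-in for GPT-2-XL; the 0..99 range and the single-token filter are my own simplifications.

```python
import numpy as np
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
emb = model.get_input_embeddings().weight.detach().numpy()  # (vocab_size, d_model)

# Collect embeddings for the integers 0..99, skipping any number that the BPE
# does not encode as a single token (a rough approximation for this sketch).
rows = []
for t in range(100):
    ids = tok.encode(str(t))
    if len(ids) == 1:
        rows.append(emb[ids[0]])
E = np.stack(rows)  # rows ordered by the integer t

# DFT along the "number" axis: each embedding dimension is treated as a
# function of t. Peaks at particular frequencies (e.g. periods 10 or 2)
# would indicate Fourier-like structure in the number embeddings.
spectrum = np.abs(np.fft.rfft(E - E.mean(axis=0), axis=0))
top = spectrum.mean(axis=1).argsort()[::-1][:5]
print("dominant DFT frequency indices (averaged over dimensions):", top)
```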
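
For the Dataset Splits row, a minimal sketch of the shuffled 80/10/10 split is below. The `a+b=c` text format and the operand range are assumptions made for illustration; only the split ratios come from the paper.

```python
import random

random.seed(0)
MAX_OPERAND = 100  # hypothetical operand range; the paper defines its own

# Hypothetical "a+b=c" formatting; the paper's exact prompt template may differ.
examples = [f"{a}+{b}={a + b}" for a in range(MAX_OPERAND) for b in range(MAX_OPERAND)]
random.shuffle(examples)

n = len(examples)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train = examples[:n_train]
val = examples[n_train:n_train + n_val]
test = examples[n_train + n_val:]
print(len(train), len(val), len(test))  # roughly 80% / 10% / 10%
```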
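
For the Experiment Setup row, a minimal fine-tuning sketch is below, assuming a Hugging Face `Trainer` workflow (the paper releases no code, so this is not the authors' implementation). It wires up the reported hyperparameters: GPT-2-XL, 50 epochs, batch size 16, AdamW, and a learning rate scheduled linearly from 1e-5 to 0 with no warmup; the toy in-memory dataset stands in for the real 27,400-example training split.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# Toy stand-in for the addition dataset; replace with the real training split.
texts = [f"{a}+{b}={a + b}" for a in range(10) for b in range(10)]
enc = tokenizer(texts, padding=True)
train_ds = [{"input_ids": i, "attention_mask": m}
            for i, m in zip(enc["input_ids"], enc["attention_mask"])]

args = TrainingArguments(
    output_dir="gpt2xl-addition",
    num_train_epochs=50,              # 50 epochs, as reported
    per_device_train_batch_size=16,   # batch size 16
    learning_rate=1e-5,               # initial learning rate
    lr_scheduler_type="linear",       # linear decay to 0
    warmup_steps=0,                   # no warmup
    optim="adamw_torch",              # AdamW optimizer
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```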