Learning Transformer Programs

Authors: Dan Friedman, Alexander Wettig, Danqi Chen

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our approach, we learn Transformer Programs for a variety of problems, including an in-context learning task, a suite of algorithmic problems (e.g. sorting, recognizing Dyck-languages), and NLP tasks including named entity recognition and text classification. The Transformer Programs can automatically find reasonable solutions, performing on par with standard Transformers of comparable size; and, more importantly, they are easy to interpret.
Researcher Affiliation | Academia | Dan Friedman, Alexander Wettig, Danqi Chen; Department of Computer Science & Princeton Language and Intelligence, Princeton University; {dfriedman,awettig,danqic}@cs.princeton.edu
Pseudocode | No | The paper includes Python code snippets for the learned Transformer Programs, but it does not provide pseudocode or algorithm blocks for the method of learning Transformer Programs itself. (An illustrative example of such a snippet is sketched below the table.)
Open Source Code | Yes | Our code is available at https://github.com/princeton-nlp/TransformerPrograms, along with a number of example Transformer Programs.
Open Datasets | Yes | We validate our approach by learning Transformer Programs for a variety of problems, including an in-context learning task; the set of algorithmic problems introduced by Weiss et al. [2021]; and NLP benchmarks for named entity recognition and text classification. The NLP experiments use the CoNLL-2003 Named Entity Recognition task [Sang and De Meulder, 2003], with the distribution from Hugging Face Datasets [Lhoest et al., 2021].
Dataset Splits | Yes | For each RASP task, we sample 20,000 inputs without replacement and partition them into train, validation, and test sets containing 16,000/2,000/2,000 instances respectively. For the NER task, we use the standard train/validation/test split and evaluate the results using a Python implementation of the standard CoNLL evaluation script [Nakayama, 2018]. (A sketch of the RASP-task split follows the table.)
Hardware Specification | Yes | Each model takes between five and fifteen minutes to train on an Nvidia RTX 2080 GPU, depending on the number of layers.
Software Dependencies | No | The paper mentions implementing models in PyTorch [Paszke et al., 2019], but does not provide a specific version number for PyTorch or other software dependencies.
Experiment Setup | Yes | We train each model for 250 epochs with a batch size of 512 and a learning rate of 0.05, annealing the Gumbel temperature geometrically from 3.0 to 0.01 by decreasing the temperature at each training step. (A sketch of such a schedule follows the table.)
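
The Pseudocode row notes that the paper ships Python code snippets for the learned programs rather than pseudocode for the learning method. As a purely hypothetical illustration (not taken from the paper or its repository), the RASP-style select/aggregate primitives that such snippets build on can be mimicked in a few lines of plain Python; `select`, `aggregate`, and the toy "shift right" program below are invented for this sketch.

```python
# Hypothetical illustration (not from the paper): RASP-style primitives in plain Python.

def select(keys, queries, predicate):
    # Hard attention pattern: entry [q][k] is True when predicate(keys[k], queries[q]) holds.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(attention, values, default=None):
    # For each query position, read the value at the first selected key position.
    out = []
    for row in attention:
        selected = [v for keep, v in zip(row, values) if keep]
        out.append(selected[0] if selected else default)
    return out

# Toy "shift right" program: every position attends to the position directly before it.
tokens = ["<s>", "a", "b", "c"]
positions = list(range(len(tokens)))
attn = select(positions, positions, lambda k, q: k == q - 1)
print(aggregate(attn, tokens, default="<pad>"))  # ['<pad>', '<s>', 'a', 'b']
```

Roughly speaking, each attention head in a learned Transformer Program corresponds to one such hard select/aggregate pair, with the predicate and value variables learned during training.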
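The Dataset Splits row describes sampling 20,000 unique inputs per RASP task and partitioning them 16,000/2,000/2,000. A minimal sketch of that kind of split, assuming an illustrative vocabulary and sequence-length range (`sample_unique_sequences` is hypothetical, not the authors' data pipeline):

```python
# Minimal sketch (not the authors' pipeline): sample 20,000 unique inputs for a
# RASP-style task and split them into 16,000 / 2,000 / 2,000 train/val/test sets.
import random

def sample_unique_sequences(n, vocab, min_len=1, max_len=8, seed=0):
    # Draw n distinct token sequences without replacement; vocab and lengths are illustrative.
    rng = random.Random(seed)
    seqs = set()
    while len(seqs) < n:
        length = rng.randint(min_len, max_len)
        seqs.add(tuple(rng.choices(vocab, k=length)))
    return sorted(seqs)

data = sample_unique_sequences(20_000, vocab=list("abcdefgh"))
random.Random(0).shuffle(data)
train, val, test = data[:16_000], data[16_000:18_000], data[18_000:]
assert (len(train), len(val), len(test)) == (16_000, 2_000, 2_000)
```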
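The Experiment Setup row specifies geometric annealing of the Gumbel temperature from 3.0 to 0.01, lowered at every training step. A minimal sketch of such a schedule, assuming a constant per-step decay factor and illustrative step-count arithmetic (the exact formula and number of steps are not stated above):

```python
# Minimal sketch, assuming a constant per-step decay factor; only the endpoints
# 3.0 -> 0.01 and the per-step decrease are taken from the description above.
import math

def gumbel_temperature(step, total_steps, t_start=3.0, t_end=0.01):
    # Geometric interpolation from t_start (step 0) to t_end (final step).
    ratio = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** ratio

# Illustrative step count (assumption, not stated above): 250 epochs at batch
# size 512 over 16,000 RASP training examples is roughly ceil(16000 / 512) * 250 steps.
total_steps = math.ceil(16_000 / 512) * 250
print(gumbel_temperature(0, total_steps))                          # 3.0
print(round(gumbel_temperature(total_steps - 1, total_steps), 4))  # 0.01
```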