Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

Authors: Neel Guha, Mayee Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks." |
| Researcher Affiliation | Collaboration | Neel Guha* (Stanford University), Mayee F. Chen* (Stanford University), Kush Bhatia* (Stanford University), Azalia Mirhoseini (Anthropic), Frederic Sala (University of Wisconsin-Madison), Christopher Ré (Stanford University) |
| Pseudocode | Yes | "Algorithm 1 EMBROID: Correcting LLMs with embeddings" |
| Open Source Code | No | The paper mentions using the Manifest library [49] (a third-party tool) but does not provide access to source code for the Embroid method itself. |
| Open Datasets | Yes | "We consider a collection of 95 class-balanced sentence classification tasks, derived from binarizing existing multi-class legal, scientific, and general domain classification benchmarks like CUAD, AGNews, DBpedia-14, FewRel, and several others [21, 25, 30, 67, 69]." |
| Dataset Splits | No | The paper uses an "unlabeled dataset D" and operates in a "true few-shot regime" where "the only labels available are those used in the prompt." It does not specify training, validation, or test splits for the main datasets, since the method primarily corrects LM predictions on unlabeled data. |
| Hardware Specification | Yes | "Inference for open source models (OPT, GPT-JT, and Bloom) were run using the Manifest library [49] on 40GB A100 NVIDIA GPU machines." |
| Software Dependencies | No | The paper mentions the HELM API, the Manifest library, and scikit-learn, but does not provide version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Embroid was run with k = 10, τ⁺ᵢ = P(λᵢ = 1), and τ⁻ᵢ = P(λᵢ = 1). |