Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

Authors: Neel Guha, Mayee Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks." |
| Researcher Affiliation | Collaboration | Neel Guha* (Stanford University), Mayee F. Chen* (Stanford University), Kush Bhatia* (Stanford University), Azalia Mirhoseini (Anthropic), Frederic Sala (University of Wisconsin-Madison), Christopher Ré (Stanford University) |
| Pseudocode | Yes | "Algorithm 1 EMBROID: Correcting LLMs with embeddings" |
| Open Source Code | No | The paper mentions using the Manifest library [49] (a third-party tool) but does not provide access to source code for the Embroid method itself. |
| Open Datasets | Yes | "We consider a collection of 95 class-balanced sentence classification tasks, derived from binarizing existing multi-class legal, scientific, and general domain classification benchmarks like CUAD, AGNews, DBpedia-14, FewRel, and several others [21, 25, 30, 67, 69]." |
| Dataset Splits | No | The paper uses an "unlabeled dataset D" and operates in a "true few-shot regime" where "the only labels available are those used in the prompt." It does not specify training, validation, or test splits for the main datasets, since the method primarily corrects LM predictions on unlabeled data. |
| Hardware Specification | Yes | "Inference for open source models (OPT, GPT-JT, and Bloom) were run using the Manifest library [49] on 40GB A100 NVIDIA GPU machines." |
| Software Dependencies | No | The paper mentions the HELM API, the Manifest library, and scikit-learn, but does not provide version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Embroid was run with k = 10, τ⁺ᵢ = P(λᵢ = 1), and τ⁻ᵢ = P(λᵢ = 1). |