Adaptive Sampling for Efficient Softmax Approximation

Authors: Tavor Baharav, Ryan Kang, Colin Sullivan, Mo Tiwari, Eric Luxenberg, David Tse, Mert Pilanci

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the sample efficiency improvements afforded by Adaptive Softmax on real and synthetic data to corroborate our theoretical results. In Section 5, we demonstrate the empirical advantages of our algorithm in several real-world applications, including in a multiclass classification setting and in large language models."
Researcher Affiliation | Collaboration | Tavor Z. Baharav, Eric and Wendy Schmidt Center, Broad Institute, Cambridge, MA 02142, baharav@broadinstitute.org; Eric Luxenberg, Gridmatic, Cupertino, CA 95014, eric@gridmatic.com
Pseudocode | Yes | Algorithm 1 (Adaptive Softmax), Algorithm 2 (Normalization Estimation), Algorithm 3 (Best Arm Identification), Algorithm 4 (Adaptive Softmax, implementation details)
Open Source Code | Yes | "All of our results are reproducible via a 1-line script, publicly available on GitHub at github.com/ThrunGroup/adaptiveSoftmax."
Open Datasets | Yes | "The MNIST dataset, containing black and white images of handwritten digits as input and ten output classes representing all ten possible digits."; "The EuroSAT dataset, containing RGB satellite imagery as input and ten output classes, representing possible land types (e.g., river, residential, etc.)"; "Our task is text generation, and we generate our queries x by using two datasets (Wikitext and Penn Treebank) with a sliding window of a certain stride." (A hedged sketch of this sliding-window query construction appears after this table.)
Dataset Splits | No | The paper mentions training models and evaluating on a "test set", and tuning parameters on "initial training data", but it does not provide specific details on training/validation/test dataset splits (e.g., percentages or exact counts for each split).
Hardware Specification | No | The paper discusses hardware implications and optimization strategies (e.g., "SRAM on a GPU", "tiling of our matrix") but does not specify the exact GPU/CPU models or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions using Hugging Face's AutoModelForCausalLM module but does not specify its version number or any other software dependencies with version details.
Experiment Setup | Yes | "For the MNIST dataset, we train a shallow CNN from scratch with two convolutional blocks (Conv2d, ReLU, MaxPool, BatchNorm)."; "constant multiples were applied to the variance estimate within Algorithm 3 and Algorithm 2."; "Tuning is performed, generally, via bisection to discover the minimal factor which still satisfies our provided failure probability parameter δ." (Hedged sketches of the CNN and of the bisection tuning appear after this table.)
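
The sliding-window query construction quoted in the Open Datasets row is not spelled out beyond the dataset names. A minimal sketch of one plausible reading, assuming a Hugging Face tokenizer ("gpt2" here is an illustrative choice) and hypothetical window/stride values that are not the paper's, could look like:

```python
from transformers import AutoTokenizer

def make_queries(text, window=128, stride=64):
    """Turn a token stream into fixed-length query contexts via a sliding window."""
    # Assumed tokenizer; the paper does not specify which one backs its queries.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    ids = tokenizer(text)["input_ids"]
    # Each window of `window` tokens, advanced by `stride`, becomes one query x.
    return [ids[start:start + window]
            for start in range(0, max(len(ids) - window, 0) + 1, stride)]

queries = make_queries("A passage from Wikitext or Penn Treebank would go here.")
```

Each resulting query would then be fed to the language model, whose final softmax layer is the operation being approximated.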
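
The MNIST CNN in the Experiment Setup row is only described at the block level (two convolutional blocks of Conv2d, ReLU, MaxPool, BatchNorm). A hedged PyTorch rendering, with channel counts and the final classifier head chosen purely for illustration, might be:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One convolutional block in the order quoted: Conv2d, ReLU, MaxPool, BatchNorm.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.BatchNorm2d(out_ch),
    )

# Assumed channel widths; the paper does not report them.
model = nn.Sequential(
    conv_block(1, 16),          # MNIST inputs are single-channel 28x28 images
    conv_block(16, 32),         # second convolutional block, spatial size now 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),  # ten output classes (digits 0-9)
)
```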
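
The bisection tuning of the variance constants is likewise described only at a high level. A generic sketch, assuming a hypothetical `empirical_failure_rate` callback that measures the observed failure rate for a candidate multiplicative factor, and assuming that rate decreases as the factor grows, could be:

```python
def tune_constant(empirical_failure_rate, delta, lo=0.0, hi=10.0, iters=20):
    """Bisect for the smallest constant multiple on the variance estimate
    whose observed failure rate stays below the target delta."""
    # Assumes `hi` is large enough that the failure constraint holds at `hi`.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if empirical_failure_rate(mid) <= delta:
            hi = mid   # constraint satisfied; try a smaller constant
        else:
            lo = mid   # too many failures; need a larger constant
    return hi
```

This is a sketch of the stated tuning strategy, not the repository's implementation.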