Adaptive Sampling for Efficient Softmax Approximation
Authors: Tavor Baharav, Ryan Kang, Colin Sullivan, Mo Tiwari, Eric Luxenberg, David Tse, Mert Pilanci
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the sample efficiency improvements afforded by Adaptive Softmax on real and synthetic data to corroborate our theoretical results. In Section 5, we demonstrate the empirical advantages of our algorithm in several real-world applications, including in a multiclass classification setting and in large language models. |
| Researcher Affiliation | Collaboration | Tavor Z. Baharav, Eric and Wendy Schmidt Center, Broad Institute, Cambridge, MA 02142, baharav@broadinstitute.org; Eric Luxenberg, Gridmatic, Cupertino, CA 95014, eric@gridmatic.com |
| Pseudocode | Yes | Algorithm 1 Adaptive Softmax, Algorithm 2 Normalization Estimation, Algorithm 3 Best Arm Id, Algorithm 4 Adaptive Softmax (implementation details) |
| Open Source Code | Yes | All of our results are reproducible via a 1-line script, publicly available on GitHub at github.com/ThrunGroup/adaptiveSoftmax. |
| Open Datasets | Yes | The MNIST dataset, containing black and white images of handwritten digits as input and ten output classes representing all ten possible digits; the EuroSAT dataset, containing RGB satellite imagery as input and ten output classes representing possible land types (e.g., river, residential, etc.); our task is text generation, and we generate our queries x by using two datasets (Wikitext and Penn Treebank) with a sliding window of a certain stride. |
| Dataset Splits | No | The paper mentions training models and evaluating on a 'test set', and tuning parameters on 'initial training data', but it does not provide specific details on training/validation/test dataset splits (e.g., percentages or exact counts for each split). |
| Hardware Specification | No | The paper discusses hardware implications and optimization strategies (e.g., 'SRAM on a GPU', 'tiling of our matrix') but does not specify the exact GPU/CPU models or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using Hugging Face's 'AutoModelForCausalLM' module but does not specify its version number or any other software dependencies with version details. |
| Experiment Setup | Yes | For the MNIST dataset, we train a shallow CNN from scratch with two convolutional blocks (Conv2d, ReLU, MaxPool, BatchNorm); constant multiples were applied to the variance estimates within Algorithm 3 and Algorithm 2; tuning is performed, generally, via bisection to discover the minimal factor which still satisfies our provided failure probability parameter δ. |
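
The Experiment Setup row quotes a shallow two-block CNN trained from scratch on MNIST, but the paper excerpt does not report layer widths or kernel sizes. A minimal PyTorch sketch of such an architecture, with hypothetical channel counts and kernel sizes chosen only for illustration, might look like the following.

```python
import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    """Two convolutional blocks (Conv2d, ReLU, MaxPool, BatchNorm) plus a linear classifier.

    Channel counts, kernel sizes, and the flattened dimension are illustrative
    assumptions; the paper excerpt does not state these hyperparameters.
    """

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: 1 input channel (grayscale MNIST) -> 16 feature maps
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),          # 28x28 -> 14x14
            nn.BatchNorm2d(16),
            # Block 2: 16 -> 32 feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),          # 14x14 -> 7x7
            nn.BatchNorm2d(32),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# Example: a batch of four 28x28 grayscale images -> ten class logits
logits = ShallowCNN()(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```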
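
The Open Datasets row notes that language-model queries are generated from Wikitext and Penn Treebank with a sliding window of a certain stride, but the exact window length and stride are not given. A small sketch of that style of query generation, using hypothetical `window` and `stride` values, could look like this.

```python
from typing import Iterator, List

def sliding_window_queries(token_ids: List[int], window: int = 128, stride: int = 64) -> Iterator[List[int]]:
    """Yield fixed-length contexts from a token stream via a sliding window.

    `window` and `stride` are illustrative assumptions; the paper excerpt only
    says a sliding window of a certain stride was used over Wikitext and
    Penn Treebank.
    """
    for start in range(0, max(len(token_ids) - window, 0) + 1, stride):
        yield token_ids[start:start + window]

# Example with a toy token stream; in practice token_ids would come from a
# tokenizer applied to the Wikitext or Penn Treebank text.
toy_tokens = list(range(300))
queries = list(sliding_window_queries(toy_tokens, window=128, stride=64))
print(len(queries), len(queries[0]))  # number of queries, context length
```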