A Tale of Two Efficient and Informative Negative Sampling Distributions

Authors: Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we show two classes of distributions where the sampling scheme is truly adaptive and provably generates negative samples in near-constant time. Our implementation in C++ on CPU is significantly superior, both in terms of wall-clock time and accuracy, compared to the most optimized TensorFlow implementations of other popular negative sampling approaches on a powerful NVIDIA V100 GPU. ... Summary of Contributions: ... 3) We provide a rigorous evaluation of our proposal with its efficient implementation against full softmax and popular approximations like sampled softmax, frequency-based sampled softmax, top-K activation softmax, and Noise Contrastive Estimation (NCE). We report the time-wise and iteration-wise precision on large datasets like Amazon-670K, Wiki-325K, Amazon-Uniform, and ODP-105K.
Researcher Affiliation | Academia | 1. Department of Electrical and Computer Engineering, Rice University; 2. Department of Computer Science, Rice University; 3. Department of Computer Science, Stanford University.
Pseudocode | Yes | Algorithm 1 Locality Sensitive Negative Sampling (LNS) ... Algorithm 5 Update Hash Tables (an illustrative sketch of the sampling step appears after this table).
Open Source Code | Yes | The code is available at https://github.com/RUSH-LAB/SLIDE
Open Datasets | Yes | We evaluate our framework and other baselines on four datasets. Amazon-670K and Wiki-325K are two multi-label datasets from the extreme classification repository (Bhatia et al., 2016), ODP is a multi-class dataset which is obtained from (Choromanska & Langford, 2015), and Amazon-Uniform is a variant of the Amazon-670K dataset with a uniform label distribution [3.5].
Dataset Splits | No | The paper provides 'Train' and 'Test' sample counts in Table 1, but does not explicitly detail a 'validation' split or its size/percentage.
Hardware Specification | Yes | Our experiments are performed on a single machine with 28-core and 224-thread processors. All the baselines are run on the state-of-the-art NVIDIA V100 GPUs with 32 GB memory.
Software Dependencies | No | The paper mentions TensorFlow and a C++ implementation but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The optimizer is Adam with a learning rate of 0.0001 for all the experiments. The batch size for Amazon-670K, Wiki-325K, Amazon-Uniform, and ODP is 1024, 256, 256, and 128 respectively for all the experiments. ... We use the DWTA hash function ... with K=5 and L=300 for Wiki-325K, K=6 and L=400 for Amazon-670K, K=5 and L=150 for ODP, and K=6 and L=150 for Amazon-Uniform. (These settings are collected in the configuration sketch at the end of this section.)
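The Locality Sensitive Negative Sampling step referenced in Algorithm 1 can be illustrated with a small, self-contained sketch. The code below is not the authors' C++/SLIDE implementation: it substitutes SimHash (signed random projections) for the paper's DWTA hash, and the class name, parameters, and methods are hypothetical. What it demonstrates is the scheme described in the paper: hash the output-layer weight vectors into L tables keyed by K-bit fingerprints, query those tables with the current input embedding, and treat the colliding classes, minus the true labels, as negative samples.

```python
import numpy as np

class LSHNegativeSampler:
    """Toy LSH-based negative sampler (illustrative only, not the SLIDE code)."""

    def __init__(self, dim, num_classes, K=6, L=50, seed=0):
        rng = np.random.default_rng(seed)
        # L tables, each defined by K signed random projections (SimHash);
        # the paper uses DWTA hash functions instead.
        self.projections = rng.standard_normal((L, K, dim))
        self.tables = [dict() for _ in range(L)]
        self.num_classes = num_classes

    def _fingerprint(self, table_idx, vec):
        # K sign bits packed into an integer bucket key.
        bits = (self.projections[table_idx] @ vec > 0).astype(np.uint8)
        return int(np.packbits(bits).tobytes().hex(), 16)

    def build(self, class_weights):
        # Insert every class's output-layer weight vector into all L tables.
        for t, table in enumerate(self.tables):
            for c in range(self.num_classes):
                table.setdefault(self._fingerprint(t, class_weights[c]), []).append(c)

    def sample_negatives(self, embedding, true_labels):
        # Union of the buckets that collide with the embedding, minus the true labels.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._fingerprint(t, embedding), []))
        return candidates - set(true_labels)
```

Because LSH collision probability increases with similarity, the retrieved classes tend to have a large inner product with the current embedding, which is what makes them informative negatives, and retrieval costs only L hash computations and bucket lookups rather than a pass over all classes. Algorithm 5 ("Update Hash Tables") corresponds to periodically rebuilding these tables (the build step in the sketch) as the output-layer weights change during training.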
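Similarly, the hyperparameters quoted in the Experiment Setup row can be collected into one configuration sketch. The dictionary below simply restates the reported settings; the key names (e.g. dwta_K, dwta_L) are illustrative and do not come from the released code.

```python
# Reported training settings: Adam with learning rate 1e-4 for all experiments,
# per-dataset batch sizes, and DWTA hash parameters K and L.
# Key names are illustrative, not taken from the authors' code.
EXPERIMENT_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "datasets": {
        "Amazon-670K":    {"batch_size": 1024, "dwta_K": 6, "dwta_L": 400},
        "Wiki-325K":      {"batch_size": 256,  "dwta_K": 5, "dwta_L": 300},
        "Amazon-Uniform": {"batch_size": 256,  "dwta_K": 6, "dwta_L": 150},
        "ODP-105K":       {"batch_size": 128,  "dwta_K": 5, "dwta_L": 150},
    },
}
```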