A Tale of Two Efficient and Informative Negative Sampling Distributions
Authors: Shabnam Daghaghi, Tharun Medini, Nicholas Meisburger, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show two classes of distributions where the sampling scheme is truly adaptive and provably generates negative samples in near-constant time. Our implementation in C++ on CPU is significantly superior, in both wall-clock time and accuracy, to the most optimized TensorFlow implementations of other popular negative sampling approaches on a powerful NVIDIA V100 GPU. ... Summary of Contributions: ... 3) We provide a rigorous evaluation of our proposal with its efficient implementation against full softmax and popular approximations like sampled softmax, frequency-based sampled softmax, top-K activation softmax, and Noise Contrastive Estimation (NCE). We report the time-wise and iteration-wise precision on large datasets like Amazon-670K, Wiki-325K, Amazon-Uniform, and ODP-105K. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Rice University 2Department of Computer Science, Rice University 3Department of Computer Science, Stanford University. |
| Pseudocode | Yes | Algorithm 1 Locality Sensitive Negative Sampling (LNS) ... Algorithm 5 Update Hash Tables (a simplified sketch of the LNS idea appears below the table) |
| Open Source Code | Yes | The code is available at https://github.com/RUSH-LAB/SLIDE |
| Open Datasets | Yes | We evaluate our framework and other baselines on four datasets. Amazon-670K and Wiki-325K are two multi-label datasets from the extreme classification repository (Bhatia et al., 2016); ODP is a multi-class dataset obtained from (Choromanska & Langford, 2015); and Amazon-Uniform is a variant of the Amazon-670K dataset with a uniform label distribution (Section 3.5). |
| Dataset Splits | No | The paper provides 'Train' and 'Test' sample counts in Table 1, but does not explicitly detail a 'validation' split or its size/percentage. |
| Hardware Specification | Yes | Our experiments are performed on a single machine with 28 cores and 224 threads. All the baselines are run on state-of-the-art NVIDIA V100 GPUs with 32 GB of memory. |
| Software Dependencies | No | The paper mentions TensorFlow and a C++ implementation but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The optimizer is Adam with a learning rate of 0.0001 for all the experiments. The batch size for Amazon-670K, Wiki-325K, Amazon-Uniform, and ODP is 1024, 256, 256, and 128 respectively for all the experiments. ... We use the DWTA hash function ... with K=5 and L=300 for Wiki-325K, K=6 and L=400 for Amazon-670K, K=5 and L=150 for ODP, and K=6 and L=150 for Amazon-Uniform. (The reported hyperparameters are collected in the sketches below the table.) |
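For orientation, here is a minimal Python sketch of the idea behind Algorithm 1 (LNS), assuming a simplified Winner-Take-All hash in place of the paper's DWTA (the densification step for sparse inputs is omitted). `K` (indices per code) and `L` (number of tables) follow the paper's notation; the class name `LNSampler`, the bin size `m`, and all default values are illustrative choices, not the paper's C++ implementation.

```python
import numpy as np

class LNSampler:
    """Toy LSH-based negative sampler; a sketch, not the paper's code."""

    def __init__(self, class_vectors, K=6, L=150, m=8, seed=0):
        rng = np.random.default_rng(seed)
        dim = class_vectors.shape[1]
        # For each of the L tables, K random coordinate subsets of size m.
        self.subsets = rng.integers(0, dim, size=(L, K, m))
        # Bucket every class (last-layer weight) vector in every table.
        self.tables = []
        for l in range(L):
            buckets = {}
            for cls, vec in enumerate(class_vectors):
                buckets.setdefault(self._code(vec, l), []).append(cls)
            self.tables.append(buckets)

    def _code(self, x, l):
        # Simplified WTA hash: the argmax position within each random
        # coordinate subset; the K positions form the bucket key.
        return tuple(int(np.argmax(x[s])) for s in self.subsets[l])

    def sample_negatives(self, embedding, true_labels, max_neg=64):
        # Classes that collide with the input embedding are informative
        # ("hard") negatives; the true labels are filtered out.
        true, candidates = set(true_labels), set()
        for l in range(len(self.tables)):
            candidates.update(self.tables[l].get(self._code(embedding, l), []))
            if len(candidates) >= max_neg + len(true):
                break
        return [c for c in candidates if c not in true][:max_neg]
```

Because the class vectors move during training, the buckets go stale; the paper's Algorithm 5 rebuilds the hash tables periodically, which this sketch would mirror by reconstructing `LNSampler` every so many iterations.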
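And a compact capture of the reported experiment setup, convenient when re-running the paper: the values are quoted from the Experiment Setup row above, while the dictionary layout and the name `EXPERIMENT_CONFIG` are our own.

```python
# Hyperparameters as reported in the paper; structure and name are ours.
EXPERIMENT_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,  # identical across all experiments
    "batch_size": {
        "Amazon-670K": 1024,
        "Wiki-325K": 256,
        "Amazon-Uniform": 256,
        "ODP": 128,
    },
    # DWTA hash: K concatenated indices per code, L hash tables.
    "dwta": {
        "Amazon-670K":    {"K": 6, "L": 400},
        "Wiki-325K":      {"K": 5, "L": 300},
        "ODP":            {"K": 5, "L": 150},
        "Amazon-Uniform": {"K": 6, "L": 150},
    },
}
```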