Sampled Softmax with Random Fourier Features

Authors: Ankit Singh Rawat, Jiecao Chen, Felix Xinnan X. Yu, Ananda Theertha Suresh, Sanjiv Kumar

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on widely used NLP and extreme classification datasets to demonstrate the utility of the proposed RF-softmax method (cf. Section 4). |
| Researcher Affiliation | Industry | Google Research, New York. {ankitsrawat, chenjiecao, felixyu, theertha, sanjivk}@google.com |
| Pseudocode | No | The paper describes the algorithmic steps in prose in Section 3.1 but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | PENNTREEBANK [31] is a popular benchmark for NLP tasks with a vocabulary of size 10,000. We train a language model using LSTM, where the normalized output of the LSTM serves as the input embedding. BNEWS [32] is another NLP dataset. For this dataset, we select the most frequent 64,000 words as the vocabulary. Our model architecture for BNEWS is the same as the one used for PENNTREEBANK with more parameters. We fix the embedding dimension to be d = 200 for PENNTREEBANK and d = 512 for BNEWS. The paper also tests the proposed method on three extreme classification datasets with a large number of classes [33]. |
| Dataset Splits | No | The paper reports validation perplexity and uses validation sets in its figures (e.g., Figures 1-4), but it does not state the sizes or percentages of the training, validation, and test splits for any of the datasets used. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | Figure 1: Validation perplexity for RF-softmax on PENNTREEBANK with m = 100, D = 1024, and varying values of T. Figure 2: Validation perplexity for RF-softmax on PENNTREEBANK with m = 100 and varying D. We fix the embedding dimension to be d = 200 for PENNTREEBANK and d = 512 for BNEWS. We set this parameter to 0.3 as it leads to the best performance for the FULL baseline. The best performance is attained at T = 0.5. Figure 4 shows the performance of different methods on BNEWS. The performance of RF-softmax is on par with QUADRATIC when D = 2048, and RF-softmax outperforms QUADRATIC when D = 8192. |
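To make the audited hyperparameters (embedding dimension d, feature dimension D, temperature T, and m sampled classes) concrete, the sketch below illustrates the core idea behind RF-softmax: for unit-norm embeddings, the Gaussian kernel exp(-||x - c||^2 / (2T)) is proportional to the tempered softmax kernel exp(x.c / T), so a random Fourier feature map lets one form a sampling distribution over classes from inner products of low-dimensional features. This is a simplified NumPy illustration under our own assumptions (the `phi` helper and the brute-force normalization are ours; the paper's actual sampler avoids materializing all n scores), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, D, n_classes = 200, 1024, 10000  # dims from the PENNTREEBANK setup; n_classes = vocab size
T, m = 0.5, 100                     # temperature and number of sampled classes, as in Figure 1

# Random Fourier features for the Gaussian kernel with bandwidth T:
# E[phi(x) . phi(c)] = exp(-||x - c||^2 / (2T)), which for unit-norm x, c
# is proportional to exp(x . c / T).
W = rng.normal(scale=1.0 / np.sqrt(T), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def phi(V):
    """Hypothetical helper: map rows of V (n, d) to RFF space (n, D)."""
    return np.sqrt(2.0 / D) * np.cos(W @ V.T + b[:, None]).T

# Unit-norm class embeddings and one query embedding.
C = rng.normal(size=(n_classes, d))
C /= np.linalg.norm(C, axis=1, keepdims=True)
x = rng.normal(size=(1, d))
x /= np.linalg.norm(x)

Phi_C = phi(C)      # (n_classes, D); can be precomputed once per step
phi_x = phi(x)[0]   # (D,)

# Unnormalized proposal weights: phi(c) . phi(x) approximates exp(x . c / T)
# up to a constant. Cosine features can go slightly negative, so clip.
scores = np.clip(Phi_C @ phi_x, 1e-12, None)
probs = scores / scores.sum()

# Draw m classes from the RFF-induced proposal for sampled softmax.
sampled = rng.choice(n_classes, size=m, replace=False, p=probs)
```

For clarity this sketch normalizes over all classes explicitly; the point of the RFF construction in the paper is that the feature scores admit sublinear sampling, so the full pass is not needed in practice.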