Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes and demonstrates stronger performance with fewer parameters (up to 40x compression) as compared to existing compression baselines.
Researcher Affiliation | Collaboration | Google Research, Carnegie Mellon University; pliang@cs.cmu.edu, {manzilzaheer,yuanwang,amra}@google.com
Pseudocode | Yes | Algorithm 1 (ANCHOR & TRANSFORM algorithm for learning sparse representations of discrete objects) and Algorithm 2 (NBANT: Nonparametric Bayesian ANT); see the first sketch after the table.
Open Source Code | Yes | Code for our experiments can be found at https://github.com/pliang279/sparse_discrete.
Open Datasets | Yes | AG-News (V = 62K) (Zhang et al., 2015), DBPedia (V = 563K) (Lehmann et al., 2015), Sogou-News (V = 254K) (Zhang et al., 2015), and Yelp-review (V = 253K) (Zhang et al., 2015)... Penn Treebank (PTB) (V = 10K) (Marcus et al., 1993) and WikiText-103 (V = 267K) (Merity et al., 2017)... MovieLens 25M (Harper & Konstan, 2015)... Amazon Product reviews (Ni et al., 2019).
Dataset Splits | Yes | To perform optimization over the number of anchors, our algorithm starts with a small A = 10 and either adds anchors (i.e., adding a new row to A and a new column to T) or deletes anchors to minimize eq (5) at every epoch, depending on the trend of the objective evaluated on the validation set (see the second sketch after the table).
Hardware Specification | Yes | For each epoch on MovieLens 25M, standard MF takes 165s on a GTX 980 Ti GPU, while ANT takes 176s for A = 5 and 180s for A = 20.
Software Dependencies | No | The paper mentions using TensorFlow, PyTorch, NLTK, and the YOGI optimizer, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Here we provide more details for our experiments including hyperparameters used, design decisions, and comparison with baseline methods. We also include the anonymized code in the supplementary material. Hyperparameters are listed in tables such as Table 6: Table of hyperparameters for text classification experiments on AG-News, DBPedia, Sogou-News, and Yelp-review datasets.
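To make the factorization behind Algorithm 1 concrete, here is a minimal PyTorch sketch, not the authors' released implementation. It assumes the embedding table is formed as E = T A, where A holds a small set of anchor embeddings and T is a sparse, nonnegative transform; the class name, its methods, and the proximal step below are assumptions modeled on the paper's description rather than code taken from the repository.

```python
import torch
import torch.nn as nn


class AnchorTransformEmbedding(nn.Module):
    """Sketch of an ANT-style embedding: E = T @ A with sparse, nonnegative T."""

    def __init__(self, vocab_size: int, num_anchors: int, dim: int):
        super().__init__()
        # A: a small matrix of anchor embeddings (num_anchors << vocab_size).
        self.anchors = nn.Parameter(0.01 * torch.randn(num_anchors, dim))
        # T: per-token nonnegative mixing weights over the anchors.
        self.transform = nn.Parameter(0.01 * torch.rand(vocab_size, num_anchors))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Look up the rows of T for the given tokens and mix the anchors.
        return self.transform[token_ids] @ self.anchors

    @torch.no_grad()
    def proximal_step(self, lam: float, lr: float) -> None:
        # Soft-threshold and clamp T after each gradient step so it stays
        # sparse and nonnegative (the source of the parameter savings).
        self.transform.copy_(torch.clamp(self.transform - lam * lr, min=0.0))
```

In use, one would call proximal_step(lam, lr) after every optimizer step; only the nonzero entries of T plus the small anchor matrix A then need to be stored, which is where the reported compression of the embedding table comes from.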
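The Dataset Splits row quotes the NBANT (Algorithm 2) schedule for growing or shrinking the anchor set. The function below is only a schematic reading of that sentence under stated assumptions: the training step, the validation objective (eq (5)), and the anchor add/delete operations are passed in as hypothetical hooks, not functions from the authors' codebase.

```python
from typing import Callable


def nbant_anchor_schedule(
    train_one_epoch: Callable[[], None],
    validation_objective: Callable[[], float],
    add_anchor: Callable[[], None],     # adds a new row to A and a new column to T
    delete_anchor: Callable[[], None],  # removes a row of A and a column of T
    max_epochs: int = 50,
    init_anchors: int = 10,
) -> int:
    """Grow or shrink the anchor set each epoch based on the validation trend."""
    num_anchors = init_anchors
    best_obj = float("inf")
    grow = True  # start by trying to add anchors
    for _ in range(max_epochs):
        train_one_epoch()
        obj = validation_objective()  # objective of eq (5) on the validation set
        if obj >= best_obj:
            grow = not grow  # trend stopped improving: reverse direction
        best_obj = min(best_obj, obj)
        if grow:
            add_anchor()
            num_anchors += 1
        elif num_anchors > 1:
            delete_anchor()
            num_anchors -= 1
    return num_anchors
```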