Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes and demonstrates stronger performance with fewer parameters (up to 40× compression) as compared to existing compression baselines. |
| Researcher Affiliation | Collaboration | Google Research, Carnegie Mellon University pliang@cs.cmu.edu, {manzilzaheer,yuanwang,amra}@google.com |
| Pseudocode | Yes | Algorithm 1: ANCHOR & TRANSFORM, an algorithm for learning sparse representations of discrete objects, and Algorithm 2: NBANT, Nonparametric Bayesian ANT (a minimal sketch of each appears after this table). |
| Open Source Code | Yes | Code for our experiments can be found at https://github.com/pliang279/sparse_discrete. |
| Open Datasets | Yes | AG-News (V = 62K) (Zhang et al., 2015), DBPedia (V = 563K) (Lehmann et al., 2015), Sogou-News (V = 254K) (Zhang et al., 2015), and Yelp-review (V = 253K) (Zhang et al., 2015)... Penn Treebank (PTB) (V = 10K) (Marcus et al., 1993) and WikiText-103 (V = 267K) (Merity et al., 2017)... MovieLens 25M (Harper & Konstan, 2015)... Amazon Product reviews (Ni et al., 2019). |
| Dataset Splits | Yes | To perform optimization over the number of anchors, our algorithm starts with a small A = 10 and either adds anchors (i.e., adding a new row to A and a new column to T) or deletes anchors to minimize eq (5) at every epoch depending on the trend of the objective evaluated on validation set (see the second sketch after this table). |
| Hardware Specification | Yes | For each epoch on Movielens 25M, standard MF takes 165s on a GTX 980 Ti GPU while ANT takes 176s for A = 5 and 180s for A = 20. |
| Software Dependencies | No | The paper mentions using “TensorFlow and PyTorch”, “NLTK”, and the “YOGI optimizer”, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Here we provide more details for our experiments including hyperparameters used, design decisions, and comparison with baseline methods. We also include the anonymized code in the supplementary material. Hyperparameters are also tabulated, e.g., Table 6: Table of hyperparameters for text classification experiments on AG-News, DBPedia, Sogou-News, and Yelp-review datasets. |
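
The pseudocode row refers to the paper's central construction: the full embedding matrix E ∈ R^{V×d} is never stored; it is factored as E = TA, where A holds a small set of dense anchor embeddings and T is a sparse, non-negative transformation over the whole vocabulary. The PyTorch sketch below illustrates that factorization under our own naming (`ANTEmbedding` and `proximal_step` are not identifiers from the released code); the paper enforces sparsity on T with proximal gradient updates, approximated here by a soft-thresholding step applied after each optimizer update.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ANTEmbedding(nn.Module):
    """Minimal sketch of an Anchor & Transform style embedding.

    The full embedding matrix is factored as E = T @ A, where
    A (num_anchors x dim) holds dense anchor embeddings and
    T (vocab_size x num_anchors) is a sparse, non-negative mixing
    matrix learned jointly with the downstream task.
    """

    def __init__(self, vocab_size: int, num_anchors: int, dim: int):
        super().__init__()
        self.anchors = nn.Parameter(0.1 * torch.randn(num_anchors, dim))
        self.transform = nn.Parameter(0.01 * torch.rand(vocab_size, num_anchors))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Look up only the needed rows of T, then mix the anchors.
        t_rows = self.transform[token_ids]   # (batch, num_anchors)
        return t_rows @ self.anchors         # (batch, dim)

    @torch.no_grad()
    def proximal_step(self, lam: float, lr: float) -> None:
        # Soft-threshold T toward zero and clip at zero, mimicking the
        # paper's proximal updates that keep T sparse and non-negative.
        self.transform.copy_(F.relu(self.transform - lam * lr))
```

Calling `proximal_step` after each `optimizer.step()` drives small entries of T to exactly zero, so each token's embedding ends up as a mixture of only a few anchors, which is where the parameter savings come from.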
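
The dataset-splits row quotes the nonparametric variant (Algorithm 2, NBANT), which tunes the number of anchors against a validation objective. The helpers below sketch the two structural moves it requires: adding an anchor appends a row to A and a matching zero column to T, and deleting one removes both. The paper's decision rule minimizes its eq (5); the simple trend test used here is a hypothetical stand-in, not the published criterion.

```python
import torch

def add_anchor(anchors, transform):
    """Append one anchor: a new row of A and a matching zero column of T."""
    new_row = 0.1 * torch.randn(1, anchors.shape[1])
    new_col = torch.zeros(transform.shape[0], 1)
    return (torch.cat([anchors, new_row], dim=0),
            torch.cat([transform, new_col], dim=1))

def delete_anchor(anchors, transform):
    """Drop the least-used anchor (smallest total weight in its T column)."""
    if anchors.shape[0] <= 1:
        return anchors, transform
    idx = int(transform.abs().sum(dim=0).argmin())
    keep = [i for i in range(anchors.shape[0]) if i != idx]
    return anchors[keep], transform[:, keep]

def adjust_anchors(anchors, transform, val_history):
    """Per-epoch move: grow while the validation objective improves, else shrink.

    NOTE: the paper picks the move that minimizes its eq (5); this
    trend-based heuristic is only a placeholder for that criterion.
    """
    if len(val_history) >= 2 and val_history[-1] < val_history[-2]:
        return add_anchor(anchors, transform)
    return delete_anchor(anchors, transform)
```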