SOLAR: Sparse Orthogonal Learned and Random Embeddings
Authors: Tharun Medini, Beidi Chen, Anshumali Shrivastava
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve superior precision and recall compared to the respective state-of-the-art baselines for each task with up to 10x faster speed. ... 1 INTRODUCTION: Embedding models have been the mainstay algorithms for several machine learning applications like Information Retrieval (IR) (8; 2) and Natural Language Processing (NLP) (21; 16; 31; 9) in the last decade. ... 5 EXPERIMENTS: We now validate our method on two main tasks: 1) Product-to-Product Recommendation on a 1.67M-book dataset. ... 2) Extreme Classification with the three largest public datasets. ... Table 1: Comparison of SOLAR against DSSM, DSSM+GLaS, and SNRM baselines. SOLAR's metrics are better than the industry-standard DSSM model while training 10x faster and evaluating 2x faster (SOLAR-CPU vs. DSSM-GPU evaluation). |
| Researcher Affiliation | Academia | ¹Rice University, ²Stanford University; tharun.medini@rice.edu, beidi.chen@stanford.edu, anshumali@rice.edu |
| Pseudocode | No | The paper describes the workflow and procedures in detailed text and figures but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Dataset: This dataset is curated from the raw Amazon Book dataset on the extreme classification repository (XML-Repo) (30). |
| Dataset Splits | Yes | This dataset comprises 1,604,777 training books whose titles serve as input queries. ... There are additionally 693K eval books. |
| Hardware Specification | Yes | We train with TensorFlow (TF) v1.14 on a DGX machine with 8 NVIDIA V100 GPUs. |
| Software Dependencies | Yes | We train with TensorFlow (TF) v1.14 on a DGX machine with 8 NVIDIA V100 GPUs. We use TFRecords data streaming to reduce GPU idle time. During training, we use a batch size of 1000. During inference, except for obtaining the sparsified probability scores, all other steps are performed on CPU using Python's multiprocessing module with 48 threads. (A sketch of this sparse inference step follows the table.) |
| Experiment Setup | Yes | Hyperparameters: As mentioned before, we train 480K-dimensional SOLAR embeddings split into K = 16 chunks of B = 30K buckets each. The label embeddings are fixed to be exactly 16-sparse, while the learned query embeddings are evaluated with 1600 non-zero indices (by choosing the m = 100 top buckets per chunk). We feature-hash the 763,265-dimensional BOW inputs down to 100K dimensions. Each independent model is a feed-forward network with an input layer of 100K nodes, one hidden layer of 4096 nodes, and an output layer of B = 30K nodes. To minimize the information loss due to feature hashing, we choose a different random seed for each model. Note that these random seeds have to be saved for consistency during evaluation. ... During training, we use a batch size of 1000. (An architecture sketch follows the table.) |
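
The Experiment Setup row pins down the per-chunk architecture exactly: K = 16 independent feed-forward networks, each mapping a 100K-dimensional feature-hashed BOW input through one 4096-node hidden layer to B = 30K bucket logits. Below is a minimal sketch of one plausible realization in Python with tf.keras. The constants come from the paper; the hash construction (a seeded random index map per model) and all function names are assumptions, since the paper releases no code.

```python
import numpy as np
import tensorflow as tf

VOCAB_DIM = 763_265   # raw BOW vocabulary size (from the paper)
INPUT_DIM = 100_000   # feature-hashed input dimension
HIDDEN_DIM = 4_096    # hidden-layer width
NUM_BUCKETS = 30_000  # B: output buckets per chunk
NUM_CHUNKS = 16       # K: independent models

def hash_bow(token_ids, counts, seed):
    """Feature-hash a sparse BOW vector (token ids + counts) from
    VOCAB_DIM down to INPUT_DIM bins, summing counts on collision.
    The paper uses a different seed per model and notes that the seeds
    must be saved for evaluation; the exact hash function is not
    specified, so a seeded random index map is assumed here."""
    rng = np.random.RandomState(seed)
    index_map = rng.randint(0, INPUT_DIM, size=VOCAB_DIM)
    x = np.zeros(INPUT_DIM, dtype=np.float32)
    np.add.at(x, index_map[np.asarray(token_ids)], counts)
    return x

def build_chunk_model():
    """One of the K = 16 independent networks:
    100K hashed input -> 4096 hidden (ReLU) -> 30K bucket logits."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(HIDDEN_DIM, activation="relu",
                              input_shape=(INPUT_DIM,)),
        tf.keras.layers.Dense(NUM_BUCKETS),  # softmax applied inside the loss
    ])

# Each model gets its own hashing seed, stored for reuse at evaluation.
seeds = list(range(NUM_CHUNKS))
models = [build_chunk_model() for _ in range(NUM_CHUNKS)]
```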
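
The inference path in the Software Dependencies row (sparsified probability scores on GPU, everything else on CPU) combines with the top-m rule in the Experiment Setup row: keeping the m = 100 highest-scoring buckets in each of the K = 16 chunks yields the 1600-nonzero query embedding in the K·B = 480K-dimensional space. A sketch of that sparsification step, again with hypothetical names, assuming the per-chunk scores have already been computed:

```python
import numpy as np

K, B, M = 16, 30_000, 100  # chunks, buckets per chunk, top buckets kept

def sparsify_query(chunk_scores):
    """Turn K score vectors (one per chunk, each of length B) into a
    1600-sparse query embedding over the K * B = 480K-dim space:
    keep the top-M buckets per chunk, offsetting indices by chunk."""
    indices, values = [], []
    for k, scores in enumerate(chunk_scores):
        top = np.argpartition(scores, -M)[-M:]  # top-M bucket ids in chunk k
        indices.append(k * B + top)
        values.append(scores[top])
    return np.concatenate(indices), np.concatenate(values)

# Usage: chunk_scores = [model.predict(x[None, :])[0] for model in models]
# The paper runs such CPU-side steps with Python's multiprocessing module
# (48 threads); a Pool.map over queries would be the natural fit.
```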