SOLAR: Sparse Orthogonal Learned and Random Embeddings

Authors: Tharun Medini, Beidi Chen, Anshumali Shrivastava

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve superior precision and recall compared to the respective state-of-the-art baselines for each task with up to 10x faster speed. 1 INTRODUCTION Embedding models have been the mainstay algorithms for several machine learning applications like Information Retrieval (IR) (8; 2) and Natural Language Processing (NLP) (21; 16; 31; 9) in the last decade. ... 5 EXPERIMENTS We now validate our method on two main tasks: 1) Product to Product Recommendation on a 1.67M book dataset. ... 2) Extreme Classification with the three largest public datasets. ... Table 1: Comparison of SOLAR against DSSM, DSSM+GLaS, and SNRM baselines. SOLAR's metrics are better than the industry-standard DSSM model while training 10x faster and evaluating 2x faster (SOLAR-CPU vs. DSSM-GPU evaluation).
Researcher Affiliation | Academia | Rice University, Stanford University; tharun.medini@rice.edu, beidi.chen@stanford.edu, anshumali@rice.edu
Pseudocode | No | The paper describes the workflow and procedures in detailed text and figures but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Dataset: This dataset is curated from the raw Amazon Book dataset on the extreme classification repository (XML-Repo) (30).
Dataset Splits | Yes | This dataset comprises 1,604,777 training books whose titles serve as input queries. ... There are additionally 693K eval books.
Hardware Specification | Yes | We train with TensorFlow (TF) v1.14 on a DGX machine with 8 NVIDIA V100 GPUs.
Software Dependencies | Yes | We train with TensorFlow (TF) v1.14 on a DGX machine with 8 NVIDIA V100 GPUs. We use TFRecords data streaming to reduce GPU idle time. During training, we use a batch size of 1000. During inference, except for obtaining the sparsified probability scores, all other steps are performed on CPU using Python's multiprocessing module with 48 threads. (A sketch of this CPU-side step appears after the table.)
Experiment Setup | Yes | Hyperparameters: As mentioned before, we train 480K-dimensional SOLAR embeddings split into K = 16 chunks of B = 30K buckets each. The label embeddings are fixed to be exactly 16-sparse, while the learned query embeddings are evaluated with 1600 non-zero indices (by choosing the m = 100 top buckets per chunk). We feature-hash the 763,265-dimensional BOW inputs to 100K dimensions. Each independent model is a feed-forward network with a 100K-node input layer, one hidden layer with 4096 nodes, and an output layer with B = 30K nodes. To minimize the information loss due to feature hashing, we choose a different random seed for each model; these seeds have to be saved for consistency during evaluation. ... During training, we use a batch size of 1000. (A sketch of one such model appears after the table.)
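
To make the layer sizes and hashing setup in the Experiment Setup row concrete, below is a minimal sketch of one of the K = 16 independent feed-forward models, written with tf.keras for brevity (the paper trains with TF v1.14). The hashing scheme, hidden activation, optimizer, and loss are assumptions; the paper only specifies the dimensions, the per-model random seeds, and the batch size of 1000.

    import numpy as np
    import tensorflow as tf

    K = 16                 # number of independent chunk models (from the paper)
    B = 30_000             # buckets per chunk = output layer size (from the paper)
    RAW_DIM = 763_265      # BOW vocabulary size before hashing (from the paper)
    HASHED_DIM = 100_000   # feature-hashed input dimension (from the paper)

    def hashed_bow(token_ids, seed):
        # Illustrative feature hashing: map raw BOW token ids into HASHED_DIM
        # buckets with a per-model seed; the paper notes each model uses a
        # different (saved) random seed.
        table = np.random.RandomState(seed).randint(0, HASHED_DIM, size=RAW_DIM)
        vec = np.zeros(HASHED_DIM, dtype=np.float32)
        np.add.at(vec, table[token_ids], 1.0)
        return vec

    def build_chunk_model():
        # One of the K independent nets: 100K input -> 4096 hidden -> 30K buckets.
        # ReLU, Adam, and a multi-label BCE loss are assumptions, not from the paper.
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(HASHED_DIM,)),
            tf.keras.layers.Dense(4096, activation="relu"),
            tf.keras.layers.Dense(B),  # logits over this chunk's 30K buckets
        ])
        model.compile(optimizer="adam",
                      loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
        return model

    model = build_chunk_model()  # the paper trains K = 16 such models, batch size 1000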
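
The CPU-side inference step quoted in the Software Dependencies row (selecting the top m = 100 buckets per chunk, parallelized with Python's multiprocessing) could look roughly like the following sketch; the function name and the use of a process pool are illustrative, not taken from the paper's code.

    import numpy as np
    from multiprocessing import Pool

    K, B, m = 16, 30_000, 100   # chunks, buckets per chunk, top buckets kept (from the paper)

    def top_m_buckets(scores_row):
        # scores_row: concatenated (K * B,) score vector for one query.
        per_chunk = scores_row.reshape(K, B)
        # Top-m bucket indices per chunk without a full sort.
        top = np.argpartition(per_chunk, -m, axis=1)[:, -m:]
        # Offset into the global K * B index space -> K * m = 1600 non-zeros per query.
        return (top + np.arange(K)[:, None] * B).ravel()

    if __name__ == "__main__":
        scores = np.random.rand(1000, K * B).astype(np.float32)  # placeholder model output
        with Pool(48) as pool:  # 48 workers, matching the 48 threads quoted above
            sparse_indices = pool.map(top_m_buckets, scores)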