SOLAR: Sparse Orthogonal Learned and Random Embeddings

Authors: Tharun Medini, Beidi Chen, Anshumali Shrivastava

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve superior precision and recall compared to the respective state-of-the-art baselines for each task with up to 10x faster speed. 1 INTRODUCTION Embedding models have been the mainstay algorithms for several machine learning applications like Information Retrieval (IR) (8; 2) and Natural Language Processing (NLP) (21; 16; 31; 9) in the last decade. ... 5 EXPERIMENTS We now validate our method on two main tasks: 1) Product to Product Recommendation on a 1.67M book dataset. ... 2) Extreme Classification with the three largest public datasets. ... Table 1: Comparison of SOLAR against DSSM, DSSM+GLaS, and SNRM baselines. SOLAR's metrics are better than the industry-standard DSSM model while training 10x faster and evaluating 2x faster (SOLAR-CPU vs. DSSM-GPU evaluation).
Researcher Affiliation | Academia | Rice University, Stanford University; tharun.medini@rice.edu, beidi.chen@stanford.edu, anshumali@rice.edu
Pseudocode | No | The paper describes the workflow and procedures in detailed text and figures but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Dataset: This dataset is curated from the raw Amazon Book dataset on the extreme classification repository (XML-Repo) (30).
Dataset Splits | Yes | This dataset comprises 1,604,777 training books whose titles serve as input queries. ... There are additionally 693K eval books.
Hardware Specification | Yes | We train with TensorFlow (TF) v1.14 on a DGX machine with 8 NVIDIA V100 GPUs.
Software Dependencies | Yes | We train with TensorFlow (TF) v1.14 on a DGX machine with 8 NVIDIA V100 GPUs. We use TFRecords data streaming to reduce GPU idle time. During training, we use a batch size of 1000. During inference, except for obtaining the sparsified probability scores, all other steps are performed on CPU using Python's multiprocessing module with 48 threads. (A sketch of this CPU-side step appears after the table.)
Experiment Setup | Yes | Hyperparameters: As mentioned before, we train 480K-dimensional SOLAR embeddings split into K = 16 chunks of B = 30K buckets each. The label embeddings are fixed to be exactly 16-sparse, while the learned query embeddings are evaluated with 1600 non-zero indices (by choosing the m = 100 top buckets per chunk). We feature-hash the 763,265-dimensional BOW inputs to 100K dimensions. Each independent model is a feed-forward network with a 100K-node input layer, one hidden layer with 4096 nodes, and an output layer with B = 30K nodes. To minimize the information loss due to feature hashing, we choose a different random seed for each model; these seeds have to be saved for consistency during evaluation. ... During training, we use a batch size of 1000. (A sketch of one such model appears after the table.)
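
To make the layer sizes and hashing setup in the Experiment Setup row concrete, below is a minimal sketch of one of the K = 16 independent feed-forward models, written with tf.keras for brevity (the paper trains with TF v1.14). The hashing scheme, hidden activation, optimizer, and loss are assumptions; the paper only specifies the dimensions, the per-model random seeds, and the batch size of 1000.

    import numpy as np
    import tensorflow as tf

    K = 16                 # number of independent chunk models (from the paper)
    B = 30_000             # buckets per chunk = output layer size (from the paper)
    RAW_DIM = 763_265      # BOW vocabulary size before hashing (from the paper)
    HASHED_DIM = 100_000   # feature-hashed input dimension (from the paper)

    def hashed_bow(token_ids, seed):
        # Illustrative feature hashing: map raw BOW token ids into HASHED_DIM
        # buckets with a per-model seed; the paper notes each model uses a
        # different (saved) random seed.
        table = np.random.RandomState(seed).randint(0, HASHED_DIM, size=RAW_DIM)
        vec = np.zeros(HASHED_DIM, dtype=np.float32)
        np.add.at(vec, table[token_ids], 1.0)
        return vec

    def build_chunk_model():
        # One of the K independent nets: 100K input -> 4096 hidden -> 30K buckets.
        # ReLU, Adam, and a multi-label BCE loss are assumptions, not from the paper.
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(HASHED_DIM,)),
            tf.keras.layers.Dense(4096, activation="relu"),
            tf.keras.layers.Dense(B),  # logits over this chunk's 30K buckets
        ])
        model.compile(optimizer="adam",
                      loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
        return model

    model = build_chunk_model()  # the paper trains K = 16 such models, batch size 1000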
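
The CPU-side inference step quoted in the Software Dependencies row (selecting the top m = 100 buckets per chunk, parallelized with Python's multiprocessing) could look roughly like the following sketch; the function name and the use of a process pool are illustrative, not taken from the paper's code.

    import numpy as np
    from multiprocessing import Pool

    K, B, m = 16, 30_000, 100   # chunks, buckets per chunk, top buckets kept (from the paper)

    def top_m_buckets(scores_row):
        # scores_row: concatenated (K * B,) score vector for one query.
        per_chunk = scores_row.reshape(K, B)
        # Top-m bucket indices per chunk without a full sort.
        top = np.argpartition(per_chunk, -m, axis=1)[:, -m:]
        # Offset into the global K * B index space -> K * m = 1600 non-zeros per query.
        return (top + np.arange(K)[:, None] * B).ravel()

    if __name__ == "__main__":
        scores = np.random.rand(1000, K * B).astype(np.float32)  # placeholder model output
        with Pool(48) as pool:  # 48 workers, matching the 48 threads quoted above
            sparse_indices = pool.map(top_m_buckets, scores)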