ELIAS: End-to-End Learning to Index and Search in Large Output Spaces

Authors: Nilesh Gupta, Patrick Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit Dhillon

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-AmazonTitles-131K. Table 1: Performance comparison on extreme classification benchmark datasets.
Researcher Affiliation | Collaboration | Nilesh Gupta (UT Austin, nilesh@cs.utexas.edu), Patrick H. Chen (UCLA, patrickchen@g.ucla.edu), Hsiang-Fu Yu (Amazon, rofu.yu@gmail.com), Cho-Jui Hsieh (UCLA, chohsieh@cs.ucla.edu), Inderjit S. Dhillon (UT Austin & Google, inderjit@cs.utexas.edu)
Pseudocode | No | The paper describes the method in text but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | A PyTorch implementation of ELIAS along with other resources is available at https://github.com/nilesh2797/ELIAS.
Open Datasets | Yes | We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-AmazonTitles-131K. For LF-AmazonTitles-131K, we use the experimental setup provided in the extreme classification repository [5]. (Reference [5]: K. Bhatia, K. Dahiya, H. Jain, P. Kar, A. Mittal, Y. Prabhu, and M. Varma. The extreme classification repository: Multi-label datasets and code, 2016. URL http://manikvarma.org/downloads/XC/XMLRepository.html.)
Dataset Splits | Yes | For Wikipedia-500K, Amazon-670K, and Amazon-3M, we use the same experimental setup (i.e., raw input text, sparse features, and train-test split) as existing deep XMC methods [31, 33, 18, 7]. The score calibration module is learned on a small validation set of 5,000 points. (A held-out calibration sketch is given after the table.)
Hardware Specification | No | The paper mentions that ELIAS 'can be efficiently implemented on modern GPUs' and achieves 'sub-millisecond prediction latency on a dataset with 3 million labels on a single GPU', but it does not specify the GPU model or any other hardware details.
Software Dependencies | No | The paper mentions software like PyTorch, BERT, LIBLINEAR, and scikit-learn, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The number of clusters C for each dataset is chosen to be the same as LightXML, which selects C ≈ L/100. We keep the shortlist size hyperparameter K fixed at 2000, which is approximately the same as the number of labels shortlisted by existing partition-based methods assuming beam size b = 20 and 100 labels per cluster. The AdamW [20] optimizer is used to train the whole model, with weight decay applied only to non-gain and non-bias parameters. The optimization update for the label classifiers W_L is performed with high accumulation steps (i.e., the update is performed every k training steps, where k = 10).
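The experiment-setup row above can be made concrete with a short PyTorch sketch. The module names and sizes (a toy encoder and label-classifier matrix), the learning rates, the 0.01 weight-decay value, and the cross-entropy placeholder loss are all assumptions for illustration and are not taken from the ELIAS codebase; the sketch only shows the two quoted choices: weight decay restricted to non-gain, non-bias parameters, and the label-classifier update performed every k = 10 steps so its gradients accumulate in between.

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins (not the ELIAS architecture): a toy encoder and a
# label-classifier matrix W_L over a small label space.
encoder = torch.nn.Linear(768, 768)
label_classifiers = torch.nn.Linear(768, 1000)

def param_groups(module, weight_decay=0.01):
    """Apply weight decay only to non-gain, non-bias parameters (decay value assumed)."""
    decay, no_decay = [], []
    for name, p in module.named_parameters():
        is_no_decay = name.endswith("bias") or "norm" in name.lower() or name.endswith("gain")
        (no_decay if is_no_decay else decay).append(p)
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]

opt_enc = torch.optim.AdamW(param_groups(encoder), lr=1e-4)
opt_lbl = torch.optim.AdamW(param_groups(label_classifiers), lr=1e-2)

k = 10  # label-classifier update every k training steps, as quoted above
for step in range(100):
    x = torch.randn(32, 768)                  # dummy input batch
    y = torch.randint(0, 1000, (32,))         # dummy targets
    loss = F.cross_entropy(label_classifiers(encoder(x)), y)  # placeholder loss
    loss.backward()
    opt_enc.step()
    opt_enc.zero_grad()
    if (step + 1) % k == 0:                   # gradients for W_L accumulate over k steps
        opt_lbl.step()
        opt_lbl.zero_grad()
```

Because each optimizer only zeroes its own parameter group, the encoder is updated every step while the label-classifier gradients accumulate until the k-th step, matching the high-accumulation update described in the paper.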
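The dataset-splits row mentions that the score calibration module is learned on a small validation set of 5,000 points. Below is a minimal sketch of such a held-out calibration step, assuming synthetic scores and a plain logistic fit; the paper does not specify the form of its calibration module, so the calibrator, data, and variable names here are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins: uncalibrated scores and binary relevance for a pool of points.
scores = rng.random((50_000, 1))
relevance = (scores[:, 0] + 0.1 * rng.standard_normal(50_000) > 0.5).astype(int)

# Hold out 5,000 points, mirroring the "small validation set of 5,000 points".
perm = rng.permutation(len(scores))
val_idx, rest_idx = perm[:5000], perm[5000:]

calibrator = LogisticRegression()
calibrator.fit(scores[val_idx], relevance[val_idx])   # learn calibration on the held-out set
calibrated = calibrator.predict_proba(scores[rest_idx])[:, 1]
```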