reproducibilityindex.ai

ELIAS: End-to-End Learning to Index and Search in Large Output Spaces

Authors: Nilesh Gupta, Patrick Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit Dhillon

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-AmazonTitles-131K. Table 1: Performance comparison on extreme classification benchmark datasets.
Researcher Affiliation	Collaboration	Nilesh Gupta (UT Austin, nilesh@cs.utexas.edu); Patrick H. Chen (UCLA, patrickchen@g.ucla.edu); Hsiang-Fu Yu (Amazon, rofu.yu@gmail.com); Cho-Jui Hsieh (UCLA, chohsieh@cs.ucla.edu); Inderjit S. Dhillon (UT Austin & Google, inderjit@cs.utexas.edu)
Pseudocode	No	The paper describes the method in text but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	A PyTorch implementation of ELIAS along with other resources is available at https://github.com/nilesh2797/ELIAS.
Open Datasets	Yes	We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-AmazonTitles-131K. For LF-AmazonTitles-131K, we use the experimental setup provided in the extreme classification repository [5]. (Reference [5]: K. Bhatia, K. Dahiya, H. Jain, P. Kar, A. Mittal, Y. Prabhu, and M. Varma. The extreme classification repository: Multi-label datasets and code, 2016. URL http://manikvarma.org/downloads/XC/XMLRepository.html.)
Dataset Splits	Yes	For Wikipedia-500K, Amazon-670K, and Amazon-3M, we use the same experimental setup (i.e. raw input text, sparse features and train-test split) as existing deep XMC methods [31, 33, 18, 7]. The score calibration module is learned on a small validation set of 5,000 points.
Hardware Specification	No	The paper mentions that ELIAS 'can be efficiently implemented on modern GPUs' and achieves 'sub-millisecond prediction latency on a dataset with 3 million labels on a single GPU', but it does not specify the exact model or specifications of the GPU or any other hardware components.
Software Dependencies	No	The paper mentions software like PyTorch, BERT, LIBLINEAR, and scikit-learn, but does not provide specific version numbers for these dependencies.
Experiment Setup	Yes	Number of clusters C for each dataset is chosen to be the same as LightXML which selects C ≈ L/100. We keep the shortlist size hyperparameter K fixed to 2000 which is approximately same as the number of labels existing partition based methods shortlist assuming beam-size b = 20 and the number of labels per cluster = 100. AdamW [20] optimizer is used to train the whole model with weight decay applied only to non-gain and non-bias parameters. Optimization update for label classifiers W_L is performed with high accumulation steps (i.e. optimization update is performed at every k training steps, where k = 10).
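
The arithmetic behind the Experiment Setup row can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the ELIAS repository: all function names are hypothetical, the label count of 500,000 is a round example rather than a real dataset size, rounding C up is an assumption (the paper only states an approximate ratio), and the name markers used to pick out bias and gain (LayerNorm weight) parameters follow a common naming convention that may not match the authors' model.

```python
# Illustrative sketch only: helper names, the example label count, and the
# parameter-name markers are assumptions, not taken from the ELIAS code.

def num_clusters(num_labels, labels_per_cluster=100):
    """Cluster count C chosen so that C is roughly L / 100 (rounded up here)."""
    return -(-num_labels // labels_per_cluster)  # ceiling division


def shortlist_size(beam_size=20, labels_per_cluster=100):
    """Labels shortlisted by a partition-based method: beam size times cluster size."""
    return beam_size * labels_per_cluster


def is_update_step(step, accumulation_steps=10):
    """True when the label-classifier update fires; step is counted from 1."""
    return step % accumulation_steps == 0


def split_weight_decay(param_names, no_decay_markers=("bias", "LayerNorm.weight")):
    """Split parameter names into (decayed, not decayed) lists by substring match."""
    decay, no_decay = [], []
    for name in param_names:
        if any(marker in name for marker in no_decay_markers):
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay


if __name__ == "__main__":
    print(num_clusters(500_000))   # example L, not a real dataset size
    print(shortlist_size())        # b = 20, 100 labels per cluster
    print([s for s in range(1, 31) if is_update_step(s)])
```

With b = 20 and 100 labels per cluster, `shortlist_size()` gives the K = 2000 quoted in the row, and `is_update_step` fires once every k = 10 training steps. In a PyTorch training loop, the two lists returned by `split_weight_decay` would typically become two optimizer parameter groups, one with a nonzero weight decay and one with weight decay set to zero.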