Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ELIAS: End-to-End Learning to Index and Search in Large Output Spaces
Authors: Nilesh Gupta, Patrick Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit Dhillon
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-Amazon Titles-131K. Table 1: Performance comparison on extreme classification benchmark datasets. |
| Researcher Affiliation | Collaboration | Nilesh Gupta (UT Austin); Patrick H. Chen (UCLA); Hsiang-Fu Yu (Amazon); Cho-Jui Hsieh (UCLA); Inderjit S. Dhillon (UT Austin & Google) |
| Pseudocode | No | The paper describes the method in text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | A PyTorch implementation of ELIAS along with other resources is available at https://github.com/nilesh2797/ELIAS. |
| Open Datasets | Yes | We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-Amazon Titles-131K. For LF-Amazon Titles-131K, we use the experimental setup provided in the extreme classification repository [5]. (Reference [5]: K. Bhatia, K. Dahiya, H. Jain, P. Kar, A. Mittal, Y. Prabhu, and M. Varma. The extreme classification repository: Multi-label datasets and code, 2016. URL http://manikvarma.org/downloads/XC/XMLRepository.html.) |
| Dataset Splits | Yes | For Wikipedia-500K, Amazon-670K, and Amazon-3M, we use the same experimental setup (i.e. raw input text, sparse features and train-test split) as existing deep XMC methods [31, 33, 18, 7]. The score calibration module is learned on a small validation set of 5,000 points. |
| Hardware Specification | No | The paper mentions that ELIAS 'can be efficiently implemented on modern GPUs' and achieves 'sub-millisecond prediction latency on a dataset with 3 million labels on a single GPU', but it does not specify the exact model or specifications of the GPU or any other hardware components. |
| Software Dependencies | No | The paper mentions software like PyTorch, BERT, LIBLINEAR, and scikit-learn, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Number of clusters C for each dataset is chosen to be the same as LightXML, which selects C ≈ L/100. We keep the shortlist size hyperparameter K fixed to 2000, which is approximately the same as the number of labels existing partition-based methods shortlist assuming beam-size b = 20 and the number of labels per cluster = 100. AdamW [20] optimizer is used to train the whole model with weight decay applied only to non-gain and non-bias parameters. Optimization update for label classifiers W_L is performed with high accumulation steps (i.e. optimization update is performed at every k training steps, where k = 10). |
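
The arithmetic behind the Experiment Setup row can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the ELIAS repository: the function names are hypothetical, rounding the cluster count up is an assumption (the quoted text only says C is roughly L/100), and the label count of 500,000 is a round number chosen for illustration rather than any dataset's actual size.

```python
# Illustrative arithmetic for the quoted experiment setup.
# All function names are hypothetical and not taken from the ELIAS codebase.


def num_clusters(num_labels: int, labels_per_cluster: int = 100) -> int:
    """Approximate cluster count C ~ L / 100.

    Rounding up is an assumption made here so that every label fits
    in some cluster; the source only gives the approximate ratio.
    """
    return -(-num_labels // labels_per_cluster)  # ceiling division


def partition_shortlist_size(beam_size: int = 20, labels_per_cluster: int = 100) -> int:
    """Labels shortlisted by a partition-based method: beam size times cluster size."""
    return beam_size * labels_per_cluster


def is_update_step(step: int, accumulation_steps: int = 10) -> bool:
    """True on the (1-indexed) training steps where an accumulated update is applied."""
    return step % accumulation_steps == 0


if __name__ == "__main__":
    # 500_000 is a round illustrative label count, not a real dataset size.
    print("clusters:", num_clusters(500_000))
    print("shortlist size:", partition_shortlist_size())
    print("update steps:", [s for s in range(1, 31) if is_update_step(s)])
```

With b = 20 and 100 labels per cluster the shortlist size comes to 2000, which matches the fixed K quoted in the table, and with k = 10 the label classifiers receive one optimizer update for every ten training steps.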