ELIAS: End-to-End Learning to Index and Search in Large Output Spaces
Authors: Nilesh Gupta, Patrick Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit Dhillon
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-Amazon Titles-131K. Table 1: Performance comparison on extreme classification benchmark datasets. |
| Researcher Affiliation | Collaboration | Nilesh Gupta UT Austin nilesh@cs.utexas.edu Patrick H. Chen UCLA patrickchen@g.ucla.edu Hsiang-Fu Yu Amazon rofu.yu@gmail.com Cho-Jui Hsieh UCLA chohsieh@cs.ucla.edu Inderjit S. Dhillon UT Austin & Google inderjit@cs.utexas.edu |
| Pseudocode | No | The paper describes the method in text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | A PyTorch implementation of ELIAS along with other resources is available at https://github.com/nilesh2797/ELIAS. |
| Open Datasets | Yes | We conduct experiments on three standard full-text extreme classification datasets: Wikipedia-500K, Amazon-670K, Amazon-3M and one short-text dataset: LF-Amazon Titles-131K. For LF-Amazon Titles-131K, we use the experimental setup provided in the extreme classification repository [5]. (Reference [5]: K. Bhatia, K. Dahiya, H. Jain, P. Kar, A. Mittal, Y. Prabhu, and M. Varma. The extreme classification repository: Multi-label datasets and code, 2016. URL http://manikvarma.org/downloads/XC/XMLRepository.html.) |
| Dataset Splits | Yes | For Wikipedia-500K, Amazon-670K, and Amazon-3M, we use the same experimental setup (i.e. raw input text, sparse features and train-test split) as existing deep XMC methods [31, 33, 18, 7]. The score calibration module is learned on a small validation set of 5,000 points. |
| Hardware Specification | No | The paper mentions that ELIAS 'can be efficiently implemented on modern GPUs' and achieves 'sub-millisecond prediction latency on a dataset with 3 million labels on a single GPU', but it does not specify the exact model or specifications of the GPU or any other hardware components. |
| Software Dependencies | No | The paper mentions software like PyTorch, BERT, LIBLINEAR, and scikit-learn, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The number of clusters C for each dataset is chosen to be the same as LightXML, which selects C ≈ L/100. We keep the shortlist size hyperparameter K fixed to 2000, which is approximately the same as the number of labels that existing partition-based methods shortlist, assuming beam size b = 20 and 100 labels per cluster. The AdamW [20] optimizer is used to train the whole model, with weight decay applied only to non-gain and non-bias parameters. The optimization update for the label classifiers WL is performed with high accumulation steps (i.e. an optimizer update is performed every k training steps, where k = 10). |
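The two training details quoted in the Experiment Setup row, selective weight decay and high-accumulation updates for the label classifiers, can be sketched as plain Python. This is not the authors' code; the parameter names and the name-matching heuristic (biases and normalization gains excluded from decay) are illustrative assumptions consistent with the quoted description.

```python
# Minimal sketch (not the released ELIAS implementation) of two details
# from the paper's experiment setup: weight decay applied only to
# non-gain, non-bias parameters, and label-classifier updates performed
# only every k = 10 training steps. Parameter names are hypothetical.

def split_decay_groups(param_names):
    """Partition parameters into decayed / non-decayed groups.

    Weight decay is applied only to parameters that are neither gains
    (e.g. LayerNorm scales) nor biases, as stated in the paper.
    """
    decay, no_decay = [], []
    for name in param_names:
        if name.endswith(".bias") or "LayerNorm" in name:
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay

def label_classifier_update_steps(total_steps, k=10):
    """Training steps at which the label-classifier optimizer update
    fires when gradients are accumulated over k steps (k = 10)."""
    return [t for t in range(1, total_steps + 1) if t % k == 0]

# Example: a few hypothetical parameter names from an encoder + label head.
names = [
    "encoder.layer.0.attention.weight",
    "encoder.layer.0.attention.bias",
    "encoder.LayerNorm.weight",
    "label_classifier.weight",
]
decay, no_decay = split_decay_groups(names)
print(decay)     # weight-decayed parameters
print(no_decay)  # excluded: biases and normalization gains
print(label_classifier_update_steps(35))  # -> [10, 20, 30]
```

In a PyTorch training loop, the two groups would typically be passed to AdamW as separate parameter groups (one with the configured weight decay, one with `weight_decay=0`), and the label-classifier `optimizer.step()` would be skipped except on the accumulation-boundary steps.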