SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification

Authors: Pranjal Aggarwal, Ameet Deshpande, Karthik R Narasimhan

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The zero-shot version of this task requires generalization to novel classes without additional supervision. In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-shot performance on three XC datasets derived from legal, e-commerce, and Wikipedia data. Our ablation studies highlight the relative importance of our hybrid matching module and automatically collected class descriptions.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Indian Institute of Technology, Delhi, India; Department of Computer Science, Princeton University.
Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and demo are available at https://github.com/princeton-nlp/semsup-xc and https://huggingface.co/spaces/Pranjal2041/SemSup-XC/.
Open Datasets | Yes | We evaluate our model on three diverse public datasets: EURLex-4.3K (Chalkidis et al., 2019), a legal document classification dataset with 4.3K classes; AmazonCat-13K (McAuley & Leskovec, 2013), an e-commerce product tagging dataset of Amazon product descriptions and titles with 13K categories; and Wikipedia-1M (Gupta et al., 2021), an article classification dataset comprising 5 million Wikipedia articles with over 1 million categories.
Dataset Splits | Yes | For the EURLex dataset, we use the standard validation split for choosing the best parameters. We provide detailed statistics about the number of instances and classes in the train and test sets in Table 1.
Hardware Specification | Yes | All implementation was done in PyTorch and Hugging Face Transformers, and experiments were run on NVIDIA RTX 2080 and NVIDIA RTX 3090 GPUs.
Software Dependencies | No | All implementation was done in PyTorch and Hugging Face Transformers, and experiments were run on NVIDIA RTX 2080 and NVIDIA RTX 3090 GPUs. The paper names its software tools but lacks the specific version numbers needed for reproducibility.
Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov & Hutter, 2019) and tune our hyperparameters using grid search on the respective validation set. We set the input and output encoders' learning rates to 5e-5 and 1e-4, respectively, and use the same learning rates for the other two datasets. We use a batch size of 16 on EURLex and 32 on AmazonCat and Wikipedia. For EURLex, we train our zero-shot model for a fixed 2 epochs and the generalized zero-shot model for 10 epochs. For the other two datasets, we train for a fixed 1 epoch.
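
The per-encoder learning rates in the quoted setup map directly onto AdamW parameter groups in PyTorch. The sketch below is a minimal, hypothetical illustration of that configuration; the `input_encoder` and `output_encoder` placeholder modules are assumptions for clarity, not the authors' actual implementation.

```python
import torch
from torch.optim import AdamW

# Placeholder modules standing in for SemSup-XC's input and output
# encoders (hypothetical; the real models are transformer encoders).
input_encoder = torch.nn.Linear(768, 768)
output_encoder = torch.nn.Linear(768, 768)

# One AdamW optimizer with two parameter groups, matching the reported
# learning rates: 5e-5 for the input encoder, 1e-4 for the output encoder.
optimizer = AdamW([
    {"params": input_encoder.parameters(), "lr": 5e-5},
    {"params": output_encoder.parameters(), "lr": 1e-4},
])
```

Using parameter groups keeps both encoders under a single optimizer, so one `optimizer.step()` call updates each at its own rate during the fixed-epoch training runs described above.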