OAK: Enriching Document Representations using Auxiliary Knowledge for Extreme Classification
Authors: Shikhar Mohan, Deepak Saini, Anshul Mittal, Sayak Ray Chowdhury, Bhawna Paliwal, Jian Jiao, Manish Gupta, Manik Varma
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the proposed OAK method for the Auxiliary Data enhanced XC task in three ways. Firstly, through comparisons with other leading methods which employ different ways to leverage auxiliary data, we demonstrate the superiority of OAK's design choices. Secondly, via ablations we detail how each component of our architecture is crucial to OAK's performance. Thirdly, we analyse our method's performance on tail data – rare documents and rare labels. Table 3. Results on public benchmark datasets. OAK offers 5% higher P@1 on standard XC benchmark datasets. |
| Researcher Affiliation | Industry | 1Microsoft, India 2Microsoft, USA 3Microsoft Research, India. |
| Pseudocode | Yes | Algorithm 1 Augmentation Module Training |
| Open Source Code | No | The code will be released publicly upon acceptance of this paper. |
| Open Datasets | Yes | The Wikipedia datasets are created from publicly available Wikipedia dumps1. 1https://dumps.wikimedia.org/enwiki/ 20220520/ |
| Dataset Splits | No | Table 2 shows a summary of dataset statistics (columns: Dataset, # Train Docs, # Labels (L), # Test Docs, Avg. Docs/Label, Avg. Labels/Doc, AK Types, # AKPs (M), Avg. AKPs/Doc). The paper does not mention a validation split. |
| Hardware Specification | Yes | We train this model for 300 epochs on 2x NVidia A100-80GB GPUs for all datasets |
| Software Dependencies | No | The paper mentions using a 'DistilBERT-base encoder', 'AdamW', and 'SparseAdam' (with a PyTorch link), but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | We train this model for 300 epochs on 2x NVidia A100-80GB GPUs for all datasets, with a batch size of 1024 and a linear LR scheduler with warmup. |
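The quoted setup is enough to sketch the reported training configuration in PyTorch. The sketch below is a minimal, hypothetical reconstruction, not the authors' code: it assumes Hugging Face Transformers for the DistilBERT-base encoder and the `get_linear_schedule_with_warmup` helper, and every value not quoted in the table (learning rates, warmup length, steps per epoch, embedding sizes) is a placeholder.

```python
# Hypothetical reconstruction of the training setup described above:
# DistilBERT-base encoder, AdamW + SparseAdam optimizers, batch size 1024,
# 300 epochs, and a linear LR scheduler with warmup. Learning rates,
# warmup steps, and embedding dimensions are assumed, not from the paper.
import torch
from torch.optim import AdamW, SparseAdam
from transformers import AutoModel, get_linear_schedule_with_warmup

encoder = AutoModel.from_pretrained("distilbert-base-uncased")

# Placeholder table of label/auxiliary-knowledge embeddings; sparse=True
# because torch.optim.SparseAdam only accepts parameters that produce
# sparse gradients.
label_embeddings = torch.nn.Embedding(100_000, 768, sparse=True)

dense_optimizer = AdamW(encoder.parameters(), lr=1e-4)                  # assumed LR
sparse_optimizer = SparseAdam(label_embeddings.parameters(), lr=1e-3)   # assumed LR

batch_size = 1024                       # quoted in the paper
epochs, steps_per_epoch = 300, 1_000    # 300 epochs quoted; steps/epoch assumed
scheduler = get_linear_schedule_with_warmup(
    dense_optimizer,
    num_warmup_steps=1_000,                     # warmup length not reported
    num_training_steps=epochs * steps_per_epoch,
)
```

Pairing SparseAdam with a `sparse=True` embedding matches PyTorch's constraint that SparseAdam handles sparse gradients only, which is presumably why the paper lists both optimizers; the exact parameter partition used by OAK is not specified.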