OAK: Enriching Document Representations using Auxiliary Knowledge for Extreme Classification
Authors: Shikhar Mohan, Deepak Saini, Anshul Mittal, Sayak Ray Chowdhury, Bhawna Paliwal, Jian Jiao, Manish Gupta, Manik Varma
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the proposed OAK method for the Auxiliary Data enhanced XC task in three ways. Firstly, through comparisons with other leading methods which employ different ways to leverage auxiliary data, we demonstrate the superiority of OAK's design choices. Secondly, via ablations we detail how each component of our architecture is crucial to OAK's performance. Thirdly, we analyse our method's performance on tail data – rare documents and rare labels. Table 3. Results on public benchmark datasets. OAK offers 5% higher P@1 on standard XC benchmark datasets. |
| Researcher Affiliation | Industry | 1Microsoft, India 2Microsoft, USA 3Microsoft Research, India. |
| Pseudocode | Yes | Algorithm 1 Augmentation Module Training |
| Open Source Code | No | The code will be released publicly upon acceptance of this paper. |
| Open Datasets | Yes | The Wikipedia datasets are created from publicly available Wikipedia dumps1. 1https://dumps.wikimedia.org/enwiki/ 20220520/ |
| Dataset Splits | No | Table 2 shows a summary of dataset statistics (columns: Dataset, # Train Docs, # Labels (L), # Test Docs, Avg. Docs/Label, Avg. Labels/Doc, AK Types, # AKPs (M), Avg. AKPs/Doc). The paper does not mention a validation split. |
| Hardware Specification | Yes | We train this model for 300 epochs on 2x NVidia A100-80GB GPUs for all datasets |
| Software Dependencies | No | The paper mentions using a 'DistilBERT-base encoder', 'AdamW', and 'SparseAdam' (with a PyTorch link), but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | We train this model for 300 epochs on 2x NVidia A100-80GB GPUs for all datasets, with a batch size of 1024 and a linear LR scheduler with warmup. |
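The quoted setup is enough to sketch the reported training configuration in PyTorch. The sketch below is a minimal, hypothetical reconstruction, not the authors' code: it assumes Hugging Face Transformers for the DistilBERT-base encoder and the `get_linear_schedule_with_warmup` helper, and every value not quoted in the table (learning rates, warmup length, steps per epoch, embedding sizes) is a placeholder.

```python
# Hypothetical reconstruction of the training setup described above:
# DistilBERT-base encoder, AdamW + SparseAdam optimizers, batch size 1024,
# 300 epochs, and a linear LR scheduler with warmup. Learning rates,
# warmup steps, and embedding dimensions are assumed, not from the paper.
import torch
from torch.optim import AdamW, SparseAdam
from transformers import AutoModel, get_linear_schedule_with_warmup

encoder = AutoModel.from_pretrained("distilbert-base-uncased")

# Placeholder table of label/auxiliary-knowledge embeddings; sparse=True
# because torch.optim.SparseAdam only accepts parameters that produce
# sparse gradients.
label_embeddings = torch.nn.Embedding(100_000, 768, sparse=True)

dense_optimizer = AdamW(encoder.parameters(), lr=1e-4)                  # assumed LR
sparse_optimizer = SparseAdam(label_embeddings.parameters(), lr=1e-3)   # assumed LR

batch_size = 1024                       # quoted in the paper
epochs, steps_per_epoch = 300, 1_000    # 300 epochs quoted; steps/epoch assumed
scheduler = get_linear_schedule_with_warmup(
    dense_optimizer,
    num_warmup_steps=1_000,                     # warmup length not reported
    num_training_steps=epochs * steps_per_epoch,
)
```

Pairing SparseAdam with a `sparse=True` embedding matches PyTorch's constraint that SparseAdam handles sparse gradients only, which is presumably why the paper lists both optimizers; the exact parameter partition used by OAK is not specified.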