Extreme Classification via Adversarial Softmax Approximation

Authors: Robert Bamler, Stephan Mandt

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated the proposed adversarial negative sampling method on two established benchmarks by comparing speed of convergence and predictive performance against five different baselines. Figure 1 shows our results on the Wikipedia-500K data set (left two plots) and the Amazon-670K data set (right two plots).
Researcher Affiliation | Academia | Robert Bamler & Stephan Mandt, Department of Computer Science, University of California, Irvine ({rbamler,mandt}@uci.edu)
Pseudocode | No | The paper describes the algorithm in prose but does not provide formal pseudocode or a dedicated, labeled algorithm block. (A generic negative-sampling sketch is given below the table for orientation.)
Open Source Code | Yes | and we publish the code of both the main and the auxiliary model (footnote: https://github.com/mandt-lab/adversarial-negative-sampling)
Open Datasets | Yes | We used the Wikipedia-500K and Amazon-670K data sets from the Extreme Classification Repository (Bhatia et al.) with K = 512-dimensional XML-CNN features (Liu et al., 2017) downloaded from (Saxena).
Dataset Splits | Yes | We tuned the hyperparameters for each method individually using the validation set. We split off 10% of the training set for validation, and report results on the provided test set.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions using an Adagrad optimizer but does not list the software libraries it depends on or their version numbers.
Experiment Setup | Yes | Table 1 shows the resulting hyperparameters. For the proposed method and baselines (i)-(iii) we used an Adagrad optimizer (Duchi et al., 2011) and considered learning rates ρ ∈ {0.0003, 0.001, 0.003, 0.01, 0.03} and regularizer strengths (see Eq. 6) λ ∈ {10⁻⁵, 3×10⁻⁵, …, 0.03}.
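
Since the paper describes its algorithm only in prose (see the Pseudocode row above), here is a minimal, hypothetical Python sketch of the generic sampled-softmax negative-sampling surrogate that this family of methods builds on. It is not the authors' algorithm: their contribution is an adversarial auxiliary model that learns the proposal distribution, whereas the sketch below takes the proposal log-probabilities as a given input. The function name and tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def sampled_softmax_loss(logits_pos, logits_neg, log_q_neg):
    # logits_pos: (batch,)        score of each example's true label
    # logits_neg: (batch, n_neg)  scores of the sampled negative labels
    # log_q_neg:  (batch, n_neg)  log-probability under the proposal that
    #                             drew each negative; in the paper this role
    #                             is played by a learned, adversarial
    #                             auxiliary model (an assumption here)
    corrected_neg = logits_neg - log_q_neg  # importance-weight correction
    all_logits = torch.cat([logits_pos.unsqueeze(1), corrected_neg], dim=1)
    # after concatenation, the true label sits at index 0 of every row
    targets = torch.zeros(all_logits.size(0), dtype=torch.long)
    return F.cross_entropy(all_logits, targets)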
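The Dataset Splits row reports that 10% of the training set was held out for validation. Below is a minimal sketch of one way to implement such a split; the function name, seed, and example size are assumptions for illustration, not from the paper.

import numpy as np

def train_val_split(n_train, val_fraction=0.1, seed=0):
    # shuffle indices once, then peel off the first 10% for validation
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_train)
    n_val = int(round(val_fraction * n_train))
    return perm[n_val:], perm[:n_val]  # (train indices, validation indices)

# usage on a hypothetical training set with 1,000 examples
train_idx, val_idx = train_val_split(n_train=1000)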
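The Experiment Setup row quotes the hyperparameter grids. Below is a hedged sketch of that grid search: PyTorch's Adagrad and an L2 penalty via weight_decay are assumptions about the implementation (the paper names only the optimizer and the grid values), and the tiny placeholder model stands in for the actual classifier over hundreds of thousands of labels.

import itertools
import torch

learning_rates = [0.0003, 0.001, 0.003, 0.01, 0.03]
# geometric grid 10^-5, 3x10^-5, ..., 0.03 (alternating factors of 3 and 10/3)
reg_strengths = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2]

for lr, lam in itertools.product(learning_rates, reg_strengths):
    # placeholder model: K = 512 input features, small label set for illustration
    model = torch.nn.Linear(512, 1000)
    optimizer = torch.optim.Adagrad(model.parameters(), lr=lr, weight_decay=lam)
    # ... train, score on the held-out validation split, and keep the
    # (lr, lam) configuration with the best validation metric.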