Extreme Classification via Adversarial Softmax Approximation
Authors: Robert Bamler, Stephan Mandt
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed adversarial negative sampling method on two established benchmarks by comparing speed of convergence and predictive performance against five different baselines. Figure 1 shows our results on the Wikipedia-500K data set (left two plots) and the Amazon-670K data set (right two plots). |
| Researcher Affiliation | Academia | Robert Bamler & Stephan Mandt, Department of Computer Science, University of California, Irvine; {rbamler,mandt}@uci.edu |
| Pseudocode | No | The paper describes the algorithm in prose but does not provide formal pseudocode or a dedicated, labeled algorithm block. |
| Open Source Code | Yes | and we publish the code of both the main and the auxiliary model. (https://github.com/mandt-lab/adversarial-negative-sampling) |
| Open Datasets | Yes | We used the Wikipedia-500K and Amazon-670K data sets from the Extreme Classification Repository (Bhatia et al.) with K = 512-dimensional XML-CNN features (Liu et al., 2017) downloaded from (Saxena). |
| Dataset Splits | Yes | We tuned the hyperparameters for each method individually using the validation set. We split off 10% of the training set for validation, and report results on the provided test set. (A minimal sketch of such a split appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of an Adagrad optimizer but does not provide specific version numbers for software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | Table 1 shows the resulting hyperparameters. For the proposed method and baselines (i)-(iii) we used an Adagrad optimizer (Duchi et al., 2011) and considered learning rates ρ ∈ {0.0003, 0.001, 0.003, 0.01, 0.03} and regularizer strengths (see Eq. 6) λ ∈ {10⁻⁵, 3·10⁻⁵, …, 0.03}. (A minimal grid-search sketch also appears after the table.) |
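
The Dataset Splits row above describes holding out 10% of the training set for validation. A minimal sketch of such a split, assuming the data is already loaded as NumPy arrays, could look as follows; the function name, file names, and random seed are illustrative and not taken from the released code.

```python
# Minimal sketch of the 90/10 train/validation split described in the
# "Dataset Splits" row. Array shapes, file names, and the random seed are
# assumptions for illustration, not details from the authors' code.
import numpy as np

def split_train_validation(features, labels, val_fraction=0.1, seed=0):
    """Hold out a random fraction of the training examples for validation."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(features.shape[0])
    n_val = int(round(val_fraction * len(permutation)))
    val_idx, train_idx = permutation[:n_val], permutation[n_val:]
    return (features[train_idx], labels[train_idx]), (features[val_idx], labels[val_idx])

# Hypothetical usage with pre-extracted 512-dimensional XML-CNN features:
# X = np.load("wiki500k_xmlcnn_features.npy")  # shape (N, 512); assumed file name
# Y = np.load("wiki500k_labels.npy")
# (X_train, Y_train), (X_val, Y_val) = split_train_validation(X, Y)
```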
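
The Experiment Setup row lists a grid of Adagrad learning rates and regularizer strengths. One way such a search could be organized is sketched below, assuming PyTorch's Adagrad optimizer and using weight decay as a stand-in for the regularizer strength λ of Eq. 6; `build_model`, `train_one_config`, and `validate` are hypothetical placeholders, not functions from the paper's repository.

```python
# Sketch of the hyperparameter grid from the "Experiment Setup" row: Adagrad
# with five learning rates and regularizer strengths lambda in
# {1e-5, 3e-5, ..., 0.03}. Treating lambda as Adagrad weight decay is an
# assumption about how the L2 regularizer of Eq. 6 is applied.
import itertools
import torch

LEARNING_RATES = [0.0003, 0.001, 0.003, 0.01, 0.03]
REGULARIZER_STRENGTHS = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 0.01, 0.03]

def grid_search(build_model, train_one_config, validate):
    """Return the (learning rate, lambda) pair with the best validation score."""
    best_score, best_config = float("-inf"), None
    for lr, lam in itertools.product(LEARNING_RATES, REGULARIZER_STRENGTHS):
        model = build_model()
        optimizer = torch.optim.Adagrad(model.parameters(), lr=lr, weight_decay=lam)
        train_one_config(model, optimizer)
        score = validate(model)  # e.g. precision@1 on the held-out validation split
        if score > best_score:
            best_score, best_config = score, (lr, lam)
    return best_config, best_score
```

Selecting each method's hyperparameters on its own validation score, as the Dataset Splits row states the authors did, keeps the comparison across baselines fair.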