ST$_k$: A Scalable Module for Solving Top-k Problems

Authors: Hanchen Xia, Weidong Liu, Xiaojun Mao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply STk to the Average Top-k Loss (ATk), which inherently faces a Top-k problem. The proposed STk Loss outperforms ATk Loss and achieves the best average performance on multiple benchmarks, with the lowest standard deviation. With the assistance of STk Loss, we surpass the state-of-the-art (SOTA) on both CIFAR-100-LT and Places-LT leaderboards.
Researcher Affiliation | Collaboration | School of Mathematical Sciences, Ministry of Education Key Lab of Artificial Intelligence, and Ministry of Education Key Laboratory of Scientific and Engineering Computing, Shanghai Jiao Tong University, Shanghai, China; Royal Flush AI Research Institute, Hangzhou, China
Pseudocode | No | The paper presents algorithmic steps (e.g., BCD-STk, SGD-STk) but does not include a formally labeled 'Pseudocode' or 'Algorithm' block or figure.
Open Source Code | No | NeurIPS Paper Checklist, Question 4 Justification: Every experiment can be easily reproduced, and the code will be available soon.
Open Datasets | Yes | We select binary classification datasets from the KEEL and UCI databases; see Table 9. For the Housing, Abalone, and Cpusmall datasets, we normalize the output to [0, 1]; for the Sinc dataset, we first randomly sample 1000 points $(x_i, y_i)$ from the function.
Dataset Splits | Yes | The number of samples is 10,000 for the training set and 2,500 each for the validation set and the test set. We divided the datasets into training, validation, and test sets in a 0.5 : 0.25 : 0.25 ratio.
Hardware Specification | Yes | Visual classification experiments can be implemented on a single L20 GPU, with durations ranging from 20 to 200 minutes. Training these Transformer-based models would take about 10 hours on a single RTX 4090. The computational resource requirements of experiments on large real-world datasets can be found in Section 5.2, while the remaining experiments can be run on CPU.
Software Dependencies | No | The paper mentions using OpenNMT and FairSeq for translation tasks but does not specify their version numbers, nor versions for other key software libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We use Adam as our optimizer, setting the batch size to 512 while keeping the other hyperparameters at their defaults. The hyperparameters in the experiment include k in MATk and ATk, the coefficient of the regularization term C, the initial learning rate η, and the smoothing coefficient δ. The search spaces are: k ∈ {1} ∪ [0.1 : 0.1 : 1]; C ∈ {10^0, 10^1, 10^2, 10^3, 10^4, 10^5}; η ∈ {0.1, 0.05, 0.01, 0.005, 0.001}; δ ∈ {0.1, 0.01, 0.001, 0.0001}. The smoothing coefficient δ = 0.01 was determined by grid search for all datasets in Section 5.1 (see page 7) and adopted in all other experiments.
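
The sketches below make three technical details from the table concrete. First, the Average Top-k (ATk) loss named in the Research Type row: it aggregates a batch by averaging only the k largest per-sample losses, which is what makes it a Top-k problem. A minimal PyTorch sketch of that aggregate (the function name and example values are illustrative, not the paper's code):

```python
import torch

def average_top_k_loss(per_sample_losses: torch.Tensor, k: int) -> torch.Tensor:
    """Average of the k largest per-sample losses (the ATk aggregate)."""
    top_k_values, _ = torch.topk(per_sample_losses, k)  # k largest losses
    return top_k_values.mean()

# Example: per-sample losses for a batch of 8 samples.
losses = torch.tensor([0.2, 1.3, 0.0, 0.7, 2.1, 0.4, 0.9, 0.1])
print(average_top_k_loss(losses, k=3))  # mean of {2.1, 1.3, 0.9} ≈ 1.4333
```

The hard selection in `torch.topk` is non-smooth; the smoothing coefficient δ in the Experiment Setup row suggests the paper's STk module replaces it with a smoothed, scalable surrogate.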
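Second, the data preparation described in the Open Datasets and Dataset Splits rows: targets normalized to [0, 1], 1000 points sampled from the sinc function, and a 0.5 : 0.25 : 0.25 train/validation/test split. A sketch under stated assumptions (the sinc input range and the random seed are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen here only to make the sketch repeatable

# Sinc dataset: sample 1000 points (x_i, y_i) from the sinc function.
# The input range [-5, 5] is an assumption; the paper does not state it here.
x = rng.uniform(-5.0, 5.0, size=1000)
y = np.sinc(x)  # NumPy's normalized sinc: sin(pi*x) / (pi*x)

# Min-max normalize targets to [0, 1] (as done for Housing, Abalone, Cpusmall).
y = (y - y.min()) / (y.max() - y.min())

# 0.5 : 0.25 : 0.25 split into training, validation, and test sets.
idx = rng.permutation(len(x))
n_train, n_val = len(x) // 2, len(x) // 4
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
```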
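Third, the hyperparameter search from the Experiment Setup row. A sketch of exhaustively iterating that search space; `train_and_evaluate` is a hypothetical placeholder, and reading the fractional k values as fractions of the training-set size is our assumption, not the paper's statement:

```python
import itertools

def train_and_evaluate(k, C, eta, delta):
    # Hypothetical stand-in: train with these hyperparameters and return the
    # validation error. Replace with the actual training/validation loop.
    return 0.0

# Search spaces as reported in the paper.
k_space = [1] + [i / 10 for i in range(1, 11)]   # {1} ∪ [0.1 : 0.1 : 1]
C_space = [10 ** p for p in range(6)]            # {10^0, ..., 10^5}
eta_space = [0.1, 0.05, 0.01, 0.005, 0.001]      # initial learning rate η
delta_space = [0.1, 0.01, 0.001, 0.0001]         # smoothing coefficient δ

best_error, best_config = float("inf"), None
for k, C, eta, delta in itertools.product(k_space, C_space, eta_space, delta_space):
    error = train_and_evaluate(k, C, eta, delta)
    if error < best_error:
        best_error, best_config = error, (k, C, eta, delta)

print(best_config)
```

Under this grid, fixing δ = 0.01 (as the paper reports after its Section 5.1 search) shrinks the remaining search from 1,320 to 330 configurations.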