Rankmax: An Adaptive Projection Alternative to the Softmax Function

Authors: Weiwei Kong, Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Li Zhang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we studied how well Rankmax performs as a multilabel classification loss, and compared it to both Softmax and Sparsemax [26]. For evaluation, we chose a recommender system task where the goal is to learn which movies (=labels) to recommend to a user (=example). We experimented with Movielens datasets [15], namely the datasets of 100K, 20M, and 1B ratings, the latter being artificially generated from the 20M dataset [4].
Researcher Affiliation | Collaboration | Weiwei Kong (Georgia Institute of Technology, wwkong@gatech.edu); Walid Krichene (Google Research, walidk@google.com); Nicolas Mayoraz (Google Research, nmayoraz@google.com); Steffen Rendle (Google Research, srendle@google.com); Li Zhang (Google Research, liqzhang@google.com)
Pseudocode | Yes | Finally, the index t can be computed in O(n log k), as detailed in Algorithm 1 in the supplement. (A hedged sketch of the underlying projection appears after the table.)
Open Source Code | No | No concrete access to source code (a specific repository link, an explicit code-release statement, or code in the supplementary materials) was provided for the methodology described in this paper.
Open Datasets | Yes | We experimented with Movielens datasets [15], namely the datasets of 100K, 20M, and 1B ratings, the latter being artificially generated from the 20M dataset [4]. Basic statistics about the datasets are summarized in Table 1.
Dataset Splits | Yes | The datasets were partitioned into 80% training, 10% cross-validation and 10% test. (A sketch of such a split appears after the table.)
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) were provided for running the experiments.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) were provided.
Experiment Setup | Yes | Hyper-parameters were tuned based on the cross-validation set. Figure 2 illustrates the evolution of AP@10 and R@100 over the course of training, for different learning rates. (Sketches of these metrics appear after the table.)
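
For context on the Pseudocode row: Rankmax is defined in the paper as a Euclidean projection, with the threshold index t computable in O(n log k) via Algorithm 1 in the supplement. Below is a minimal sketch, assuming Rankmax_k(a) projects the score vector a onto the capped simplex {p in [0,1]^n : sum(p) = k} and locating the threshold by bisection. This is not the paper's Algorithm 1; the function name, signature, and the bisection approach are illustrative.

```python
import numpy as np

def rankmax(a, k, n_iter=60):
    """Project scores `a` onto {p in [0,1]^n : sum(p) = k} via a scalar
    shift tau, i.e., p_i = clip(a_i + tau, 0, 1).

    A hypothetical bisection sketch, not the paper's O(n log k) Algorithm 1.
    """
    a = np.asarray(a, dtype=float)
    lo, hi = -a.max(), 1.0 - a.min()  # sum(p) = 0 at lo and = n at hi
    for _ in range(n_iter):           # bisect: sum(p) is nondecreasing in tau
        tau = 0.5 * (lo + hi)
        if np.clip(a + tau, 0.0, 1.0).sum() < k:
            lo = tau                  # shift too small: raise it
        else:
            hi = tau                  # shift large enough: lower it
    return np.clip(a + 0.5 * (lo + hi), 0.0, 1.0)
```

For example, rankmax([2.0, 1.0, 0.1, -1.0], k=2) is approximately [1.0, 0.95, 0.05, 0.0]: the top score saturates at the cap of 1, and the remaining unit of mass is split across the next two scores.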
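The 80/10/10 partition from the Dataset Splits row is simple to reproduce in spirit, though the paper does not state how the split was randomized (per rating, per user, or with what seed). A minimal sketch, with all names illustrative:

```python
import numpy as np

def split_80_10_10(n_examples, seed=0):
    """Shuffle example indices and partition them into
    80% train, 10% cross-validation, and 10% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    n_train = int(0.8 * n_examples)
    n_valid = int(0.1 * n_examples)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]
    return train, valid, test
```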
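The metrics named in the Experiment Setup row, AP@10 and R@100, are computed per example and averaged over users. A minimal sketch follows; note that normalization conventions for average precision vary, and the paper does not spell out its exact convention, so the min(k, |relevant|) denominator below is one common reading, not necessarily the authors' implementation.

```python
def average_precision_at_k(ranked, relevant, k=10):
    """AP@k: average of precision@i over ranks i <= k that hold a
    relevant item, normalized by min(k, |relevant|)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at rank i
    return total / min(k, len(relevant))

def recall_at_k(ranked, relevant, k=100):
    """R@k: fraction of relevant items retrieved in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked[:k])) / len(relevant)
```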