PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation

Authors: Weiqin Yang, Jiawei Chen, Xin Xin, Sheng Zhou, Binbin Hu, Yan Feng, Chun Chen, Can Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further validate the effectiveness and robustness of PSL through empirical experiments. The code is available at https://github.com/Tiny-Snow/IR-Benchmark. To empirically validate these advantages, we implement PSL with typical surrogate activations (Tanh, Atan, ReLU) and conduct extensive experiments on four real-world datasets across three experimental settings: 1) IID setting [22], where training and test distributions are identically distributed [23]; 2) OOD setting [24], with distribution shifts in item popularity; 3) Noise setting [15], with a certain ratio of false negatives. Experimental results demonstrate the superiority of PSL over existing losses in terms of recommendation accuracy, OOD robustness, and noise resistance.
Researcher Affiliation | Collaboration | Weiqin Yang (Zhejiang University, tinysnow@zju.edu.cn); Jiawei Chen (Zhejiang University, sleepyhunt@zju.edu.cn); Xin Xin (Shandong University, xinxin@sdu.edu.cn); Sheng Zhou (Zhejiang University, zhousheng_zju@zju.edu.cn); Binbin Hu (Ant Group, bin.hbb@antfin.com); Yan Feng (Zhejiang University, fengyan@zju.edu.cn); Chun Chen (Zhejiang University, chenc@zju.edu.cn); Can Wang (Zhejiang University, wcan@zju.edu.cn)
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/Tiny-Snow/IR-Benchmark.
Open Datasets | Yes | Four widely-used datasets, including Amazon-Book, Amazon-Electronic, Amazon-Movie [40, 41], and Gowalla [42], are used in our experiments. Considering the item popularity is not heavily skewed in the Amazon-Book and Amazon-Movie datasets, we turn to other conventional datasets, Amazon-CD [40, 41] and Yelp2018 [43], as replacements for OOD testing.
Dataset Splits | Yes | All datasets are split into 80% training set and 20% test set, with 10% of the training set further treated as the validation set.
Hardware Specification | Yes | All experiments are conducted on one NVIDIA GeForce RTX 4090 GPU and one AMD EPYC 7763 64-Core Processor.
Software Dependencies | No | The paper mentions using the 'Adam [68] optimizer' but does not specify version numbers for any software dependencies, programming languages, or libraries used for implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | A grid search is utilized to find the optimal hyperparameters. For all compared methods, we closely refer to the configurations provided in their respective publications to ensure their optimal performance. The hyperparameter settings are provided in Appendix B.5, where the detailed optimal hyperparameters for each method on each dataset and backbone are reported. The learning rate (lr) is searched in {10^-1, 10^-2, 10^-3}..., The weight decay (wd) is searched in {0, 10^-4, 10^-5, 10^-6}. The batch size is set as 1024, and the number of epochs is set as 200. Following the negative sampling strategy in Wu et al. [15], we uniformly sample 1000 negative items for each positive instance in training.
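The Research Type row quotes the paper's core idea: rewrite the softmax loss in pairwise form and swap the exponential for a surrogate activation (Tanh, Atan, ReLU). Below is a minimal sketch of one plausible reading, where the surrogate is applied to each positive–negative score gap before temperature scaling; the function name `psl_loss` and the exact placement of `tau` are assumptions here, not taken from the paper, so consult the paper or the linked repository for the authoritative formulation.

```python
import math

def psl_loss(pos_score, neg_scores, tau=1.0, act=math.tanh):
    """Hedged sketch of a PSL-style loss for one (user, positive) pair.

    The sampled softmax loss in pairwise form is
    log(1 + sum_j exp((s_j - s_pos) / tau)); here the surrogate
    activation `act` is applied to each score gap before temperature
    scaling. With act = identity this reduces exactly to the softmax
    (cross-entropy) loss; Tanh/Atan/ReLU give PSL-style variants.
    """
    gaps = (s - pos_score for s in neg_scores)
    # The leading 1.0 is the positive item's own term: act(0) = 0 for
    # Tanh, Atan, ReLU, and identity, so exp(act(0) / tau) = 1.
    return math.log(1.0 + sum(math.exp(act(g) / tau) for g in gaps))
```

For example, `psl_loss(1.2, [0.3, -0.5], tau=0.5, act=math.atan)`. Bounded activations such as Tanh cap the contribution of any single pairwise term, which matches the intuition behind the noise-resistance claims quoted above.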
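The Dataset Splits row reports an 80% train / 20% test split, with 10% of the training set held out as validation. The sketch below implements that protocol as a global random split; whether the paper splits globally or per-user is not stated in the quoted text, so that detail is an assumption.

```python
import random

def split_dataset(interactions, seed=0):
    """Split into 80% train / 20% test, then hold out 10% of train
    as validation, mirroring the proportions reported in the paper."""
    rng = random.Random(seed)
    items = list(interactions)
    rng.shuffle(items)
    # 20% of all interactions go to the test set.
    n_test = int(0.2 * len(items))
    test, train = items[:n_test], items[n_test:]
    # 10% of the remaining training set becomes the validation set.
    n_val = int(0.1 * len(train))
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```

Note that the validation set is 10% of the *training* portion (8% of the full data), not 10% of the whole dataset, per the quoted description.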
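The Experiment Setup row gives the search space (lr in {10^-1, 10^-2, 10^-3}, wd in {0, 10^-4, 10^-5, 10^-6}) and states that 1000 negative items are uniformly sampled per positive instance. A sketch of both pieces follows; the helper names are hypothetical, and sampling with replacement is an assumption, since the quoted text only says the draw is uniform.

```python
import itertools
import random

# Search space as reported (batch size 1024, 200 epochs are fixed).
LEARNING_RATES = [1e-1, 1e-2, 1e-3]
WEIGHT_DECAYS = [0.0, 1e-4, 1e-5, 1e-6]

def hyperparameter_grid():
    """All (lr, wd) combinations explored by the grid search."""
    return list(itertools.product(LEARNING_RATES, WEIGHT_DECAYS))

def sample_negatives(all_items, pos_item, n_neg=1000, seed=0):
    """Uniformly sample n_neg negatives for one positive instance.

    Sampling with replacement is an assumption; the paper only states
    that 1000 negatives are drawn uniformly per positive.
    """
    rng = random.Random(seed)
    pool = [i for i in all_items if i != pos_item]
    return rng.choices(pool, k=n_neg)
```

The grid has 3 × 4 = 12 (lr, wd) configurations per method, dataset, and backbone, which is consistent with reporting the per-setting optima in an appendix table.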