PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation

Authors: Weiqin Yang, Jiawei Chen, Xin Xin, Sheng Zhou, Binbin Hu, Yan Feng, Chun Chen, Can Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further validate the effectiveness and robustness of PSL through empirical experiments. The code is available at https://github.com/Tiny-Snow/IR-Benchmark. To empirically validate these advantages, we implement PSL with typical surrogate activations (Tanh, Atan, ReLU) and conduct extensive experiments on four real-world datasets across three experimental settings: 1) IID setting [22], where training and test distributions are identically distributed [23]; 2) OOD setting [24], with distribution shifts in item popularity; 3) Noise setting [15], with a certain ratio of false negatives. Experimental results demonstrate the superiority of PSL over existing losses in terms of recommendation accuracy, OOD robustness, and noise resistance.
Researcher Affiliation | Collaboration | Weiqin Yang (Zhejiang University, tinysnow@zju.edu.cn); Jiawei Chen (Zhejiang University, sleepyhunt@zju.edu.cn); Xin Xin (Shandong University, xinxin@sdu.edu.cn); Sheng Zhou (Zhejiang University, zhousheng_zju@zju.edu.cn); Binbin Hu (Ant Group, bin.hbb@antfin.com); Yan Feng (Zhejiang University, fengyan@zju.edu.cn); Chun Chen (Zhejiang University, chenc@zju.edu.cn); Can Wang (Zhejiang University, wcan@zju.edu.cn)
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/Tiny-Snow/IR-Benchmark.
Open Datasets | Yes | Four widely-used datasets, including Amazon-Book, Amazon-Electronic, Amazon-Movie [40, 41], and Gowalla [42], are used in our experiments. Considering the item popularity is not heavily skewed in the Amazon-Book and Amazon-Movie datasets, we turn to other conventional datasets, Amazon-CD [40, 41] and Yelp2018 [43], as replacements for OOD testing.
Dataset Splits | Yes | All datasets are split into 80% training set and 20% test set, with 10% of the training set further treated as the validation set.
Hardware Specification | Yes | All experiments are conducted on one NVIDIA GeForce RTX 4090 GPU and one AMD EPYC 7763 64-Core Processor.
Software Dependencies | No | The paper mentions using the 'Adam [68] optimizer' but does not specify version numbers for any software dependencies, programming languages, or libraries used for implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | A grid search is utilized to find the optimal hyperparameters. For all compared methods, we closely refer to the configurations provided in their respective publications to ensure their optimal performance. The hyperparameter settings are provided in Appendix B.5, where the detailed optimal hyperparameters for each method on each dataset and backbone are reported. The learning rate (lr) is searched in {10^-1, 10^-2, 10^-3}..., The weight decay (wd) is searched in {0, 10^-4, 10^-5, 10^-6}. The batch size is set as 1024, and the number of epochs is set as 200. Following the negative sampling strategy in Wu et al. [15], we uniformly sample 1000 negative items for each positive instance in training.
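The Research Type row quotes the paper's core idea: rewrite the softmax loss in pairwise form and swap the exponential for a surrogate activation (Tanh, Atan, ReLU). Below is a minimal sketch of one plausible reading, where the surrogate is applied to each positive–negative score gap before temperature scaling; the function name `psl_loss` and the exact placement of `tau` are assumptions here, not taken from the paper, so consult the paper or the linked repository for the authoritative formulation.

```python
import math

def psl_loss(pos_score, neg_scores, tau=1.0, act=math.tanh):
    """Hedged sketch of a PSL-style loss for one (user, positive) pair.

    The sampled softmax loss in pairwise form is
    log(1 + sum_j exp((s_j - s_pos) / tau)); here the surrogate
    activation `act` is applied to each score gap before temperature
    scaling. With act = identity this reduces exactly to the softmax
    (cross-entropy) loss; Tanh/Atan/ReLU give PSL-style variants.
    """
    gaps = (s - pos_score for s in neg_scores)
    # The leading 1.0 is the positive item's own term: act(0) = 0 for
    # Tanh, Atan, ReLU, and identity, so exp(act(0) / tau) = 1.
    return math.log(1.0 + sum(math.exp(act(g) / tau) for g in gaps))
```

For example, `psl_loss(1.2, [0.3, -0.5], tau=0.5, act=math.atan)`. Bounded activations such as Tanh cap the contribution of any single pairwise term, which matches the intuition behind the noise-resistance claims quoted above.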
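The Dataset Splits row reports an 80% train / 20% test split, with 10% of the training set held out as validation. The sketch below implements that protocol as a global random split; whether the paper splits globally or per-user is not stated in the quoted text, so that detail is an assumption.

```python
import random

def split_dataset(interactions, seed=0):
    """Split into 80% train / 20% test, then hold out 10% of train
    as validation, mirroring the proportions reported in the paper."""
    rng = random.Random(seed)
    items = list(interactions)
    rng.shuffle(items)
    # 20% of all interactions go to the test set.
    n_test = int(0.2 * len(items))
    test, train = items[:n_test], items[n_test:]
    # 10% of the remaining training set becomes the validation set.
    n_val = int(0.1 * len(train))
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```

Note that the validation set is 10% of the *training* portion (8% of the full data), not 10% of the whole dataset, per the quoted description.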
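The Experiment Setup row gives the search space (lr in {10^-1, 10^-2, 10^-3}, wd in {0, 10^-4, 10^-5, 10^-6}) and states that 1000 negative items are uniformly sampled per positive instance. A sketch of both pieces follows; the helper names are hypothetical, and sampling with replacement is an assumption, since the quoted text only says the draw is uniform.

```python
import itertools
import random

# Search space as reported (batch size 1024, 200 epochs are fixed).
LEARNING_RATES = [1e-1, 1e-2, 1e-3]
WEIGHT_DECAYS = [0.0, 1e-4, 1e-5, 1e-6]

def hyperparameter_grid():
    """All (lr, wd) combinations explored by the grid search."""
    return list(itertools.product(LEARNING_RATES, WEIGHT_DECAYS))

def sample_negatives(all_items, pos_item, n_neg=1000, seed=0):
    """Uniformly sample n_neg negatives for one positive instance.

    Sampling with replacement is an assumption; the paper only states
    that 1000 negatives are drawn uniformly per positive.
    """
    rng = random.Random(seed)
    pool = [i for i in all_items if i != pos_item]
    return rng.choices(pool, k=n_neg)
```

The grid has 3 × 4 = 12 (lr, wd) configurations per method, dataset, and backbone, which is consistent with reporting the per-setting optima in an appendix table.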