AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

Authors: Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.
Researcher Affiliation Collaboration Yang Yu1,2, Qi Liu1,2, Kai Zhang1,2, Yuren Zhang1,2, Chao Song3, Min Hou4, Yuqing Yuan3, Zhihao Ye3, Zaixi Zhang1,2, Sanshi Lei Yu1,2 1Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China 2State Key Laboratory of Cognitive Intelligence 3OPPO Research Institute 4Hefei University of Technology {yflyl613, kkzhang0808, yr160698, zaixi}@mail.ustc.edu.cn {songchao12, yuanyuqing, yezhihao3}@oppo.com qiliuql@ustc.edu.cn, {hmhoumin, meet.leiyu}@gmail.com
Pseudocode Yes Fig. 2 illustrates the framework of AdaptSSR, and the pseudo-code for the entire pre-training procedure is provided in Appendix B.
Open Source Code Yes Our code is available at https://github.com/yflyl613/AdaptSSR.
Open Datasets Yes The first dataset, the Tencent Transfer Learning (TTL) dataset, was released by Yuan et al. [52] and contains users' recent 100 interactions on the QQ Browser platform.
Dataset Splits Yes For model pre-training, 90% of the user behavior sequences are randomly selected for training, while the remaining 10% are used for validation. For each downstream task, we randomly split the dataset by 6:2:2 for training, validation, and testing. (An illustrative split sketch follows this table.)
Hardware Specification Yes We implement all experiments with Python 3.8.13 and PyTorch 1.12.1 on an NVIDIA Tesla V100 GPU.
Software Dependencies Yes We implement all experiments with Python 3.8.13 and PyTorch 1.12.1 on an NVIDIA Tesla V100 GPU.
Experiment Setup Yes Following previous works [2, 8], we set the embedding dimension d to 64. In the Transformer encoder, the numbers of attention heads and layers are both set to 2. The dropout probability is set to 0.1. The data augmentation proportion ρ for each baseline method is either searched from {0.1, 0.2, . . . , 0.9} or set to the default value in the original paper if provided. The batch size and learning rate are set to 128 and 2e-4 for both pre-training and fine-tuning. More implementation details are listed in Appendix C. (An illustrative configuration sketch follows this table.)
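
For concreteness, the following is a minimal sketch of the reported splitting protocol: a random 90%/10% split of user behavior sequences for pre-training, and a random 6:2:2 split per downstream task. This is not the authors' released code; the function names, list-based data representation, and fixed random seed are assumptions made purely for illustration.

```python
# Illustrative sketch of the reported data splits (not the authors' code).
import random

def split_pretrain(sequences, val_ratio=0.1, seed=42):
    """Randomly hold out 10% of user behavior sequences for validation."""
    indices = list(range(len(sequences)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_ratio)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [sequences[i] for i in train_idx], [sequences[i] for i in val_idx]

def split_downstream(samples, ratios=(0.6, 0.2, 0.2), seed=42):
    """Randomly split a downstream-task dataset 6:2:2 into train/validation/test."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = int(len(indices) * ratios[0])
    n_val = int(len(indices) * ratios[1])
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test
```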
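Likewise, below is a minimal sketch of a user-behavior encoder matching the quoted experiment setup, assuming a standard PyTorch Transformer encoder. Only the quoted hyperparameters (embedding dimension 64, 2 attention heads, 2 layers, dropout 0.1, learning rate 2e-4, batch size 128) come from the paper; the class name UserBehaviorEncoder, the feed-forward width, the maximum sequence length of 100 (the TTL dataset keeps users' recent 100 interactions), the item-vocabulary size, and the choice of Adam as the optimizer are assumptions.

```python
# Illustrative backbone sketch with the quoted hyperparameters (not the authors' code).
import torch
import torch.nn as nn

class UserBehaviorEncoder(nn.Module):
    def __init__(self, num_items, d_model=64, n_heads=2, n_layers=2, dropout=0.1, max_len=100):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            dropout=dropout, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, item_ids, pad_mask=None):
        # item_ids: (batch, seq_len) behavior indices; pad_mask: True at padding positions.
        positions = torch.arange(item_ids.size(1), device=item_ids.device)
        x = self.item_emb(item_ids) + self.pos_emb(positions)  # positions broadcast over the batch
        return self.encoder(x, src_key_padding_mask=pad_mask)

model = UserBehaviorEncoder(num_items=10_000)                  # vocabulary size is a placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)      # optimizer choice is an assumption
```

A DataLoader with batch_size=128 would complete the quoted training configuration; the data augmentation proportion ρ applies to the baseline methods and is therefore not part of this sketch.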