Dual-View Whitening on Pre-trained Text Embeddings for Sequential Recommendation

Authors: Lingzi Zhang, Xin Zhou, Zhiwei Zeng, Zhiqi Shen

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments reveal that applying whitening to pre-trained text embeddings in sequential recommendation models significantly enhances performance. ... Experiments on three public benchmark datasets show that DWSRec outperforms state-of-the-art methods for sequential recommendation.
Researcher Affiliation | Collaboration | 1) School of Computer Science and Engineering, Nanyang Technological University, Singapore; 2) Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We conduct experiments on three categories of the widely-used Amazon review dataset (Ni, Li, and McAuley 2019): Arts, Crafts and Sewing; Toys and Games; and Tools and Instruments.
Dataset Splits | Yes | We evaluate performance using the leave-one-out strategy: for each user, the last item in the interaction sequence is for testing, the second last for validation, and the rest for training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | All models are implemented with PyTorch (Paszke et al. 2019) and RecBole (Zhao et al. 2021). No specific version numbers for these software dependencies are provided.
Experiment Setup | Yes | We standardize maximum sequence length, embedding size, and batch size at 50, 300, and 1024, respectively, and set the number of self-attention blocks, attention heads, and MLP layers in the projection head at 2. Other hyper-parameters of baseline methods are chosen as per their original papers. For our proposed methods, we tune the learning rate in {1e-5, 5e-5, 1e-4, 5e-4, 1e-3} and weight decay in {0, 1e-3, 1e-4, 1e-6}. The group number G is empirically set to 4. The number of decoupled attention-based transformer layers L is set to 2. To avoid over-fitting, we apply an early stopping strategy when N@20 on the validation data does not increase for 10 epochs.
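
Since the paper does not release code, the following is a minimal, illustrative sketch of whitening applied to a matrix of pre-trained item text embeddings. It uses a ZCA-style transform and is not the authors' DWSRec implementation; the function name, tensor shapes, and epsilon value are assumptions.

```python
import torch

def zca_whiten(embeddings: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Center the embeddings and decorrelate their dimensions so that the
    empirical covariance of the output is approximately the identity."""
    mu = embeddings.mean(dim=0, keepdim=True)
    centered = embeddings - mu                              # (N, d)
    cov = centered.T @ centered / (centered.shape[0] - 1)   # (d, d) covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)               # symmetric eigendecomposition
    # ZCA transform: W = U * diag(1 / sqrt(lambda + eps)) * U^T
    w = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.T
    return centered @ w

# Example: whiten 300-dimensional text embeddings for 10,000 items
item_text_emb = torch.randn(10_000, 300)
whitened_emb = zca_whiten(item_text_emb)
```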
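The leave-one-out evaluation protocol quoted in the Dataset Splits row can be sketched as below; the data structure (a dict mapping each user to a chronologically ordered item list) and the function name are assumed for illustration.

```python
def leave_one_out_split(user_sequences: dict):
    """For each user, hold out the last item for testing and the
    second-to-last item for validation; train on the rest."""
    train, valid, test = {}, {}, {}
    for user, seq in user_sequences.items():
        if len(seq) < 3:                       # need at least one item per split
            continue
        train[user] = seq[:-2]                 # training interactions
        valid[user] = (seq[:-2], seq[-2])      # (history, validation target)
        test[user] = (seq[:-1], seq[-1])       # (history, test target)
    return train, valid, test

# Example usage with a toy interaction log
splits = leave_one_out_split({"u1": ["i3", "i7", "i2", "i9"]})
```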
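Finally, the settings quoted in the Experiment Setup row are collected into an illustrative Python dictionary; the key names are ours and do not correspond to RecBole's exact configuration options.

```python
# Values as reported in the paper's experiment setup.
experiment_config = {
    "max_sequence_length": 50,
    "embedding_size": 300,
    "batch_size": 1024,
    "num_self_attention_blocks": 2,
    "num_attention_heads": 2,
    "num_projection_mlp_layers": 2,
    "learning_rate_grid": [1e-5, 5e-5, 1e-4, 5e-4, 1e-3],
    "weight_decay_grid": [0, 1e-3, 1e-4, 1e-6],
    "group_number_G": 4,
    "num_decoupled_transformer_layers_L": 2,
    "early_stopping": {"metric": "N@20 on validation data", "patience_epochs": 10},
}
```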