Dual-View Whitening on Pre-trained Text Embeddings for Sequential Recommendation
Authors: Lingzi Zhang, Xin Zhou, Zhiwei Zeng, Zhiqi Shen
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments reveal that applying whitening to pre-trained text embeddings in sequential recommendation models significantly enhances performance. ... Experiments on three public benchmark datasets show that DWSRec outperforms state-of-the-art methods for sequential recommendation. |
| Researcher Affiliation | Collaboration | ¹School of Computer Science and Engineering, Nanyang Technological University, Singapore; ²Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct experiments on three categories of widely-used Amazon review dataset (Ni, Li, and McAuley 2019): Arts, Crafts and Sewing, Toys and Games, and Tools and Instruments. |
| Dataset Splits | Yes | We evaluate performance using the leave-one-out strategy: for each user, the last item in the interaction sequence is for testing, the second last for validation, and the rest for training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | All models are implemented by PyTorch (Paszke et al. 2019) and RecBole (Zhao et al. 2021). No specific version numbers for these software dependencies are provided. |
| Experiment Setup | Yes | We standardize maximum sequence length, embedding size, and batch size at 50, 300, and 1024, respectively, and set the number of self-attention blocks, attention heads, and MLP layers in the projection head at 2. Other hyper-parameters of baseline methods are chosen as per their original papers. For our proposed methods, we tune the learning rate in {1e-5, 5e-5, 1e-4, 5e-4, 1e-3} and weight decay in {0, 1e-3, 1e-4, 1e-6}. The group number G is empirically set to 4. The number of decoupled attention-based transformer layers L is set to 2. To avoid over-fitting, we apply an early stopping strategy when N@20 on the validation data does not increase for 10 epochs. |
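
The Research Type row quotes the paper's central claim that whitening pre-trained text embeddings improves sequential recommendation. Below is a minimal NumPy sketch of a standard ZCA whitening transform for an item-embedding matrix; it illustrates the general technique only and does not reproduce DWSRec's dual-view or grouped (G = 4) whitening. The array shapes and the `whiten` function name are assumptions for illustration.

```python
# Minimal ZCA whitening sketch for pre-trained item text embeddings.
# This is NOT the paper's exact dual-view/grouped whitening; it only shows
# the generic transform: zero mean, (approximately) identity covariance.
import numpy as np

def whiten(embeddings: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Whiten a (num_items, dim) embedding matrix."""
    mean = embeddings.mean(axis=0, keepdims=True)
    centered = embeddings - mean
    cov = centered.T @ centered / (centered.shape[0] - 1)
    # Eigen-decomposition of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA whitening matrix: U diag(1/sqrt(lambda + eps)) U^T
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ w

# Example with hypothetical shapes: 1000 items, 300-dimensional embeddings
item_emb = np.random.randn(1000, 300).astype(np.float32)
whitened = whiten(item_emb)
```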
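
The Dataset Splits row describes a leave-one-out protocol: the last interaction of each user is held out for testing, the second-to-last for validation, and the rest is used for training. A minimal sketch of that split, assuming a hypothetical `user_sequences` dict mapping each user to a time-ordered list of item IDs:

```python
# Leave-one-out split sketch: last item -> test, second-to-last -> validation,
# remaining prefix -> training. `user_sequences` is a hypothetical structure.
def leave_one_out_split(user_sequences):
    train, valid, test = {}, {}, {}
    for user, seq in user_sequences.items():
        if len(seq) < 3:                       # need at least one item per split
            continue
        train[user] = seq[:-2]                 # training interactions
        valid[user] = (seq[:-2], seq[-2])      # (input prefix, held-out validation item)
        test[user] = (seq[:-1], seq[-1])       # (input prefix, held-out test item)
    return train, valid, test
```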
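
The Experiment Setup row reports a tuned learning-rate and weight-decay grid and an early-stopping rule on validation N@20 (NDCG@20) with a patience of 10 epochs. A hedged sketch of that loop, assuming placeholder callables `train_one_epoch` and `evaluate_ndcg_at_20` and a `max_epochs` bound that the paper does not state:

```python
# Reported search grids (from the Experiment Setup row)
learning_rates = [1e-5, 5e-5, 1e-4, 5e-4, 1e-3]
weight_decays = [0.0, 1e-3, 1e-4, 1e-6]

def train_with_early_stopping(model, train_one_epoch, evaluate_ndcg_at_20,
                              max_epochs=500, patience=10):
    """Stop when validation NDCG@20 has not improved for `patience` epochs."""
    best_ndcg, epochs_without_improvement = 0.0, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        ndcg = evaluate_ndcg_at_20(model)      # N@20 on the validation split
        if ndcg > best_ndcg:
            best_ndcg, epochs_without_improvement = ndcg, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_ndcg
```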