Noninvasive Self-attention for Side Information Fusion in Sequential Recommendation
Authors: Chang Liu, Xiaoguang Li, Guohao Cai, Zhenhua Dong, Hong Zhu, Lifeng Shang
AAAI 2021, pp. 4249-4256 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the NOVA-BERT model on both public and commercial datasets, and our method can stably outperform the state-of-the-art models with negligible computational overheads. |
| Researcher Affiliation | Collaboration | Chang Liu (1), Xiaoguang Li (2), Guohao Cai (2), Zhenhua Dong (2), Hong Zhu (2), Lifeng Shang (2); (1) The University of Hong Kong, (2) Huawei Noah's Ark Lab. lcon7@connect.hku.hk, {lixiaoguang11, caiguohao1, dongzhenhua, zhuhong8, shang.lifeng}@huawei.com |
| Pseudocode | No | The paper describes the model architecture and mechanisms but does not include explicit pseudocode or algorithm blocks. (A hedged sketch of the non-invasive attention idea is given after the table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluated our methods on the public MovieLens datasets and a real-world dataset called APP. Please refer to Table 2 for detailed descriptions of the datasets. |
| Dataset Splits | Yes | Following the practice of (Kang and McAuley 2018; Sun et al. 2019), we use the heading subsequence of each user's record without the last two elements as the training set. The second-to-last items in the sequences are used as the validation set for tuning hyper-parameters and finding the best checkpoint. The last elements of these sequences constitute the test set. (See the split sketch after the table.) |
| Hardware Specification | No | The paper mentions 'GPU acceleration' but does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For all the models in this research, we train them with the Adam optimizer using a learning rate of 1e-4 for 200 epochs, with a batch size of 128. We fix the random seeds to alleviate the variation caused by randomness. The learning rate is adjusted by a linear decay scheduler with a 5% linear warm-up. We also apply grid search to minimize the bias of our experiment results. The search space contains three hyper-parameters: hidden size {128, 256, 512}, num heads {4, 8}, and num layers {1, 2, 3, 4}. In the end we use 4 heads and 3 layers, as well as a hidden size of 512 for the MovieLens datasets and 256 for the APP dataset. (See the training-setup sketch after the table.) |
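
Since the paper provides no pseudocode, the following is a minimal, hedged sketch of the non-invasive fusion idea as we read it from the paper: side information is fused into the queries and keys that decide attention weights, while the values remain pure item-ID embeddings, so side features do not overwrite the item representation space. The tensor shapes and the fusion operator (simple addition here) are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NovaSelfAttentionSketch(nn.Module):
    """Illustrative sketch of non-invasive self-attention (not the authors' code)."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, item_emb: torch.Tensor, side_emb: torch.Tensor) -> torch.Tensor:
        # item_emb, side_emb: (batch, seq_len, hidden_size)
        fused = item_emb + side_emb  # fused representation used only for queries/keys (fusion op assumed)
        out, _ = self.attn(query=fused, key=fused, value=item_emb)  # values stay "pure" item embeddings
        return out
```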
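
The leave-one-out split quoted in the Dataset Splits row can be expressed compactly; this is a small sketch of that protocol (all but the last two interactions for training, the second-to-last item for validation, the last for testing), with the handling of very short sequences being our assumption.

```python
def leave_one_out_split(user_sequences):
    """user_sequences: dict mapping user_id -> chronologically ordered list of item ids."""
    train, valid, test = {}, {}, {}
    for user, seq in user_sequences.items():
        if len(seq) < 3:          # too short to yield all three splits; skipping is an assumption
            continue
        train[user] = seq[:-2]    # heading subsequence without the last two elements
        valid[user] = seq[-2]     # second-to-last item: validation target
        test[user] = seq[-1]      # last item: test target
    return train, valid, test
```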
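
The reported optimization setup (Adam, lr 1e-4, 200 epochs, batch size 128, fixed seed, linear decay with a 5% linear warm-up, grid search over the quoted space) could be reproduced roughly as below. This is a sketch under assumptions, not the authors' code: the seed value, the steps-per-epoch figure, and the stand-in model are all hypothetical placeholders.

```python
import itertools
import torch

SEARCH_SPACE = {
    "hidden_size": [128, 256, 512],
    "num_heads": [4, 8],
    "num_layers": [1, 2, 3, 4],
}

def make_optimizer_and_scheduler(model, total_steps):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    warmup_steps = int(0.05 * total_steps)  # 5% linear warm-up

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        # linear decay from 1.0 down to 0.0 over the remaining steps
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

torch.manual_seed(42)  # the paper fixes seeds; the exact value is an assumption
for hidden_size, num_heads, num_layers in itertools.product(*SEARCH_SPACE.values()):
    # A real run would build NOVA-BERT with these hyper-parameters; a linear
    # layer stands in here so the sketch stays self-contained.
    model = torch.nn.Linear(hidden_size, hidden_size)
    steps_per_epoch = 1000                    # assumed; depends on dataset size and batch size 128
    total_steps = 200 * steps_per_epoch       # 200 epochs as reported
    optimizer, scheduler = make_optimizer_and_scheduler(model, total_steps)
    # ... training loop: optimizer.step() then scheduler.step() after every batch ...
```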