Noninvasive Self-attention for Side Information Fusion in Sequential Recommendation

Authors: Chang Liu, Xiaoguang Li, Guohao Cai, Zhenhua Dong, Hong Zhu, Lifeng Shang

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the NOVA-BERT model on both public and commercial datasets, and our method can stably outperform the state-of-the-art models with negligible computational overheads.
Researcher Affiliation | Collaboration | Chang Liu (1), Xiaoguang Li (2), Guohao Cai (2), Zhenhua Dong (2), Hong Zhu (2), Lifeng Shang (2); 1: The University of Hong Kong, 2: Huawei Noah's Ark Lab; lcon7@connect.hku.hk, {lixiaoguang11, caiguohao1, dongzhenhua, zhuhong8, shang.lifeng}@huawei.com
Pseudocode | No | The paper describes the model architecture and mechanisms but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We evaluated our methods on the public MovieLens datasets and a real-world dataset called APP. Please refer to Table 2 for detailed descriptions of the datasets.
Dataset Splits | Yes | Following the practice of (Kang and McAuley 2018; Sun et al. 2019), we use the leading subsequence of each user's record without the last two elements as the training set. The second-to-last items in the sequences are used as the validation set for tuning hyper-parameters and finding the best checkpoint. The last elements from these sequences construct the test set. (A minimal split sketch appears after the table.)
Hardware Specification | No | The paper mentions 'GPU acceleration' but does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | For all the models in this research, we train them with the Adam optimizer using a learning rate of 1e-4 for 200 epochs, with a batch size of 128. We fix the random seeds to alleviate the variation caused by randomness. The learning rate is adjusted by a linear decay scheduler with a 5% linear warm-up. We also apply grid search to minimize the bias of our experiment results. The search space contains three hyper-parameters: hidden size {128, 256, 512}, num heads {4, 8}, and num layers {1, 2, 3, 4}. In the end we use 4 heads and 3 layers, with a hidden size of 512 for the MovieLens datasets and 256 for the APP dataset. (A training-setup sketch appears after the table.)
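
The dataset-split row describes the standard leave-one-out protocol from Kang and McAuley (2018) and Sun et al. (2019). Below is a minimal sketch of that split, assuming each user's interactions are stored chronologically in a dict; the function name and data layout are illustrative and not taken from the authors' code.

```python
# Minimal sketch of the leave-one-out split described in the paper.
# The helper name and data layout are hypothetical.
def leave_one_out_split(user_sequences):
    """Split each user's chronological interaction sequence.

    - training set: the sequence without its last two items
    - validation set: the second-to-last item (for tuning and checkpoint selection)
    - test set: the last item
    Sequences shorter than three items are skipped here for simplicity.
    """
    train, valid, test = {}, {}, {}
    for user, seq in user_sequences.items():
        if len(seq) < 3:
            continue
        train[user] = seq[:-2]
        valid[user] = (seq[:-2], seq[-2])  # (history, held-out validation item)
        test[user] = (seq[:-1], seq[-1])   # (history, held-out test item)
    return train, valid, test
```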
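The experiment-setup row specifies the optimizer, learning-rate schedule, and grid-search space. The sketch below wires those reported numbers into a hypothetical PyTorch training loop; build_nova_bert, train_loader, and evaluate are placeholders for code the paper does not release, and the exact warm-up/decay implementation and seed value are assumptions consistent with the description.

```python
import itertools
import torch

# Hyper-parameter grid and training settings reported in the paper.
SEARCH_SPACE = {
    "hidden_size": [128, 256, 512],
    "num_heads": [4, 8],
    "num_layers": [1, 2, 3, 4],
}
EPOCHS, BATCH_SIZE, LR, WARMUP_FRAC = 200, 128, 1e-4, 0.05

def linear_warmup_decay(total_steps, warmup_frac=WARMUP_FRAC):
    """Linear warm-up over the first 5% of steps, then linear decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    def lr_lambda(step):
        if step < warmup_steps:
            return step / warmup_steps
        return max(0.0, (total_steps - step) / (total_steps - warmup_steps))
    return lr_lambda

def run_grid_search(train_loader, build_nova_bert, evaluate):
    torch.manual_seed(42)  # the paper fixes seeds; the value 42 is illustrative
    best_cfg, best_score = None, float("-inf")
    for hidden, heads, layers in itertools.product(*SEARCH_SPACE.values()):
        model = build_nova_bert(hidden_size=hidden, num_heads=heads, num_layers=layers)
        optimizer = torch.optim.Adam(model.parameters(), lr=LR)
        total_steps = EPOCHS * len(train_loader)
        scheduler = torch.optim.lr_scheduler.LambdaLR(
            optimizer, lr_lambda=linear_warmup_decay(total_steps))
        for _ in range(EPOCHS):
            for batch in train_loader:
                optimizer.zero_grad()
                loss = model(batch)  # assumed to return the training loss
                loss.backward()
                optimizer.step()
                scheduler.step()
        score = evaluate(model)      # validation metric on the second-to-last items
        if score > best_score:
            best_cfg, best_score = (hidden, heads, layers), score
    return best_cfg
```

Under this setup, the configuration selected on validation is the one the paper reports: 4 heads, 3 layers, and a hidden size of 512 (MovieLens) or 256 (APP).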