Noninvasive Self-attention for Side Information Fusion in Sequential Recommendation

Authors: Chang Liu, Xiaoguang Li, Guohao Cai, Zhenhua Dong, Hong Zhu, Lifeng Shang

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the NOVA-BERT model on both public and commercial datasets, and our method can stably outperform the state-of-the-art models with negligible computational overheads.
Researcher Affiliation | Collaboration | Chang Liu (1), Xiaoguang Li (2), Guohao Cai (2), Zhenhua Dong (2), Hong Zhu (2), Lifeng Shang (2); 1: The University of Hong Kong, 2: Huawei Noah's Ark Lab; lcon7@connect.hku.hk, {lixiaoguang11, caiguohao1, dongzhenhua, zhuhong8, shang.lifeng}@huawei.com
Pseudocode | No | The paper describes the model architecture and mechanisms but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We evaluated our methods on the public MovieLens datasets and a real-world dataset called APP. Please refer to Table 2 for detailed descriptions of the datasets.
Dataset Splits | Yes | Following the practice of (Kang and McAuley 2018; Sun et al. 2019), we use the leading subsequence of each user's record without the last two elements as the training set. The second-to-last items in the sequences are used as the validation set for tuning hyper-parameters and finding the best checkpoint. The last elements from these sequences construct the test set. (A minimal split sketch appears after the table.)
Hardware Specification | No | The paper mentions 'GPU acceleration' but does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not specify version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | For all the models in this research, we train them with the Adam optimizer using a learning rate of 1e-4 for 200 epochs, with a batch size of 128. We fix the random seeds to alleviate the variation caused by randomness. The learning rate is adjusted by a linear decay scheduler with a 5% linear warm-up. We also apply grid search to minimize the bias of our experiment results. The search space contains three hyper-parameters: hidden size {128, 256, 512}, num heads {4, 8}, and num layers {1, 2, 3, 4}. In the end we use 4 heads and 3 layers, with a hidden size of 512 for the MovieLens datasets and 256 for the APP dataset. (A training-setup sketch appears after the table.)
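
The dataset-split row describes the standard leave-one-out protocol from Kang and McAuley (2018) and Sun et al. (2019). Below is a minimal sketch of that split, assuming each user's interactions are stored chronologically in a dict; the function name and data layout are illustrative and not taken from the authors' code.

```python
# Minimal sketch of the leave-one-out split described in the paper.
# The helper name and data layout are hypothetical.
def leave_one_out_split(user_sequences):
    """Split each user's chronological interaction sequence.

    - training set: the sequence without its last two items
    - validation set: the second-to-last item (for tuning and checkpoint selection)
    - test set: the last item
    Sequences shorter than three items are skipped here for simplicity.
    """
    train, valid, test = {}, {}, {}
    for user, seq in user_sequences.items():
        if len(seq) < 3:
            continue
        train[user] = seq[:-2]
        valid[user] = (seq[:-2], seq[-2])  # (history, held-out validation item)
        test[user] = (seq[:-1], seq[-1])   # (history, held-out test item)
    return train, valid, test
```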
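The experiment-setup row specifies the optimizer, learning-rate schedule, and grid-search space. The sketch below wires those reported numbers into a hypothetical PyTorch training loop; build_nova_bert, train_loader, and evaluate are placeholders for code the paper does not release, and the exact warm-up/decay implementation and seed value are assumptions consistent with the description.

```python
import itertools
import torch

# Hyper-parameter grid and training settings reported in the paper.
SEARCH_SPACE = {
    "hidden_size": [128, 256, 512],
    "num_heads": [4, 8],
    "num_layers": [1, 2, 3, 4],
}
EPOCHS, BATCH_SIZE, LR, WARMUP_FRAC = 200, 128, 1e-4, 0.05

def linear_warmup_decay(total_steps, warmup_frac=WARMUP_FRAC):
    """Linear warm-up over the first 5% of steps, then linear decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    def lr_lambda(step):
        if step < warmup_steps:
            return step / warmup_steps
        return max(0.0, (total_steps - step) / (total_steps - warmup_steps))
    return lr_lambda

def run_grid_search(train_loader, build_nova_bert, evaluate):
    torch.manual_seed(42)  # the paper fixes seeds; the value 42 is illustrative
    best_cfg, best_score = None, float("-inf")
    for hidden, heads, layers in itertools.product(*SEARCH_SPACE.values()):
        model = build_nova_bert(hidden_size=hidden, num_heads=heads, num_layers=layers)
        optimizer = torch.optim.Adam(model.parameters(), lr=LR)
        total_steps = EPOCHS * len(train_loader)
        scheduler = torch.optim.lr_scheduler.LambdaLR(
            optimizer, lr_lambda=linear_warmup_decay(total_steps))
        for _ in range(EPOCHS):
            for batch in train_loader:
                optimizer.zero_grad()
                loss = model(batch)  # assumed to return the training loss
                loss.backward()
                optimizer.step()
                scheduler.step()
        score = evaluate(model)      # validation metric on the second-to-last items
        if score > best_score:
            best_cfg, best_score = (hidden, heads, layers), score
    return best_cfg
```

Under this setup, the configuration selected on validation is the one the paper reports: 4 heads, 3 layers, and a hidden size of 512 (MovieLens) or 256 (APP).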