An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention

Authors: Yehjin Shin, Jeongwhan Choi, Hyowon Wi, Noseong Park

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our proposed approach through extensive experiments on 6 benchmark datasets. The experimental results demonstrate that our model outperforms 7 baseline methods in terms of recommendation performance.
Researcher Affiliation | Academia | Yonsei University, Seoul, South Korea; {yehjin.shin, jeongwhan.choi, wihyowon, noseong}@yonsei.ac.kr
Pseudocode | No | The paper describes its proposed model architecture and process in text and through a diagram (Figure 4), but it does not include structured pseudocode or algorithm blocks. A hedged sketch of what such a block might look like appears after this table.
Open Source Code | Yes | Our code is available at https://github.com/yehjin-shin/BSARec.
Open Datasets | Yes | We evaluate our model on 6 SR datasets where the sparsity and domain vary: i, ii, iii) Amazon Beauty, Sports, Toys (McAuley et al. 2015), iv) Yelp, v) ML-1M (Harper and Konstan 2015), and vi) LastFM.
Dataset Splits | No | The paper defines how items are selected for next-item prediction but does not give explicit train/validation/test splits (e.g., percentages or sample counts) in the main text. It mentions data pre-processing and defers best hyperparameters to an Appendix, which suggests the conventional splits for this task, but they are not stated here. A sketch of the conventional leave-one-out split appears after this table.
Hardware Specification | Yes | Our method is implemented in PyTorch on an NVIDIA RTX 3090 with 16 GB memory.
Software Dependencies | No | The paper states that the method is "implemented in PyTorch" but does not give version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We conduct experiments under the following hyperparameters: the coefficient α is in {0.1, 0.3, 0.5, 0.7, 0.9}, and c is chosen from {1, 3, 5, 7, 9}. The number of BSA blocks L is set to 2, and the number of heads in the Transformer h is in {1, 2, 4}. The dimension D is set to 64, and the maximum sequence length N is set to 50. For training, the Adam optimizer is used with a learning rate in {5 × 10⁻⁴, 1 × 10⁻³}, and the batch size is set to 256. A sketch of this search grid also appears after the table.
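
Since the paper itself provides no pseudocode, the following is a speculative sketch of a block consistent with the hyperparameters quoted above: a mixing coefficient α between a frequency-domain inductive bias and standard self-attention, a low-frequency cutoff c, hidden dimension 64, and h attention heads. The class name, the learnable high-frequency scale beta, and the overall wiring are our assumptions; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn


class FrequencyMixedAttention(nn.Module):
    """Hypothetical sketch: mixes a frequency-domain filter with
    self-attention, assuming alpha is a mixing coefficient and c a
    low-frequency cutoff. Not the authors' code."""

    def __init__(self, dim=64, heads=2, alpha=0.5, c=5):
        super().__init__()
        self.alpha = alpha
        self.c = c
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Learnable scale for the high-frequency band (an assumption).
        self.beta = nn.Parameter(torch.ones(1))

    def forward(self, x):  # x: (batch, seq_len, dim)
        # Frequency-domain branch: filter along the sequence axis.
        freq = torch.fft.rfft(x, dim=1)
        low = torch.zeros_like(freq)
        low[:, :self.c] = freq[:, :self.c]   # keep the first c components
        high = freq - low                    # the remaining components
        filtered = torch.fft.irfft(low + self.beta * high,
                                   n=x.size(1), dim=1)
        # Standard self-attention branch.
        attn_out, _ = self.attn(x, x, x)
        # Convex mix of the two branches, weighted by alpha.
        return self.alpha * filtered + (1.0 - self.alpha) * attn_out


# Example: a batch of 4 sequences of length 50 with dimension 64.
x = torch.randn(4, 50, 64)
y = FrequencyMixedAttention()(x)  # y.shape == (4, 50, 64)
```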
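
On dataset splits: the paper does not state them in the main text, and sequential-recommendation work in this line typically uses leave-one-out evaluation. Assuming that convention (an assumption on our part, not a fact from the paper), a minimal split looks like:

```python
def leave_one_out_split(user_sequence):
    """Conventional leave-one-out split for sequential recommendation
    (assumed here; the paper does not state its splits). The last item
    is held out for testing, the second-to-last for validation, and
    the rest are used for training."""
    train = user_sequence[:-2]
    valid = (user_sequence[:-2], user_sequence[-2])  # (input, target)
    test = (user_sequence[:-1], user_sequence[-1])   # (input, target)
    return train, valid, test


train, valid, test = leave_one_out_split([3, 8, 2, 5, 7])
# train == [3, 8, 2]; validation target == 5; test target == 7
```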
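
Finally, the quoted experiment setup translates directly into a small search grid. The sketch below simply enumerates it; the variable names and the train_and_evaluate entry point are hypothetical.

```python
from itertools import product

# Search grid transcribed from the quoted setup; names are ours.
ALPHA_GRID = [0.1, 0.3, 0.5, 0.7, 0.9]  # mixing coefficient α
C_GRID = [1, 3, 5, 7, 9]                # frequency cutoff c
HEADS_GRID = [1, 2, 4]                  # attention heads h
LR_GRID = [5e-4, 1e-3]                  # Adam learning rates

FIXED = dict(num_blocks=2, hidden_dim=64, max_seq_len=50, batch_size=256)

for alpha, c, heads, lr in product(ALPHA_GRID, C_GRID, HEADS_GRID, LR_GRID):
    config = dict(alpha=alpha, c=c, heads=heads, lr=lr, **FIXED)
    # train_and_evaluate(config)  # hypothetical training entry point
```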