Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation

Authors: Chao Chen, Haoyu Geng, Nianzu Yang, Junchi Yan, Daiyue Xue, Jianping Yu, Xiaokang Yang

Venue: ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
Researcher Affiliation | Collaboration | (1) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; (2) Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; (3) Meituan, Beijing, China.
Pseudocode | No | The paper describes the model architecture and equations but does not present pseudocode or an algorithm block.
Open Source Code | No | The paper does not explicitly state that the authors' own code for the described methodology is open-source or provide a link for it. It only links to third-party baseline implementations.
Open Datasets | Yes | We use three real-world datasets which are processed in line with (Weimer et al., 2007; Liang et al., 2018; Steck, 2019): (1) Amazon Electronics is a user review dataset, where we binarize the explicit data and only keep users and items who have at least 5 historical records; (2) Koubei is a user behavior dataset, where we keep users with at least 5 records and items that have been purchased by at least 100 users; and (3) Tmall is a user click dataset, where we keep users who click at least 10 items and items which have been seen by at least 200 users.
Dataset Splits | Yes | To do so, we split the users into training/validation/test sets with the ratio 8 : 1 : 1.
Hardware Specification | Yes | All experiments are run on a machine with E5-2678 CPU, RTX 2080 and 188G RAM.
Software Dependencies | No | The paper does not specify software dependencies with version numbers, only names of frameworks or libraries used for baselines (e.g., GRU module, Transformer).
Experiment Setup | Yes | We select the continuous time regularization parameter over {1e-4, 1e-5, 1e-6}. For the sake of fairness, the embedding size of all attentions are set to 50. ... Grid search of factor size over {50, 100, ..., 300}, learning rate over {1e-4, 1e-3, 1e-2} and dropout rate over {0.1, ..., 0.5} is performed to select the best parameters. ... We finetune the model by grid search learning rate and dropout rate over {0.5, 1.0, 1.5, 2.0} and {0.1, ..., 0.5}. ... We adopt the default architecture used in its original paper where three transformer blocks are used. We further search the optimal hyper-parameters by ranging learning rate over {1e-4, 5e-4, ..., 1e-2} and dropout rate from 0.1 to 0.5.
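
The Open Datasets and Dataset Splits rows above quote the preprocessing protocol (e.g. keeping users and items with at least 5 records on Amazon Electronics) and the 8:1:1 split over users. The following is a minimal Python sketch of one way to implement that protocol; it is not the authors' code, and the pandas pipeline and column names (user_id, item_id) are assumptions.

```python
# Minimal sketch (not the authors' code) of the quoted preprocessing and the
# 8:1:1 user split. Column names and the pandas pipeline are assumptions.
import pandas as pd


def k_core_filter(df, min_user_records=5, min_item_records=5):
    """Iteratively keep users/items with enough records, e.g. the quoted
    Amazon Electronics rule: at least 5 historical records for both."""
    while True:
        user_counts = df["user_id"].value_counts()
        item_counts = df["item_id"].value_counts()
        keep = (df["user_id"].isin(user_counts[user_counts >= min_user_records].index)
                & df["item_id"].isin(item_counts[item_counts >= min_item_records].index))
        if keep.all():
            return df
        df = df[keep]  # dropping rows may push other users/items below the threshold


def split_by_user(df, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split users (not individual interactions) into train/validation/test 8:1:1."""
    users = df["user_id"].drop_duplicates().sample(frac=1.0, random_state=seed)
    n_train = int(ratios[0] * len(users))
    n_valid = int(ratios[1] * len(users))
    train_users = set(users.iloc[:n_train])
    valid_users = set(users.iloc[n_train:n_train + n_valid])
    test_users = set(users.iloc[n_train + n_valid:])
    return (df[df["user_id"].isin(train_users)],
            df[df["user_id"].isin(valid_users)],
            df[df["user_id"].isin(test_users)])
```

Likewise, the Experiment Setup row describes selecting hyper-parameters by grid search. The sketch below only enumerates a grid consistent with the quoted ranges; the key names are illustrative, the intermediate values are inferred from the ellipses in the quote, and training/evaluation is indicated only by a comment.

```python
# Illustrative hyper-parameter grid matching the quoted search ranges; key names
# are placeholders, not the paper's exact configuration interface.
from itertools import product

search_space = {
    "factor_size": [50, 100, 150, 200, 250, 300],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "time_reg": [1e-4, 1e-5, 1e-6],  # continuous-time regularization parameter
}

configs = [dict(zip(search_space, values))
           for values in product(*search_space.values())]
# Each configuration would be trained once and the best one kept according to a
# validation-set ranking metric (e.g. Recall@N on the held-out validation users).
```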