Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation
Authors: Chao Chen, Haoyu Geng, Nianzu Yang, Junchi Yan, Daiyue Xue, Jianping Yu, Xiaokang Yang
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance. |
| Researcher Affiliation | Collaboration | (1) MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; (2) Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; (3) Meituan, Beijing, China. |
| Pseudocode | No | The paper describes the model architecture and equations but does not present pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not explicitly state that the authors' own code for the described methodology is open-source or provide a link for it. It only links to third-party baseline implementations. |
| Open Datasets | Yes | We use three real-world datasets which are processed in line with (Weimer et al., 2007; Liang et al., 2018; Steck, 2019): (1) Amazon Electronics is a user review dataset, where we binarize the explicit data and only keep users and items who have at least 5 historical records; (2) Koubei is a user behavior dataset, where we keep users with at least 5 records and items that have been purchased by at least 100 users; and (3) Tmall is a user click dataset, where we keep users who click at least 10 items and items which have been seen by at least 200 users. (A hedged preprocessing and split sketch follows the table.) |
| Dataset Splits | Yes | To do so, we split the users into training/validation/test sets with the ratio 8 : 1 : 1. |
| Hardware Specification | Yes | All experiments are run on a machine with E5-2678 CPU, RTX 2080 and 188G RAM. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, only names of frameworks or libraries used for baselines (e.g., GRU module, Transformer). |
| Experiment Setup | Yes | We select the continuous time regularization parameter over {1e-4, 1e-5, 1e-6}. For the sake of fairness, the embedding size of all attentions are set to 50. ... Grid search of factor size over {50, 100, ..., 300}, learning rate over {1e-4, 1e-3, 1e-2} and dropout rate over {0.1, ... ,0.5} is performed to select the best parameters. ... We finetune the model by grid search learning rate and dropout rate over {0.5, 1.0, 1.5, 2.0} and {0.1, ... ,0.5}. ... We adopt the default architecture used in its original paper where three transformer blocks are used. We further search the optimal hyper-parameters by ranging learning rate over {1e-4, 5e-4, ..., 1e-2} and dropout rate from 0.1 to 0.5. (A hedged grid-search sketch follows the table.) |
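The Open Datasets and Dataset Splits rows describe threshold-based filtering and an 8:1:1 user-level split. Below is a minimal sketch of one plausible implementation for the Amazon Electronics case; the DataFrame layout, column names, single-pass filtering, and random seed are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
import pandas as pd

def preprocess_amazon(interactions: pd.DataFrame,
                      min_user_records: int = 5,
                      min_item_records: int = 5) -> pd.DataFrame:
    """Binarize explicit feedback, then keep users and items with enough records.
    Column names `user_id`, `item_id`, `rating` are assumed, not from the paper."""
    df = interactions.copy()
    df["rating"] = 1  # treat every observed review as an implicit positive
    # Single-pass filter; some pipelines repeat this until counts stabilize (k-core).
    user_counts = df.groupby("user_id")["item_id"].transform("size")
    item_counts = df.groupby("item_id")["user_id"].transform("size")
    return df[(user_counts >= min_user_records) & (item_counts >= min_item_records)]

def split_users_8_1_1(df: pd.DataFrame, seed: int = 0):
    """Split users (not individual interactions) into train/validation/test at 8:1:1."""
    users = df["user_id"].unique()
    rng = np.random.default_rng(seed)
    rng.shuffle(users)
    n = len(users)
    train_u = set(users[: int(0.8 * n)])
    val_u = set(users[int(0.8 * n): int(0.9 * n)])
    test_u = set(users[int(0.9 * n):])
    return (df[df["user_id"].isin(train_u)],
            df[df["user_id"].isin(val_u)],
            df[df["user_id"].isin(test_u)])
```

The same pattern applies to Koubei and Tmall by changing the user/item thresholds to the values quoted in the Open Datasets row.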
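The Experiment Setup row reports grid searches over learning rate, dropout rate, and the continuous-time regularization weight. The sketch below shows only the search loop over the quoted grids; `train_and_evaluate` is a hypothetical placeholder for model training plus validation scoring, and the exact grids differ per baseline as quoted above.

```python
import itertools

# Grids quoted in the Experiment Setup row (main model variant).
learning_rates = [1e-4, 1e-3, 1e-2]
dropout_rates = [0.1, 0.2, 0.3, 0.4, 0.5]
ct_reg_weights = [1e-4, 1e-5, 1e-6]  # continuous-time regularization parameter

def train_and_evaluate(lr: float, dropout: float, ct_reg: float) -> float:
    """Hypothetical placeholder: train the model and return a validation metric."""
    return 0.0  # replace with real training + evaluation (e.g., Recall@N on validation users)

best_score, best_config = float("-inf"), None
for lr, dropout, ct_reg in itertools.product(learning_rates, dropout_rates, ct_reg_weights):
    score = train_and_evaluate(lr, dropout, ct_reg)
    if score > best_score:
        best_score, best_config = score, (lr, dropout, ct_reg)

print("best (lr, dropout, ct_reg):", best_config, "score:", best_score)
```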