Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns
Authors: Xin Liu, Zheng Li, Yifan Gao, Jingfeng Yang, Tianyu Cao, Zhengyang Wang, Bing Yin, Yangqiu Song
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on two public benchmarks and 100 million industrial data in three domains, we demonstrate that FAPAT consistently outperforms state-of-the-art methods by an average of 4.5% across various evaluation metrics (Hits, NDCG, MRR). Besides evaluating the next-item prediction, we estimate the models' capabilities to capture user intents via predicting items' attributes and period-item recommendations. |
| Researcher Affiliation | Collaboration | Xin Liu¹, Zheng Li², Yifan Gao², Jingfeng Yang², Tianyu Cao², Zhengyang Wang², Bing Yin², Yangqiu Song¹ — ¹Department of Computer Science and Engineering, HKUST; ²Amazon.com Inc |
| Pseudocode | No | The paper describes the FAPAT framework and its components using text and mathematical formulas, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Code and data are available at https://github.com/HKUST-KnowComp/FAPAT. |
| Open Datasets | Yes | We choose two public benchmarks for session-based recommendation evaluation: Diginetica, from the CIKM Cup 2016, contains browser logs and anonymized transactions; Tmall, from an IJCAI-15 competition, collects anonymous users' shopping logs on the Tmall online website. |
| Dataset Splits | Yes | We follow previous settings that split training/validation/testing data based on timestamps. For Diginetica, we gather the last 8-14 days as validation, the last 7 days as testing, and the remainder as training. For Tmall, we use the last 101-200 seconds as validation, the last 100 seconds as testing, and the remainder as training. For our industrial E-commerce data (i.e., Beauty, Books, Electronics), we select the last 6-10 days as validation, the last 5 days as testing, and the remainder as training. |
| Hardware Specification | Yes | We implement our methods and run experiments with Python and PyTorch over 8× NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions "Python and PyTorch" but does not specify their version numbers or the versions of any other software dependencies used in the experiments. |
| Experiment Setup | Yes | We fix all embeddings and hidden dimensions as 100, and the batch size is searched among {100, 200, 500} for all methods. A learning scheduler with 10% linear warmup and 90% decay is associated with the Adam optimizer [14]. The initial learning rate is set as 1e-3, and the regularization weight is tuned among {1e-4, 1e-5, 1e-6}. The dropout probability between two modules is searched over {0.0, 0.2, 0.4}, while the attention dropout rate is fixed at 0.2. The number of attention heads is empirically set as 4. We follow the setting of GCE-GNN that the maximum one-hop neighbor number in GAT is 12. In the interest of fairness, we also set the maximum selected pattern number as 12. |
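
The timestamp-based splitting convention quoted above (e.g., for Diginetica: last 7 days as testing, the 8th-14th most recent days as validation, the rest as training) can be sketched as follows. This is a minimal illustration, not the paper's released code; the function and variable names are hypothetical.

```python
from datetime import datetime, timedelta

def split_by_recency(sessions, valid_window=(8, 14), test_days=7):
    """Split (session_id, timestamp) pairs into train/valid/test by recency.

    Sessions in the most recent `test_days` days go to testing, sessions in
    the `valid_window` days before that go to validation, and everything
    older goes to training -- mirroring the Diginetica split quoted above.
    """
    latest = max(ts for _, ts in sessions)
    test_cut = latest - timedelta(days=test_days)
    valid_cut = latest - timedelta(days=valid_window[1])
    train, valid, test = [], [], []
    for sid, ts in sessions:
        if ts > test_cut:
            test.append(sid)
        elif ts > valid_cut:
            valid.append(sid)
        else:
            train.append(sid)
    return train, valid, test
```

The same shape applies to the Tmall split by substituting seconds for days.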
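
The "10% linear warmup and 90% decay" schedule paired with Adam can be expressed as a step-dependent learning-rate multiplier. The sketch below is an assumption about the schedule's shape based on the description above, not the paper's implementation; `warmup_then_decay` is a hypothetical name.

```python
def warmup_then_decay(step, total_steps, warmup_frac=0.1):
    """Learning-rate multiplier: linear warmup from 0 to 1 over the first
    10% of steps, then linear decay back to 0 over the remaining 90%."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

In PyTorch this multiplier would typically be attached to `torch.optim.Adam` (initial learning rate 1e-3, per the setup above) via `torch.optim.lr_scheduler.LambdaLR`.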