Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns
Authors: Xin Liu, Zheng Li, Yifan Gao, Jingfeng Yang, Tianyu Cao, Zhengyang Wang, Bing Yin, Yangqiu Song
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on two public benchmarks and 100 million industrial data in three domains, we demonstrate that FAPAT consistently outperforms state-of-the-art methods by an average of 4.5% across various evaluation metrics (Hits, NDCG, MRR). Besides evaluating the next-item prediction, we estimate the models' capabilities to capture user intents via predicting items' attributes and period-item recommendations. |
| Researcher Affiliation | Collaboration | Xin Liu¹, Zheng Li², Yifan Gao², Jingfeng Yang², Tianyu Cao², Zhengyang Wang², Bing Yin², Yangqiu Song¹ — ¹Department of Computer Science and Engineering, HKUST; ²Amazon.com Inc |
| Pseudocode | No | The paper describes the FAPAT framework and its components using text and mathematical formulas, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Code and data are available at https://github.com/HKUST-KnowComp/FAPAT. |
| Open Datasets | Yes | We choose two public benchmarks for session-based recommendation evaluation: Diginetica, from the CIKM Cup 2016, contains browser logs and anonymized transactions; Tmall, from an IJCAI-15 competition, collects anonymous users' shopping logs on the Tmall online website. |
| Dataset Splits | Yes | We follow previous settings that split training/validation/testing data based on timestamps. For Diginetica, we gather the last 8-14 days as validation, the last 7 days as testing, and the remainder as training. For Tmall, we use the last 101-200 seconds as validation, the last 100 seconds as testing, and the remainder as training. For our industrial E-commerce data (i.e., Beauty, Books, Electronics), we select the last 6-10 days as validation, the last 5 days as testing, and the remainder as training. |
| Hardware Specification | Yes | We implement our methods and run experiments with Python and PyTorch over 8× NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions "Python and PyTorch" but does not specify their version numbers or the versions of any other software dependencies used in the experiments. |
| Experiment Setup | Yes | We fix all embeddings and hidden dimensions as 100, and the batch size is searched among {100, 200, 500} for all methods. A learning scheduler with 10% linear warmup and 90% decay is associated with the Adam optimizer [14]. The initial learning rate is set as 1e-3, and the regularization weight is tuned among {1e-4, 1e-5, 1e-6}. The dropout probability between two modules is searched over {0.0, 0.2, 0.4}, while the attention dropout rate is fixed at 0.2. The number of attention heads is empirically set as 4. We follow the setting of GCE-GNN that the maximum one-hop neighbor number in GAT is 12. In the interest of fairness, we also set the maximum selected pattern number as 12. |
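
The timestamp-based splitting convention quoted above (e.g., for Diginetica: last 7 days as testing, the 8th-14th most recent days as validation, the rest as training) can be sketched as follows. This is a minimal illustration, not the paper's released code; the function and variable names are hypothetical.

```python
from datetime import datetime, timedelta

def split_by_recency(sessions, valid_window=(8, 14), test_days=7):
    """Split (session_id, timestamp) pairs into train/valid/test by recency.

    Sessions in the most recent `test_days` days go to testing, sessions in
    the `valid_window` days before that go to validation, and everything
    older goes to training -- mirroring the Diginetica split quoted above.
    """
    latest = max(ts for _, ts in sessions)
    test_cut = latest - timedelta(days=test_days)
    valid_cut = latest - timedelta(days=valid_window[1])
    train, valid, test = [], [], []
    for sid, ts in sessions:
        if ts > test_cut:
            test.append(sid)
        elif ts > valid_cut:
            valid.append(sid)
        else:
            train.append(sid)
    return train, valid, test
```

The same shape applies to the Tmall split by substituting seconds for days.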
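
The "10% linear warmup and 90% decay" schedule paired with Adam can be expressed as a step-dependent learning-rate multiplier. The sketch below is an assumption about the schedule's shape based on the description above, not the paper's implementation; `warmup_then_decay` is a hypothetical name.

```python
def warmup_then_decay(step, total_steps, warmup_frac=0.1):
    """Learning-rate multiplier: linear warmup from 0 to 1 over the first
    10% of steps, then linear decay back to 0 over the remaining 90%."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

In PyTorch this multiplier would typically be attached to `torch.optim.Adam` (initial learning rate 1e-3, per the setup above) via `torch.optim.lr_scheduler.LambdaLR`.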