Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Probabilistic Masked Attention Networks for Explainable Sequential Recommendation
Authors: Huiyuan Chen, Kaixiong Zhou, Zhimeng Jiang, Chin-Chia Michael Yeh, Xiaoting Li, Menghai Pan, Yan Zheng, Xia Hu, Hao Yang
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental studies on real-world benchmark datasets show that PMAN is able to improve the performance of Transformers significantly. [Section 5, Experiment; Section 5.1, Experimental Setup] Dataset. We consider five benchmark datasets: Amazon Beauty, Amazon-Sports, Yelp, MovieLens-1M, and Steam. |
| Researcher Affiliation | Collaboration | 1Visa Research 2Rice University 3Texas A&M University |
| Pseudocode | Yes | Algorithm 1 PMAN Input: The training sequence set S, attention capacity B, embedding size d. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | Dataset. We consider five benchmark datasets: Amazon Beauty, Amazon-Sports, Yelp, MovieLens-1M, and Steam. For each dataset, we group the interactions by users, and sort their items by the timestamps ascendingly. Following [Fan et al., 2022], we adopt 5-core setting to filter out unpopular items and inactive users with fewer than five interaction records. Their statistics are listed in Table 1. |
| Dataset Splits | Yes | Following the procedure [Kang and McAuley, 2018; Li et al., 2020; Fan et al., 2022], we use the last item of each user's sequence for testing, the second-to-last for validation, and the remaining items for training. |
| Hardware Specification | No | The paper mentions 'with the same hardware' but does not provide specific details about the hardware used for experiments (e.g., CPU, GPU model, memory). |
| Software Dependencies | No | The paper mentions using 'Adam as optimizer' but does not specify version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The parameters for the baselines are initialized as their original settings and are then carefully tuned to obtain optimal performance. We adopt Adam as optimizer and search embedding dimension d in Eq. (2) within {32, 64, 128}, the length of item sequence n within {25, 50}. For the attention capacity B in Problem (8), we vary the ratio r in {0.3, 0.5, 0.7, 0.9}, such that $B = r n^2$. Moreover, all of our PMANs only use single-head attention in the experiments. |
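
Since no open-source code is reported, the preprocessing, split, and search grid quoted in the table can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `(user, item, timestamp)` tuple layout, the iterative form of the 5-core filter, and leaving B unrounded are all assumptions made for illustration; only the split rule, the threshold of five interactions, and the grid values come from the quoted text.

```python
from collections import Counter, defaultdict
from itertools import product


def five_core_filter(interactions, k=5):
    """Drop users and items with fewer than k interactions.

    Assumption: the filter is repeated until nothing changes; the quoted
    text does not say whether it is applied once or iteratively.
    """
    rows = list(interactions)
    while True:
        user_counts = Counter(u for u, _, _ in rows)
        item_counts = Counter(i for _, i, _ in rows)
        kept = [(u, i, t) for u, i, t in rows
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(rows):
            return kept
        rows = kept


def leave_one_out_split(interactions):
    """Last item per user for testing, second-to-last for validation,
    the remaining items for training."""
    by_user = defaultdict(list)
    for u, i, t in interactions:
        by_user[u].append((t, i))
    train, valid, test = {}, {}, {}
    for u, events in by_user.items():
        # Sort each user's items by timestamp, ascending.
        items = [i for _, i in sorted(events, key=lambda e: e[0])]
        if len(items) < 3:
            continue  # too short to yield train, validation, and test parts
        train[u], valid[u], test[u] = items[:-2], items[-2], items[-1]
    return train, valid, test


def pman_search_grid():
    """Enumerate the quoted search space: d, n, r, and B = r * n^2."""
    grid = []
    for d, n, r in product((32, 64, 128), (25, 50), (0.3, 0.5, 0.7, 0.9)):
        # Assumption: B is kept as a float; any rounding is not specified.
        grid.append({"d": d, "n": n, "r": r, "B": r * n * n})
    return grid
```

Calling `leave_one_out_split(five_core_filter(rows))` on a list of `(user, item, timestamp)` tuples yields three dictionaries keyed by user, and `pman_search_grid()` returns the 24 combinations of d, n, and r implied by the quoted ranges.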