ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation

Authors: Chang Zhou, Jinze Bai, Junshuai Song, Xiaofei Liu, Zhengchao Zhao, Xiusi Chen, Jun Gao

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that ATRank can achieve better performance and a faster training process.
Researcher Affiliation | Collaboration | (1) Alibaba Group; (2) Key Laboratory of High Confidence Software Technologies, EECS, Peking University
Pseudocode | No | No pseudocode or algorithm blocks are found in the paper.
Open Source Code | No | No statement regarding open-source code availability, and no link to a repository, is found in the paper.
Open Datasets | Yes | We collect several subsets of Amazon product data as in (McAuley et al. 2015), which have already been reduced to satisfy the 5-core property, such that each of the remaining users and items has at least 5 reviews. http://jmcauley.ucsd.edu/data/amazon/
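To make the quoted 5-core property concrete, here is a minimal sketch of the reduction in Python/pandas. The column names user_id and item_id are our assumption (the raw Amazon dumps use reviewerID and asin), and the linked subsets already ship reduced, so this is illustrative only:

    import pandas as pd

    def five_core(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
        """Iteratively drop users/items with fewer than k reviews until stable."""
        while True:
            user_sizes = df.groupby("user_id")["item_id"].transform("size")
            item_sizes = df.groupby("item_id")["user_id"].transform("size")
            kept = df[(user_sizes >= k) & (item_sizes >= k)]
            if len(kept) == len(df):  # fixed point: the k-core property holds
                return kept
            df = kept

Note that dropping sparse users can push items below the threshold (and vice versa), which is why the filter must loop to a fixed point rather than run once.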
Dataset Splits | No | The paper describes a training and test set split, but does not explicitly mention a separate validation set split or percentage.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions optimizers and network types but does not provide specific software library names with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Network Shape. We set the dimension size of each categorical feature embedding to 64, and we concatenate these embeddings as the behavior representation in the behavior embedding space. The hidden size of all layers is set to 128. The ranking function f is simply the dot product in these tasks. ... For ATRank, we set the number of latent semantic spaces to 8, whose dimension sizes sum to the size of the hidden layer. Batch Size. The batch size is set to 32 for all methods. Regularization. The l2-loss weight is set to 5e-5. Optimizer. We use SGD as the optimizer and apply exponential decay, in which the learning rate starts at 1.0 and the decay rate is set to 0.1.
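For concreteness, the reported setup translates into the sketch below. PyTorch is our choice (the paper does not name a framework), the l2-loss weight is approximated via weight_decay, and the network itself is a stand-in module, not the ATRank architecture:

    import torch
    import torch.nn as nn

    # Reported hyperparameters from the paper's experiment setup.
    EMBED_DIM = 64      # dimension of each categorical-feature embedding
    HIDDEN_DIM = 128    # hidden size of all layers
    NUM_SPACES = 8      # latent semantic spaces; dims sum to HIDDEN_DIM (16 each)
    BATCH_SIZE = 32
    L2_WEIGHT = 5e-5    # approximated here via weight_decay (an assumption)
    LR_START = 1.0
    DECAY_RATE = 0.1

    def rank_score(user_vec: torch.Tensor, item_vec: torch.Tensor) -> torch.Tensor:
        # The ranking function f is a plain dot product.
        return (user_vec * item_vec).sum(dim=-1)

    # Stand-in module; the actual ATRank network is not reproduced here.
    model = nn.Linear(EMBED_DIM, HIDDEN_DIM)

    # SGD with exponential learning-rate decay at the reported values.
    optimizer = torch.optim.SGD(model.parameters(), lr=LR_START,
                                weight_decay=L2_WEIGHT)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=DECAY_RATE)

How often the decay fires (i.e., when scheduler.step() is called) is not stated in the paper, which is one of the gaps the missing software-dependency and split details leave for reproduction.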