STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation
Authors: Liangcai Su, Junwei Pan, Ximei Wang, Xi Xiao, Shijie Quan, Xihua Chen, Jie Jiang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluation on three public MTL recommendation datasets demonstrates that STEM-Net outperforms state-of-the-art models by a substantial margin. Our code is released at https://github.com/LiangcaiSu/STEM. We conduct comprehensive experiments and ablation studies on three MTL recommendation datasets and provide compelling evidence of STEM-Net's effectiveness. |
| Researcher Affiliation | Collaboration | (1) Shenzhen International Graduate School, Tsinghua University; (2) Tencent Inc. sulc21@mails.tsinghua.edu.cn, xiaox@sz.tsinghua.edu.cn, {jonaspan, messiwang, justinquan, tinychen, zeus}@tencent.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released at https://github.com/LiangcaiSu/STEM. |
| Open Datasets | Yes | Public Datasets. We choose three public datasets, namely TikTok, QK-Video (Yuan et al. 2022), and KuaiRand1K (Yuan et al. 2022), for performance evaluation. |
| Dataset Splits | Yes | The statistics of the processed datasets are presented in Table 1. TikTok: #Samples 223.4M/24.8M/27.6M; QK-Video: #Samples 95.9M/12.0M/12.5M; KuaiRand1K: #Samples 10.9M/0.39M/0.42M. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing methods 'based on Pytorch' but does not provide specific version numbers for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We set the learning rate as {1e-3, 5e-4, 1e-4}, the batch size as 4096, and the L2 regularization factor of the embedding as 1e-6. We set the dimension of the embedding to 16, and each expert/bottom is an MLP with hidden units of [512, 512, 512]. The towers and the gate networks of all methods are MLPs with hidden units of [128, 64]. The number of task-specific and shared experts is chosen from {1, 2, 4, 8}. Grid search is used to find optimal hyper-parameters for all methods. |
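
The reported search space can be enumerated with a short sketch. The search ranges and fixed values come from the paper's setup; the dictionary layout and the function name `grid_search_configs` are our own illustration, not part of the released code:

```python
from itertools import product

# Searched hyper-parameters, as reported in the paper.
GRID = {
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "num_task_experts": [1, 2, 4, 8],
    "num_shared_experts": [1, 2, 4, 8],
}

# Fixed settings shared by all runs.
FIXED = {
    "batch_size": 4096,
    "embedding_l2": 1e-6,
    "embedding_dim": 16,
    "expert_hidden_units": [512, 512, 512],
    "tower_hidden_units": [128, 64],
}

def grid_search_configs(grid, fixed):
    """Yield one full config dict per point in the hyper-parameter grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(fixed)
        cfg.update(zip(keys, values))
        yield cfg

configs = list(grid_search_configs(GRID, FIXED))
print(len(configs))  # 3 * 4 * 4 = 48 candidate configurations
```

Each yielded config would then be used to train one model, keeping the setting with the best validation metric.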