On the Embedding Collapse when Scaling up Recommendation Models
Authors: Xingzhuo Guo, Junwei Pan, Ximei Wang, Baixu Chen, Jie Jiang, Mingsheng Long
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that this proposed design provides consistent scalability and effective collapse mitigation for various recommendation models. Code is available at this repository: https://github.com/thuml/Multi-Embedding. |
| Researcher Affiliation | Collaboration | (1) School of Software, BNRist, Tsinghua University, China; (2) Tencent Inc., China. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/Multi-Embedding. |
| Open Datasets | Yes | We conduct our experiments on two datasets for recommender systems: Criteo (Jean-Baptiste Tien, 2014) and Avazu (Steve Wang, 2014), which are large and challenging benchmark datasets widely used in recommender systems. |
| Dataset Splits | Yes | For all experiments, we split the dataset into 8 : 1 : 1 for training/validation/test with random seed 0. |
| Hardware Specification | Yes | All experiments can be done with a single NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" but does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For all experiments, we split the dataset into 8:1:1 for training/validation/test with random seed 0. We use the Adam optimizer with batch size 2048, learning rate 0.001, and weight decay 1e-6. For the base size, we use embedding size 50 for NFwFM (considering the pooling) and 10 for all other experiments. We find that the hidden size and depth of the MLP do not affect the results; for simplicity, we set the hidden size to 400 and the depth to 3 (2 hidden layers and 1 output layer) for all models. We use 4 cross layers for DCNv2 and hidden size 16 for xDeepFM. All experiments use early stopping on validation AUC with patience 3. |
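
For reference, the quoted experiment setup can be expressed as a minimal PyTorch sketch. The hyperparameters (8:1:1 split with seed 0, Adam with batch size 2048, learning rate 0.001, weight decay 1e-6, early stopping on validation AUC with patience 3) follow the quoted text; the dataset, model, and `evaluate_auc` helper are hypothetical placeholders, not part of the authors' released code.

```python
# Hedged sketch of the reported training setup; placeholders are noted in comments.
import torch
from torch.utils.data import DataLoader, random_split


def make_splits(dataset, seed=0):
    # 8 : 1 : 1 train/validation/test split with a fixed random seed, as reported.
    n = len(dataset)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    n_test = n - n_train - n_val
    return random_split(
        dataset, [n_train, n_val, n_test],
        generator=torch.Generator().manual_seed(seed),
    )


def train(model, train_set, val_set, evaluate_auc, max_epochs=100):
    # evaluate_auc(model, val_set) is a hypothetical helper returning validation AUC.
    loader = DataLoader(train_set, batch_size=2048, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
    criterion = torch.nn.BCEWithLogitsLoss()

    best_auc, patience, bad_epochs = 0.0, 3, 0
    for _ in range(max_epochs):
        model.train()
        for features, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(features), labels.float())
            loss.backward()
            optimizer.step()

        # Early stopping on validation AUC with patience 3, as reported.
        auc = evaluate_auc(model, val_set)
        if auc > best_auc:
            best_auc, bad_epochs = auc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return model
```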