UMEC: Unified model and embedding compression for efficient recommendation systems
Authors: Jiayi Shen, Haotao Wang, Shupeng Gui, Jianchao Tan, Zhangyang Wang, Ji Liu
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on public benchmarks show that our UMEC framework notably outperforms other non-integrated baseline methods. Extensive experiments performed over large-scale public benchmarks show that our method largely outperforms previous state-of-the-art input feature selection methods and model compression methods, endorsing the benefits of the proposed end-to-end optimization. |
| Researcher Affiliation | Collaboration | 1 Texas A&M University, 2 University of Rochester, 3 University of Texas at Austin, 4 Kwai Inc. asjyjya-617@tamu.edu, sgui2@ur.rochester.edu, {htwang,atlaswang}@utexas.edu, {jianchaotan,jiliu}@kuaishou.com |
| Pseudocode | Yes | Algorithm 1: Gradient-based algorithm to solve problem (5) for UMEC. |
| Open Source Code | Yes | The codes can be found at https://github.com/VITA-Group/UMEC. |
| Open Datasets | Yes | We use the Criteo AI Labs Ad Kaggle [1] and Terabyte [2] datasets for our experiments. The Criteo AI Labs Ad Kaggle dataset contains approximately 45 million click log samples collected over seven days. The Terabyte dataset consists of approximately 4.4 billion click log samples collected over 24 days and we perform uniform sampling with 12.5% sampling rate from the raw data following the sampling scheme in Naumov et al. (2019). [1] https://www.kaggle.com/c/criteo-display-ad-challenge [2] https://labs.criteo.com/2013/12/download-terabyte-click-logs/ |
| Dataset Splits | Yes | Following the official setting, for both datasets we split the data of the last day into validation and testing sets, and use data from the rest days as the training set. |
| Hardware Specification | Yes | We evaluate the energy cost and latency of all models on a GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'All experiments are implemented with PyTorch' but does not specify a version for PyTorch, nor does it list any other software dependencies with version information. |
| Experiment Setup | Yes | We use SGD optimizer with learning rate 0.1 to optimize the BCE loss (ℓ(W) in our method); we set training batch size to 128, initial feature embedding dimension to 16, and use the three-layer MLP prediction model with hidden dimensions 512 and 256. We set learning rates η1, η2, η3, η4 in Algorithm 1 to be [0.1, 0.05, 0.1, 2.0] respectively, and set τ to be 306,969, which equals 1 epoch. (Hedged sketches of the data preparation and this training configuration follow the table.) |
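
The Open Datasets and Dataset Splits rows above describe a concrete data-preparation procedure: uniformly subsample the Terabyte logs at 12.5%, train on all days but the last, and split the last day into validation and test sets. A minimal sketch of that procedure, assuming per-sample day indices are available as a NumPy array; the function name and the even split of the last day are assumptions for illustration (following the common DLRM evaluation setting), not taken from the paper or the released code:

```python
import numpy as np

def split_click_logs(day_ids, num_days, sample_rate=None, seed=0):
    """Return (train, val, test) sample indices from per-sample day labels.

    - Optional uniform subsampling (e.g. sample_rate=0.125 for the
      Terabyte dataset) is applied before splitting.
    - All days except the last form the training set; the last day is
      split in half into validation and test sets (the even split is an
      assumption based on the official DLRM setting).
    This helper is illustrative only and is not part of the UMEC code.
    """
    day_ids = np.asarray(day_ids)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(day_ids))
    if sample_rate is not None:
        idx = idx[rng.random(len(idx)) < sample_rate]

    last_day = num_days - 1
    train_idx = idx[day_ids[idx] != last_day]
    held_out = idx[day_ids[idx] == last_day]
    half = len(held_out) // 2
    return train_idx, held_out[:half], held_out[half:]
```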
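
Likewise, the Experiment Setup row pins down the baseline training configuration: SGD with learning rate 0.1, batch size 128, embedding dimension 16, and a three-layer MLP with hidden sizes 512 and 256. Below is a hedged PyTorch sketch of just that configuration; the feature counts, vocabulary sizes, concatenation of dense features with flattened embeddings, and use of BCEWithLogitsLoss on a linear output are assumptions for illustration, and the sketch does not include the UMEC compression algorithm (Algorithm 1) itself:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
EMBED_DIM = 16            # initial feature embedding dimension
HIDDEN_DIMS = (512, 256)  # hidden sizes of the three-layer MLP
BATCH_SIZE = 128
LR = 0.1

# Placeholder feature counts and vocabulary sizes: the Criteo datasets
# have 13 dense and 26 categorical fields, but the exact input wiring
# is not specified in the quoted text.
NUM_DENSE, NUM_SPARSE = 13, 26
cardinalities = [100_000] * NUM_SPARSE

embeddings = nn.ModuleList([nn.Embedding(c, EMBED_DIM) for c in cardinalities])

in_dim = NUM_DENSE + NUM_SPARSE * EMBED_DIM
mlp = nn.Sequential(
    nn.Linear(in_dim, HIDDEN_DIMS[0]), nn.ReLU(),
    nn.Linear(HIDDEN_DIMS[0], HIDDEN_DIMS[1]), nn.ReLU(),
    nn.Linear(HIDDEN_DIMS[1], 1),  # click-through-rate logit
)

params = list(mlp.parameters()) + [p for e in embeddings for p in e.parameters()]
optimizer = torch.optim.SGD(params, lr=LR)
criterion = nn.BCEWithLogitsLoss()  # BCE loss, i.e. the l(W) term in the paper
```

BCEWithLogitsLoss is used here instead of a separate sigmoid plus BCELoss purely for numerical stability; the quoted text only states that a BCE loss is optimized.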