UMEC: Unified model and embedding compression for efficient recommendation systems
Authors: Jiayi Shen, Haotao Wang, Shupeng Gui, Jianchao Tan, Zhangyang Wang, Ji Liu
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on public benchmarks show that our UMEC framework notably outperforms other non-integrated baseline methods. Extensive experiments performed over large-scale public benchmarks show that our method largely outperforms previous state-of-the-art input feature selection methods and model compression methods, endorsing the benefits of the proposed end-to-end optimization. |
| Researcher Affiliation | Collaboration | 1 Texas A&M University, 2 University of Rochester, 3 University of Texas at Austin, 4 Kwai Inc. asjyjya-617@tamu.edu, sgui2@ur.rochester.edu, {htwang,atlaswang}@utexas.edu, {jianchaotan,jiliu}@kuaishou.com |
| Pseudocode | Yes | Algorithm 1: Gradient-based algorithm to solve problem (5) for UMEC. |
| Open Source Code | Yes | The codes can be found at https://github.com/VITA-Group/UMEC. |
| Open Datasets | Yes | We use the Criteo AI Labs Ad Kaggle [1] and Terabyte [2] datasets for our experiments. The Criteo AI Labs Ad Kaggle dataset contains approximately 45 million click log samples collected over seven days. The Terabyte dataset consists of approximately 4.4 billion click log samples collected over 24 days and we perform uniform sampling with 12.5% sampling rate from the raw data following the sampling scheme in Naumov et al. (2019). [1] https://www.kaggle.com/c/criteo-display-ad-challenge [2] https://labs.criteo.com/2013/12/download-terabyte-click-logs/ |
| Dataset Splits | Yes | Following the official setting, for both datasets we split the data of the last day into validation and testing sets, and use data from the rest days as the training set. |
| Hardware Specification | Yes | We evaluate the energy cost and latency of all models on a GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'All experiments are implemented with PyTorch' but does not specify a version for PyTorch, nor does it list any other software dependencies with version information. |
| Experiment Setup | Yes | We use SGD optimizer with learning rate 0.1 to optimize the BCE loss (ℓ(W) in our method); we set training batch size to 128, initial feature embedding dimension to 16, and use the three-layer MLP prediction model with hidden dimensions 512 and 256. We set learning rates η1, η2, η3, η4 in Algorithm 1 to be [0.1, 0.05, 0.1, 2.0] respectively, and set τ to be 306,969, which equals 1 epoch. (Hedged sketches of the data preparation and this training configuration follow the table.) |
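
The Open Datasets and Dataset Splits rows above describe a concrete data-preparation procedure: uniformly subsample the Terabyte logs at 12.5%, train on all days but the last, and split the last day into validation and test sets. A minimal sketch of that procedure, assuming per-sample day indices are available as a NumPy array; the function name and the even split of the last day are assumptions for illustration (following the common DLRM evaluation setting), not taken from the paper or the released code:

```python
import numpy as np

def split_click_logs(day_ids, num_days, sample_rate=None, seed=0):
    """Return (train, val, test) sample indices from per-sample day labels.

    - Optional uniform subsampling (e.g. sample_rate=0.125 for the
      Terabyte dataset) is applied before splitting.
    - All days except the last form the training set; the last day is
      split in half into validation and test sets (the even split is an
      assumption based on the official DLRM setting).
    This helper is illustrative only and is not part of the UMEC code.
    """
    day_ids = np.asarray(day_ids)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(day_ids))
    if sample_rate is not None:
        idx = idx[rng.random(len(idx)) < sample_rate]

    last_day = num_days - 1
    train_idx = idx[day_ids[idx] != last_day]
    held_out = idx[day_ids[idx] == last_day]
    half = len(held_out) // 2
    return train_idx, held_out[:half], held_out[half:]
```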
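
Likewise, the Experiment Setup row pins down the baseline training configuration: SGD with learning rate 0.1, batch size 128, embedding dimension 16, and a three-layer MLP with hidden sizes 512 and 256. Below is a hedged PyTorch sketch of just that configuration; the feature counts, vocabulary sizes, concatenation of dense features with flattened embeddings, and use of BCEWithLogitsLoss on a linear output are assumptions for illustration, and the sketch does not include the UMEC compression algorithm (Algorithm 1) itself:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
EMBED_DIM = 16            # initial feature embedding dimension
HIDDEN_DIMS = (512, 256)  # hidden sizes of the three-layer MLP
BATCH_SIZE = 128
LR = 0.1

# Placeholder feature counts and vocabulary sizes: the Criteo datasets
# have 13 dense and 26 categorical fields, but the exact input wiring
# is not specified in the quoted text.
NUM_DENSE, NUM_SPARSE = 13, 26
cardinalities = [100_000] * NUM_SPARSE

embeddings = nn.ModuleList([nn.Embedding(c, EMBED_DIM) for c in cardinalities])

in_dim = NUM_DENSE + NUM_SPARSE * EMBED_DIM
mlp = nn.Sequential(
    nn.Linear(in_dim, HIDDEN_DIMS[0]), nn.ReLU(),
    nn.Linear(HIDDEN_DIMS[0], HIDDEN_DIMS[1]), nn.ReLU(),
    nn.Linear(HIDDEN_DIMS[1], 1),  # click-through-rate logit
)

params = list(mlp.parameters()) + [p for e in embeddings for p in e.parameters()]
optimizer = torch.optim.SGD(params, lr=LR)
criterion = nn.BCEWithLogitsLoss()  # BCE loss, i.e. the l(W) term in the paper
```

BCEWithLogitsLoss is used here instead of a separate sigmoid plus BCELoss purely for numerical stability; the quoted text only states that a BCE loss is optimized.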