Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits
Authors: Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show the proposed algorithms are able to improve upon or match adaptive algorithms on benchmark recommendation tasks and a large-scale industrial recommendation system, closing the performance gap between SGD and adaptive algorithms while using significantly less memory. |
| Researcher Affiliation | Collaboration | Yan Li (ISyE, Georgia Tech, yli939@gatech.edu); Dhruv Choudhary (Meta, choudharydhruv@fb.com); Xiaohan Wei (Meta, ubimeteor@fb.com); Baichuan Yuan (Meta, bcyuan@fb.com); Bhargav Bhushanam (Meta, bbhushanam@fb.com); Tuo Zhao (ISyE, Georgia Tech, tourzhao@gatech.edu); Guanghui Lan (ISyE, Georgia Tech, george.lan@isye.gatech.edu) |
| Pseudocode | Yes | Algorithm 1: Frequency-aware Stochastic Gradient Descent (a hedged code sketch is given after this table). |
| Open Source Code | No | The paper states "We build upon torchfm2, which contains implementation of various popular recommendation models." in Section A.1, but does not provide a specific link or statement about making their own implementation's source code publicly available. |
| Open Datasets | Yes | Datasets: MovieLens-1M (GroupLens, 2003) and the Criteo 1TB Click Logs dataset (Criteo, 2014). |
| Dataset Splits | Yes | For both the MovieLens-1M and Criteo datasets, we randomly split the data into training, validation, and test sets, comprising 80%, 10%, and 10% of the total samples, respectively. |
| Hardware Specification | No | The paper mentions training an "ultra-large industrial recommendation model" and discusses its "size over multiple terabytes" and "memory footprint", but it does not specify any exact hardware components like GPU or CPU models, memory sizes, or specific cloud instances used for the experiments. |
| Software Dependencies | No | The paper states "We build upon torchfm2" and implies the use of deep learning frameworks, but it does not provide specific version numbers for any software dependencies (e.g., Python version, PyTorch version, or torchfm2 version). |
| Experiment Setup | Yes | To ensure a fair comparison, for each dataset and model type, we carefully tune the learning rate of each algorithm for best performance. We apply early stopping and stop training whenever the validation AUC does not increase for 2 consecutive epochs, which is widely adopted in practice (Takacs et al., 2009; Dacrema et al., 2021). All algorithms use a batch size of 1024 during training. Tables 2 and 3 list the learning rates for the MovieLens-1M and Criteo datasets, respectively. |
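
The pseudocode row above refers to Algorithm 1 (Frequency-aware Stochastic Gradient Descent). The following is a minimal PyTorch sketch of the general idea, assuming a counter-based variant in which each embedding row's step size is scaled by the inverse square root of its access count; the class name `FrequencyAwareSGDSketch`, the `base_lr` parameter, and the exact scaling rule are illustrative assumptions, not the paper's Algorithm 1 verbatim.

```python
# Minimal sketch of a frequency-aware SGD update for a single embedding table.
# Assumption: each row's step size is scaled by 1/sqrt(access count); the exact
# rule in the paper's Algorithm 1 may differ.
import torch


class FrequencyAwareSGDSketch:
    """Keeps one scalar counter per embedding row and scales the SGD step of
    each accessed row by the inverse square root of its count, so infrequent
    tokens take larger steps while frequent tokens are damped."""

    def __init__(self, embedding: torch.nn.Embedding, base_lr: float = 0.1):
        self.embedding = embedding
        self.base_lr = base_lr
        # One float per row: far less optimizer state than Adam's two moment tensors.
        self.counts = torch.zeros(embedding.num_embeddings)

    @torch.no_grad()
    def step(self, token_ids: torch.Tensor, row_grads: torch.Tensor):
        """token_ids: (batch,) row indices; row_grads: (batch, dim) gradients."""
        ones = torch.ones(token_ids.shape[0], dtype=self.counts.dtype)
        self.counts.index_add_(0, token_ids, ones)
        # Per-example step size: large for rarely seen tokens, small for frequent ones.
        scale = self.base_lr / torch.sqrt(self.counts[token_ids])
        self.embedding.weight.index_add_(0, token_ids, -scale.unsqueeze(1) * row_grads)
```

In a training loop, `step` would be called after the backward pass with the batch's token indices and the gradients of the corresponding embedding rows; a faithful reproduction should instead follow Algorithm 1 and the learning rates in Tables 2 and 3, together with the batch size of 1024 and early-stopping rule reported above.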