Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits

Authors: Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show the proposed algorithms are able to improve or match adaptive algorithms on benchmark recommendation tasks and a large-scale industrial recommendation system, closing the performance gap between SGD and adaptive algorithms, while using significantly lower memory.
Researcher Affiliation | Collaboration | Yan Li (ISyE, Georgia Tech, yli939@gatech.edu); Dhruv Choudhary (Meta, choudharydhruv@fb.com); Xiaohan Wei (Meta, ubimeteor@fb.com); Baichuan Yuan (Meta, bcyuan@fb.com); Bhargav Bhushanam (Meta, bbhushanam@fb.com); Tuo Zhao (ISyE, Georgia Tech, tourzhao@gatech.edu); Guanghui Lan (ISyE, Georgia Tech, george.lan@isye.gatech.edu)
Pseudocode | Yes | Algorithm 1: Frequency-aware Stochastic Gradient Descent (an illustrative sketch of this style of update follows the table).
Open Source Code | No | The paper states "We build upon torchfm, which contains implementation of various popular recommendation models." in Section A.1, but it provides no link to, or statement about releasing, the authors' own implementation.
Open Datasets | Yes | Datasets: MovieLens-1M (GroupLens, 2003) and the Criteo 1TB Click Logs dataset (Criteo, 2014).
Dataset Splits | Yes | For both the MovieLens-1M and Criteo datasets, we randomly split into training, validation, and test sets, taking up 80%, 10%, and 10% of the total samples respectively (a split sketch follows the table).
Hardware Specification | No | The paper mentions training an "ultra-large industrial recommendation model" and discusses its size of "over multiple terabytes" and its "memory footprint", but it does not specify exact hardware such as GPU or CPU models, memory sizes, or specific cloud instances used for the experiments.
Software Dependencies | No | The paper states "We build upon torchfm" and implies the use of deep learning frameworks, but it does not provide version numbers for any software dependencies (e.g., the Python, PyTorch, or torchfm versions).
Experiment Setup | Yes | To ensure a fair comparison, for each dataset and model type, we carefully tune the learning rate of each algorithm for best performance. We apply early stopping and stop training whenever the validation AUC does not increase for 2 consecutive epochs, which is widely adopted in practice (Takacs et al., 2009; Dacrema et al., 2021). All algorithms use a batch size of 1024 during training. Tables 2 and 3 list learning rates for the MovieLens-1M and Criteo datasets, respectively. (An early-stopping sketch follows the table.)
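Regarding the "Pseudocode" row: the paper's Algorithm 1 specifies a frequency-aware SGD update for embedding rows. The snippet below is only an illustrative sketch of this style of update, assuming a per-row learning rate that shrinks as a row's visit count grows; the class name FreqAwareEmbeddingSGD and the 1/sqrt(count) scaling are assumptions for illustration, not the paper's exact rule.

```python
# Illustrative sketch of a frequency-aware per-row SGD update for an
# embedding table. ASSUMPTION: each row's step size is the base learning
# rate scaled by 1/sqrt(visit count). This is not the paper's exact
# Algorithm 1; see the paper for the precise frequency-dependent scaling.
import torch


class FreqAwareEmbeddingSGD:
    def __init__(self, embedding: torch.nn.Embedding, base_lr: float = 0.1):
        self.emb = embedding
        self.base_lr = base_lr
        # One visit counter per embedding row.
        self.counts = torch.zeros(embedding.num_embeddings)

    @torch.no_grad()
    def step(self, ids: torch.Tensor) -> None:
        """Update only the embedding rows touched by the current mini-batch."""
        rows = ids.unique()
        self.counts[rows] += 1  # simplification: one count per batch appearance
        # Rarely seen rows take larger steps; frequent rows take smaller ones.
        lr = self.base_lr / self.counts[rows].sqrt()
        grad = self.emb.weight.grad[rows]
        self.emb.weight[rows] -= lr.unsqueeze(1) * grad
        self.emb.weight.grad[rows] = 0.0  # clear the gradients we consumed
```

In practice, step(ids) would be called after loss.backward() with the batch's feature ids, while dense layers are updated by a standard optimizer.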
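Regarding the "Dataset Splits" row: the sketch below shows one way to produce the reported 80%/10%/10% random split with torch.utils.data.random_split. The seed and the exact splitting code used by the authors are not specified in the paper.

```python
# One possible way to realize the 80/10/10 random split described above.
# The seed value is an assumption; the paper does not state one.
import torch
from torch.utils.data import Dataset, random_split


def split_80_10_10(dataset: Dataset, seed: int = 0):
    n = len(dataset)
    n_train = int(0.8 * n)
    n_valid = int(0.1 * n)
    n_test = n - n_train - n_valid  # remainder goes to the test set
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_valid, n_test], generator=generator)
```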
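Regarding the "Experiment Setup" row: a minimal sketch of the stated early-stopping rule, stopping once validation AUC has not improved for 2 consecutive epochs. The EarlyStopper class and the helper names in the usage comment are hypothetical; they come neither from the paper nor from torchfm.

```python
# Minimal sketch of the early-stopping rule: halt once validation AUC fails
# to improve for `patience` consecutive epochs. Names here are hypothetical.
class EarlyStopper:
    def __init__(self, patience: int = 2):
        self.patience = patience
        self.best_auc = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_auc: float) -> bool:
        """Record this epoch's validation AUC and report whether to stop."""
        if val_auc > self.best_auc:
            self.best_auc, self.bad_epochs = val_auc, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage inside a training loop (batch size 1024 per the paper);
# train_one_epoch and evaluate_auc are hypothetical helpers.
#
#   stopper = EarlyStopper(patience=2)
#   for epoch in range(max_epochs):
#       train_one_epoch(model, train_loader)
#       if stopper.should_stop(evaluate_auc(model, valid_loader)):
#           break
```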