Asynchronous Stochastic Gradient Descent for Extreme-Scale Recommender Systems

Authors: Lewis Liu, Kun Zhao

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the number of workers in asynchronous training can be extended to 3000 with guaranteed convergence, and the final AUC is improved by more than 5 percentage points. In this section, we conduct experiments on the CTR model for an e-commerce search engine.
Researcher Affiliation | Collaboration | Lewis Liu (University of Montreal, Quebec); Kun Zhao (Alibaba Group)
Pseudocode | Yes | Algorithm 1: Local Batch Normalization and Algorithm 2: Adagrad-SWAP
Open Source Code | No | Later the implementation will be integrated into higher TensorFlow releases. The paper states a future integration into TensorFlow but provides no current link or explicit statement that its specific implementation is open-sourced.
Open Datasets | No | The training data consists of records of browsing and purchases, queries, and product information on an e-commerce website. The dataset described is internal to an e-commerce company, and no access information (link, citation, or repository) is provided for public access.
Dataset Splits | No | The model is trained in an incremental way, i.e., it uses samples of day 1 as the training set and day 2 as the test set, and refines the model day by day (illustrated in the sketch after this table). While a training and testing split is mentioned, explicit details of a validation split are not provided.
Hardware Specification | Yes | Each node in the cluster is equipped with a 64-core CPU and 512 GB of memory. During training, the resources of each worker (and parameter server) are limited to 10 physical cores and 20 GB of memory.
Software Dependencies | Yes | We extend TensorFlow v1.4 as the training framework...
Experiment Setup | Yes | By setting k = 2 in Equ. 4... and Adagrad-SWAP, in which we set the decay rate to 0.8 and T = 1.4 × 10^7... (a generic optimizer sketch follows the table).
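
The day-by-day incremental scheme noted under Dataset Splits can be summarized with a short sketch. This is a minimal illustration under assumptions, not the authors' code: `load_day`, `train_one_day`, and `evaluate_auc` are hypothetical callables standing in for the CTR model's data loading, training, and AUC evaluation steps.

```python
def incremental_training(model, days, load_day, train_one_day, evaluate_auc):
    """Sketch of incremental training: train on day N, test on day N+1.

    `days` is an ordered sequence of day identifiers; the three callables
    are hypothetical placeholders for the real pipeline steps.
    """
    daily_auc = []
    for train_day, test_day in zip(days[:-1], days[1:]):
        train_one_day(model, load_day(train_day))                   # refine the same model
        daily_auc.append(evaluate_auc(model, load_day(test_day)))   # next day's data is the test set
    return daily_auc
```

The sketch only makes the split structure explicit: each day's data serves first as the test set and then as the next day's training set, with no separate held-out validation split.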
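The Experiment Setup row reports optimizer hyperparameters for Adagrad-SWAP (Algorithm 2), whose swapping logic is not reproduced in this report. For orientation only, the sketch below shows a generic Adagrad-style update with a decayed squared-gradient accumulator; where the reported decay rate (0.8) and threshold T (1.4 × 10^7) actually enter Algorithm 2 is an assumption, and plain Adagrad corresponds to decay = 1.0.

```python
import numpy as np

def adagrad_style_update(w, grad, accum, lr=0.01, decay=0.8, eps=1e-8):
    """One generic Adagrad-style step (not the paper's Adagrad-SWAP).

    decay = 1.0 recovers plain Adagrad (pure accumulation); decay = 0.8 mirrors
    the reported setting, but the SWAP behaviour around step T = 1.4e7 is omitted.
    """
    accum = decay * accum + np.square(grad)        # decayed squared-gradient accumulator
    w = w - lr * grad / (np.sqrt(accum) + eps)     # per-coordinate scaled step
    return w, accum
```

In the asynchronous setting described in the paper, each worker would apply such per-coordinate updates against the parameter servers independently.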