Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction

Authors: Shiwei Li, Huifeng Guo, Lu Hou, Wei Zhang, Xing Tang, Ruiming Tang, Rui Zhang, Ruixuan Li

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on two real-world datasets confirm our analysis and show that ALPT can significantly improve the prediction accuracy, especially at extremely low bit width."
Researcher Affiliation | Collaboration | 1. Huazhong University of Science and Technology, Wuhan, China; 2. Huawei Noah's Ark Lab, Shenzhen, China; 3. Tsinghua University, Beijing, China
Pseudocode | Yes | "Algorithm 1: Adaptive low-precision training" (see the quantization sketch below the table)
Open Source Code | Yes | "The code of ALPT is publicly available." Code: https://gitee.com/mindspore/models/tree/master/research/recommend/ALPT
Open Datasets | Yes | "In this section, we conduct experiments on two real-world datasets: Criteo and Avazu. The Criteo dataset consists of 26 categorical feature fields and 13 numerical feature fields. [...] The Avazu dataset consists of 23 categorical feature fields. [...]" Criteo: https://www.kaggle.com/c/criteo-display-ad-challenge; Avazu: https://www.kaggle.com/c/avazu-ctr-prediction
Dataset Splits | Yes | "For both datasets, we split them in a ratio of 8:1:1 randomly to get corresponding training, validation, and test sets." (see the split sketch below the table)
Hardware Specification | Yes | "All the experiments are run on a single Tesla V100 GPU with Intel Xeon Gold-6154 CPUs."
Software Dependencies | No | The paper mentions Adam as the optimizer, but it does not specify versions for any libraries (e.g., TensorFlow, PyTorch) or for the programming language itself, which would be necessary for full reproducibility.
Experiment Setup | Yes | "We use Adam (Kingma and Ba 2015) as the optimizer. The learning rate is set to 0.001 and the maximum number of epochs is 15. We reduce the learning rate tenfold after the 6th and 9th epochs. For regularization, we set the weight decay of embeddings to 5e-8 and 1e-5 for Avazu and Criteo, respectively. In addition, we adopt a dropout of 0.2 on MLP for Criteo. For the step size of ALPT, we set its learning rate to 2e-5 and adopt the same weight decay and learning rate decay as the embeddings." (see the training-setup sketch below the table)
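
The pseudocode row above refers to Algorithm 1, adaptive low-precision training. As a reading aid, here is a minimal sketch of the core idea: uniform quantization of the embedding table with a learnable step size, trained end to end through a straight-through estimator. The function and variable names are ours, the framework (PyTorch) is an assumption for illustration only (the released code is in MindSpore, per the Gitee link above), and this is not the authors' implementation.

```python
import torch

def alpt_quantize(weight: torch.Tensor, step: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform quantization with a learnable step size.

    Hedged sketch of the idea behind Algorithm 1 (not the authors' code):
    embeddings are rounded to a low-bit grid whose step size is itself a
    trainable parameter, updated by gradient descent alongside the embeddings.
    """
    qmax = 2 ** (bits - 1) - 1
    # Scale onto the integer grid and clamp to the representable range.
    scaled = torch.clamp(weight / step, -qmax - 1, qmax)
    # Straight-through estimator: round in the forward pass, identity in the
    # backward pass, so gradients still reach both `weight` and `step`.
    rounded = scaled + (torch.round(scaled) - scaled).detach()
    return rounded * step

# Usage: quantize a 2-bit embedding table with a learnable scalar step size.
emb = torch.nn.Parameter(torch.randn(1000, 16) * 0.01)
step = torch.nn.Parameter(torch.tensor(0.01))
out = alpt_quantize(emb, step, bits=2)
out.sum().backward()  # step.grad is populated, so the step size adapts during training
```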
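
The dataset-splits row quotes a random 8:1:1 partition into training, validation, and test sets. A short sketch of that protocol, assuming a NumPy-style index split (the seed and helper name are ours, not from the paper):

```python
import numpy as np

def split_indices(n_rows: int, seed: int = 0):
    """Randomly split row indices 8:1:1 into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)          # random shuffle of all row indices
    n_train = int(0.8 * n_rows)            # 80% train
    n_val = int(0.1 * n_rows)              # 10% validation, remainder is test
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```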
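
The experiment-setup row pins down Adam with a learning rate of 0.001, a tenfold learning-rate decay after the 6th and 9th epochs, per-dataset embedding weight decay, a 0.2 MLP dropout for Criteo, and a separate 2e-5 learning rate for ALPT's step size. A hedged configuration matching those numbers for Criteo; the model shapes are placeholders and the framework choice (PyTorch) is ours:

```python
import torch

# Placeholder model with an embedding table and an MLP head; the paper does
# not publish this snippet, so all shapes here are illustrative.
embedding = torch.nn.Embedding(10_000, 16)
mlp = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(),
    torch.nn.Dropout(0.2),   # dropout of 0.2 on the MLP, as stated for Criteo
    torch.nn.Linear(64, 1),
)
step_size = torch.nn.Parameter(torch.tensor(0.01))  # ALPT's learnable step size

optimizer = torch.optim.Adam([
    # Embedding weight decay: 1e-5 for Criteo (5e-8 for Avazu).
    {"params": embedding.parameters(), "lr": 1e-3, "weight_decay": 1e-5},
    {"params": mlp.parameters(), "lr": 1e-3},
    # The step size gets its own learning rate of 2e-5 and the same weight
    # decay as the embeddings, per the quoted setup.
    {"params": [step_size], "lr": 2e-5, "weight_decay": 1e-5},
])
# Reduce all learning rates tenfold after the 6th and 9th epochs;
# call scheduler.step() once per epoch, for at most 15 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[6, 9], gamma=0.1)
```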