Scaling Law for Recommendation Models: Towards General-Purpose User Representations

Authors: Kyuyong Shin, Hanock Kwak, Su Young Kim, Max Nihlén Ramström, Jisu Jeong, Jung-Woo Ha, Kyung-Min Kim

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We comprehensively evaluate the pretrained user representation of CLUE with multiple downstream tasks from industrial and benchmark datasets, including an online CTR evaluation. More specifically, we compare the performance of a simple multi-layer perceptron (MLP) employing our task-agnostic pretrained CLUE features with a task-specific model trained for each downstream task. Furthermore, we investigate the empirical scaling laws of training data size, model capacity, sequence length, and batch size with extensive experiments, and analyze power-law scaling for training performance as a function of computing resources. (Illustrative sketches of the frozen-feature MLP probe and of a power-law fit appear after this table.)
Researcher Affiliation | Industry | NAVER and NAVER AI Lab; {ky.shin, hanock.kwak2}@navercorp.com
Pseudocode | No | The paper describes the model architecture and mathematical formulations (e.g., Equations 1-4) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | Benchmark dataset. We select two categories, Books and Clothing, Shoes and Jewelry, from the Amazon review dataset (Ni, Li, and McAuley 2019).
Dataset Splits | No | The paper states, "We make sure there are no shared users between the training, validation, and test sets," indicating the use of a validation set. However, it does not provide specific details on the split percentages or sample counts for this validation set, nor does it specify the splitting methodology. (A sketch of a user-disjoint split appears after this table.)
Hardware Specification | No | The paper acknowledges the "NAVER Smart Machine Learning (NSML) platform team... for their critical work on the software and hardware infrastructure on which all the experiments were performed," but it does not specify particular hardware components such as GPU models, CPU models, or memory details.
Software Dependencies | No | The paper mentions the "software infrastructure" provided by the NSML platform team in the acknowledgments, but it does not list any specific software dependencies (e.g., libraries, frameworks) with version numbers that would be required for replication.
Experiment Setup | Yes | The training details and hyperparameters of the best CLUE model are described in Appendix B. We train all models for 100,000 steps. CLUE is trained with 160M parameters, a sequence length of 128, and a batch size of 256. (These hyperparameters are collected in the configuration sketch after this table.)
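
The Research Type row describes probing the frozen, task-agnostic CLUE user representations with a simple MLP per downstream task. The sketch below illustrates that kind of setup in PyTorch; the embedding dimension, hidden size, and data are hypothetical placeholders, not values reported in the paper.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 768, 256   # hypothetical sizes; the paper does not report probe dimensions

# Small MLP probe trained on top of frozen, task-agnostic user embeddings.
probe = nn.Sequential(
    nn.Linear(embed_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, 1),      # e.g., a click / no-click logit for CTR prediction
)

user_embeddings = torch.randn(32, embed_dim)    # stand-in for frozen pretrained features
labels = torch.randint(0, 2, (32, 1)).float()   # stand-in downstream labels

optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
optimizer.zero_grad()
loss = nn.BCEWithLogitsLoss()(probe(user_embeddings), labels)
loss.backward()
optimizer.step()
print(f"one probe step, loss = {loss.item():.4f}")
```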
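
The same row mentions analyzing power-law scaling of training performance as a function of computing resources. A minimal way to estimate such a law, L(C) ≈ a · C^(−b), is a linear fit in log-log space; the compute/loss points below are hypothetical placeholders, and only the fitting procedure is illustrated.

```python
import numpy as np

# Hypothetical (compute, loss) pairs -- placeholders, not results from the paper.
compute = np.array([1e17, 1e18, 1e19, 1e20])   # training FLOPs
loss = np.array([2.41, 2.05, 1.76, 1.52])      # validation loss

# log L = log a - b * log C  =>  linear regression in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: L(C) ~ {a:.3g} * C^(-{b:.3g})")
```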
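
The Dataset Splits row notes that users are disjoint across the training, validation, and test sets, but the split ratios are unreported. A minimal sketch of such a user-disjoint split, assuming placeholder 90/5/5 ratios:

```python
import random

def split_by_user(interactions, ratios=(0.90, 0.05, 0.05), seed=0):
    """Split (user_id, item_id, ...) tuples so that no user appears in two sets.

    The 90/5/5 ratios are placeholders; the paper does not report its ratios.
    """
    users = sorted({row[0] for row in interactions})
    random.Random(seed).shuffle(users)
    n_train = int(len(users) * ratios[0])
    n_val = int(len(users) * ratios[1])
    train_users = set(users[:n_train])
    val_users = set(users[n_train:n_train + n_val])

    train, val, test = [], [], []
    for row in interactions:
        if row[0] in train_users:
            train.append(row)
        elif row[0] in val_users:
            val.append(row)
        else:
            test.append(row)
    return train, val, test
```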
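
The Experiment Setup row lists the reported hyperparameters (100,000 training steps, 160M parameters, sequence length 128, batch size 256). They are collected below into a single configuration sketch for reference; unreported settings such as the optimizer and learning rate are deliberately omitted rather than guessed.

```python
from dataclasses import dataclass

@dataclass
class CLUETrainingConfig:
    # Values reported in the paper's experiment setup / Appendix B.
    num_parameters: int = 160_000_000   # 160M-parameter model
    sequence_length: int = 128
    batch_size: int = 256
    training_steps: int = 100_000
```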