Wukong: Towards a Scaling Law for Large-Scale Recommendation

Authors: Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Shen Li, Yanli Zhao, Yuchen Hao, Yantao Yao, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models quality-wise. Further, we assessed Wukong's scalability on an internal, large-scale dataset.
Researcher Affiliation | Industry | Meta AI. Correspondence to: Buyun Zhang <buyunz@meta.com>, Liang Luo <liangluo@meta.com>, Yuxin Chen <yuxinc@meta.com>.
Pseudocode | No | The paper describes operations using mathematical formulas and text, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link to source code, nor an explicit statement that code for the described methodology is being released.
Open Datasets | Yes | Frappe (Baltrunas) is an app usage log; the task is to predict whether a user uses an app in the given context. MicroVideo (Chen et al., 2018) is a content-understanding dataset provided by the THACIL work, containing interactions between users and micro-videos. MovieLens Latest (Harper & Konstan, 2015) is a well-known dataset of users' ratings on movies. KuaiVideo (Kuaishou) is the competition dataset released by Kuaishou. Taobao Ads (Tianchi, 2018) includes 8 days of ads click-through rate (CTR) prediction on Taobao. Criteo Terabyte (Criteo) contains 24 days of ads click feedback; the last day of data was used for testing (Criteo 1TB Click Logs dataset, https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset/).
Dataset Splits | No | The paper mentions using the last day of data for testing on Criteo and online training for the internal dataset. It refers to the BARS benchmark preprocessing, which implies standard splits, but does not explicitly state train/validation/test percentages or sample counts needed for reproducibility. A hedged day-based split sketch for Criteo is given after this table.
Hardware Specification | Yes | Each experiment was run on 128 or 256 H100 GPUs, depending on the model size.
Software Dependencies | No | The paper mentions software components such as Neo, NeuroShard, and FSDP, and data types FP16, BF16, and FP32, but it does not specify version numbers for any software dependency. An illustrative FSDP mixed-precision sketch follows the table.
Experiment Setup | Yes | We used the best optimizer configuration found in our pilot study across all experiments, i.e., Adam with lr=0.04, beta1=0.9, beta2=1 for the dense part and Rowwise Adagrad with lr=0.04 for the sparse embedding tables. Models were trained and evaluated in an online training manner. We fix the embedding dimension to 160 across all runs. We use a global batch size of 262,144 for all experiments. An illustrative sketch of this optimizer configuration is given after the table.
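
As a hedged illustration of the Dataset Splits row, the snippet below sketches a day-based split for Criteo Terabyte. The per-day file names ("day_0" through "day_23") and the local directory name are assumptions; the paper only states that the last day is held out for testing, and the exact BARS preprocessing is not reproduced here.

```python
# Sketch of a day-based split for Criteo Terabyte (24 days of click feedback).
# Assumes the standard per-day files "day_0" ... "day_23"; "criteo_1tb" is a
# hypothetical local directory.
from pathlib import Path

data_dir = Path("criteo_1tb")
day_files = sorted(data_dir.glob("day_*"),
                   key=lambda p: int(p.name.split("_")[1]))

train_files = day_files[:-1]   # days 0..22 used for (online) training
test_files = day_files[-1:]    # day 23 held out for testing, per the paper
```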
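
The Software Dependencies row mentions FSDP and FP16/BF16/FP32 without pinning versions. The sketch below shows one common way to wrap a dense module with PyTorch FSDP and a BF16 mixed-precision policy; the module and process-group setup are illustrative, not the authors' actual training stack.

```python
# Illustrative FSDP wrapping with BF16 mixed precision (recent PyTorch).
# Assumes torch.distributed has already been initialized, e.g. via torchrun.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,    # parameters cast to BF16 for compute
    reduce_dtype=torch.bfloat16,   # gradient reduction in BF16
    buffer_dtype=torch.bfloat16,
)

def wrap_dense_tower(module: nn.Module) -> FSDP:
    """Wrap a dense (non-embedding) module with FSDP and BF16 mixed precision."""
    return FSDP(module, mixed_precision=bf16_policy)
```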
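
The Experiment Setup row reports Adam (lr=0.04, beta1=0.9, beta2=1) for the dense part and Rowwise Adagrad (lr=0.04) for the sparse embedding tables. Below is a minimal PyTorch sketch of that dense/sparse split, assuming a toy model; note that torch.optim.Adam rejects beta2=1, so a value just below 1 stands in, and plain Adagrad stands in for the row-wise Adagrad that is normally provided by fused TorchRec/FBGEMM embedding kernels.

```python
# Minimal sketch of the reported optimizer split into dense vs. sparse params.
# The toy model below is illustrative only; the real model uses large
# TorchRec-style embedding tables with a fixed embedding dimension of 160.
import torch
import torch.nn as nn

model = nn.ModuleDict({
    "embedding": nn.EmbeddingBag(10_000, 160, mode="sum"),  # dim 160, per the paper
    "dense": nn.Linear(160, 1),
})

dense_params = list(model["dense"].parameters())
sparse_params = list(model["embedding"].parameters())

# Dense part: Adam, lr=0.04, beta1=0.9. The paper reports beta2=1, which
# torch.optim.Adam does not accept (beta2 must be < 1), so a value close to 1
# is used here purely for illustration.
dense_opt = torch.optim.Adam(dense_params, lr=0.04, betas=(0.9, 1.0 - 1e-8))

# Sparse embedding tables: the paper uses Rowwise Adagrad (lr=0.04); plain
# Adagrad is a stand-in, since row-wise Adagrad typically comes from fused
# TorchRec/FBGEMM kernels rather than torch.optim.
sparse_opt = torch.optim.Adagrad(sparse_params, lr=0.04)
```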