Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

Authors: Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed Chi, Derek Cheng

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that multiplexed representations lead to Pareto-optimal parameter-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products. (A minimal code sketch of the multiplexed-lookup idea appears after the table.)
Researcher Affiliation | Industry | Benjamin Coleman* (Google DeepMind, colemanben@google.com); Wang-Cheng Kang* (Google DeepMind, wckang@google.com); Matthew Fahrbach (Google Research, fahrbach@google.com); Ruoxi Wang (Google DeepMind, ruoxi@google.com); Lichan Hong (Google DeepMind, lichan@google.com); Ed H. Chi (Google DeepMind, edchi@google.com); Derek Zhiyuan Cheng (Google DeepMind, zcheng@google.com)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "All methods are implemented in TensorFlow 2.0 using the TensorFlow Recommenders (TFRS) framework," with a footnote linking to https://github.com/tensorflow/recommenders. This points to a third-party framework the authors build on, not a release of their own implementation code.
Open Datasets | Yes | From the paper's Datasets section: "Criteo is an online advertisement dataset with ~45 million examples (7 days of data). ... For Criteo and MovieLens, we apply the same continuous feature transformations, train-test split, and other preprocessing steps in Wang et al. (2021). For Avazu, we use the train-test split and preprocessing steps in Song et al. (2019)." (An illustrative preprocessing sketch appears after the table.)
Dataset Splits | No | The paper mentions a "train-test split" and defers preprocessing details to external papers, but it never specifies a validation split (e.g., percentages or sample counts) in its own text.
Hardware Specification | Yes | In several of our Unified Embedding deployments (Table 2), we use TPUv4 for training and/or inference.
Software Dependencies | Yes | All methods are implemented in TensorFlow 2.0 using the TensorFlow Recommenders (TFRS) framework. (A minimal TFRS skeleton appears after the table.)
Experiment Setup | Yes | Implementation and hyperparameters: All methods are implemented in TensorFlow 2.0 using the TensorFlow Recommenders (TFRS) framework. ... we use the same embedding dimension for all features (d ∈ {39, 32, 30} for Criteo, Avazu, and MovieLens, respectively). ... We run a grid search to tune all embedding algorithm hyperparameters and conduct five runs per combination. ... We trained with a batch size of 512 for 300K steps using Adam with a learning rate of 0.0002. ... For all methods, we consider sixteen logarithmically-spaced memory budgets: [0.001, 0.002, 0.005, 0.007, 0.01, 0.02, 0.05, 0.07, 0.1, 0.2, 0.5, 0.7, 1.0, 2.0, 5.0, 10.0] times the memory required for the collisionless table. (These settings are restated as code after the table.)
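
To make the "multiplexed representations" claim in the Research Type row concrete, here is a minimal sketch of the core idea: many categorical features share one embedding table, and each (feature, value) pair is hashed to a row of that table. The table size, hash salting, and dimensions below are illustrative assumptions, not the paper's production implementation.

```python
import tensorflow as tf

# Shared ("unified") table that every categorical feature hashes into.
# NUM_BUCKETS and EMBED_DIM are illustrative, not the paper's settings.
NUM_BUCKETS = 1_000_000
EMBED_DIM = 32

shared_table = tf.Variable(
    tf.random.normal([NUM_BUCKETS, EMBED_DIM], stddev=0.05),
    name="unified_embedding",
)

def lookup(feature_name: str, values: tf.Tensor) -> tf.Tensor:
    """Hash (feature, value) pairs into the shared table and gather rows."""
    # Salting with the feature name keeps different features from
    # colliding on identical raw values.
    salted = tf.strings.join(
        [tf.fill(tf.shape(values), feature_name), tf.strings.as_string(values)],
        separator="|",
    )
    rows = tf.strings.to_hash_bucket_fast(salted, NUM_BUCKETS)
    return tf.gather(shared_table, rows)

# Two features drawing on the same parameter budget.
user_emb = lookup("user_id", tf.constant([17, 42]))
item_emb = lookup("item_id", tf.constant([7, 7]))
features = tf.concat([user_emb, item_emb], axis=-1)  # shape [2, 64]
```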
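
The Open Datasets row defers preprocessing to Wang et al. (2021) and Song et al. (2019). As a hedged illustration only: Criteo-style pipelines typically log-transform the continuous fields, but the exact offset and clipping used by those references are not quoted in this paper, so the function below is an assumption.

```python
import numpy as np

def transform_continuous(x: np.ndarray) -> np.ndarray:
    """Illustrative log transform for Criteo-style continuous features.

    The clip at zero and the log1p variant are assumptions for this
    sketch; consult Wang et al. (2021) for the exact preprocessing.
    """
    return np.log1p(np.maximum(x, 0.0))

# Example: raw counts with a negative sentinel value.
print(transform_continuous(np.array([-1.0, 0.0, 3.0, 45.0])))
```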
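
The Software Dependencies row names TensorFlow 2.0 and TFRS. The skeleton below shows what building on TFRS usually looks like; the model body is a placeholder, not the authors' architecture.

```python
import tensorflow as tf
import tensorflow_recommenders as tfrs

class RankingModel(tfrs.Model):
    """Placeholder TFRS model; only the scaffolding mirrors the paper's stack."""

    def __init__(self):
        super().__init__()
        # Toy tower standing in for the paper's (unpublished) model body.
        self.tower = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        self.task = tfrs.tasks.Ranking(
            loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        )

    def compute_loss(self, inputs, training=False):
        features, labels = inputs
        return self.task(labels=labels, predictions=self.tower(features))
```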
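
Finally, the Experiment Setup row quotes concrete hyperparameters. The snippet below restates them as code: only the numbers come from the paper, while the helper for sizing a hashed table under a memory budget is a hypothetical illustration.

```python
import tensorflow as tf

# Hyperparameters quoted in the Experiment Setup row.
BATCH_SIZE = 512
TRAIN_STEPS = 300_000
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)

# Sixteen logarithmically-spaced budgets, as multiples of the memory
# required by the collisionless (one-row-per-value) table.
MEMORY_BUDGETS = [0.001, 0.002, 0.005, 0.007, 0.01, 0.02, 0.05, 0.07,
                  0.1, 0.2, 0.5, 0.7, 1.0, 2.0, 5.0, 10.0]

def hashed_table_rows(collisionless_rows: int, budget: float) -> int:
    """Hypothetical helper: rows a hashed table gets under a budget multiplier."""
    return max(1, int(collisionless_rows * budget))

# Example: a 10M-row vocabulary under a 0.01x budget -> 100K-row table.
print(hashed_table_rows(10_000_000, 0.01))
```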