Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems
Authors: Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed Chi, Derek Cheng
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that multiplexed representations lead to Pareto-optimal parameter-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products. |
| Researcher Affiliation | Industry | Benjamin Coleman* Google DeepMind colemanben@google.com Wang-Cheng Kang* Google DeepMind wckang@google.com Matthew Fahrbach Google Research fahrbach@google.com Ruoxi Wang Google DeepMind ruoxi@google.com Lichan Hong Google DeepMind lichan@google.com Ed H. Chi Google DeepMind edchi@google.com Derek Zhiyuan Cheng Google DeepMind zcheng@google.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "All methods are implemented in TensorFlow 2.0 using the TensorFlow Recommenders (TFRS) framework." with a footnote linking to https://github.com/tensorflow/recommenders. This refers to a third-party framework the authors used, not a release of their own implementation code. |
| Open Datasets | Yes | Datasets Criteo is an online advertisement dataset with ~45 million examples (7 days of data). ... For Criteo and Movielens, we apply the same continuous feature transformations, train-test split, and other preprocessing steps in Wang et al. (2021). For Avazu, we use the train-test split and preprocessing steps in Song et al. (2019). |
| Dataset Splits | No | The paper mentions a train-test split and defers preprocessing details to external papers, but it does not itself specify a validation split (e.g., percentages or sample counts). |
| Hardware Specification | Yes | In several of our Unified Embedding deployments (Table 2), we use TPUv4 for training and/or inference. |
| Software Dependencies | Yes | All methods are implemented in TensorFlow 2.0 using the TensorFlow Recommenders (TFRS) framework. |
| Experiment Setup | Yes | Implementation and hyperparameters All methods are implemented in TensorFlow 2.0 using the TensorFlow Recommenders (TFRS) framework. ... we use the same embedding dimension for all features (d ∈ {39, 32, 30} for Criteo, Avazu, and Movielens, respectively). ... We run a grid search to tune all embedding algorithm hyperparameters and conduct five runs per combination. ... We trained with a batch size of 512 for 300K steps using Adam with a learning rate of 0.0002. ... For all methods, we consider sixteen logarithmically-spaced memory budgets: [0.001, 0.002, 0.005, 0.007, 0.01, 0.02, 0.05, 0.07, 0.1, 0.2, 0.5, 0.7, 1.0, 2.0, 5.0, 10.0] times the memory required for the collisionless table. |
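
The abstract quoted in the Research Type row describes multiplexed representations and Unified Embedding: a single shared table that every feature looks up through feature-specific hashing. The sketch below is a minimal TensorFlow illustration of that idea, not the authors' implementation; the table size, feature names, and name-prefix hashing scheme are assumptions made here for clarity.

```python
import tensorflow as tf


class UnifiedEmbedding(tf.keras.layers.Layer):
    """One shared embedding table looked up by every feature via hashing."""

    def __init__(self, num_buckets, embedding_dim, feature_names):
        super().__init__()
        self.num_buckets = num_buckets
        self.feature_names = feature_names
        # A single table replaces the usual one-table-per-feature layout.
        self.table = tf.keras.layers.Embedding(num_buckets, embedding_dim)

    def call(self, inputs):
        vectors = []
        for name in self.feature_names:
            # Prefix ids with the feature name so each feature acts as its own
            # hash "salt" into the shared table (an illustrative choice).
            keys = tf.strings.join([name, tf.as_string(inputs[name])])
            ids = tf.strings.to_hash_bucket_fast(keys, self.num_buckets)
            vectors.append(self.table(ids))
        return tf.concat(vectors, axis=-1)


# Hypothetical usage with made-up feature names and sizes.
layer = UnifiedEmbedding(num_buckets=1_000_000, embedding_dim=32,
                         feature_names=["user_id", "item_id", "site_id"])
batch = {"user_id": tf.constant([1, 2]),
         "item_id": tf.constant([10, 20]),
         "site_id": tf.constant([5, 7])}
print(layer(batch).shape)  # (2, 96) = 3 features x 32 dims, concatenated
```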
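
The Experiment Setup row quotes concrete hyperparameters: batch size 512, 300K steps, Adam with learning rate 0.0002, per-dataset embedding dimensions, and the sixteen memory-budget multipliers. The following is a hedged sketch of that configuration in a standard Keras training loop, assuming a binary-CTR objective; the model and dataset objects are placeholders rather than the paper's pipeline.

```python
import tensorflow as tf

BATCH_SIZE = 512
TRAIN_STEPS = 300_000
LEARNING_RATE = 2e-4
# Per-dataset embedding dimensions quoted above.
EMBEDDING_DIM = {"criteo": 39, "avazu": 32, "movielens": 30}
# Memory budgets, as multiples of the collisionless table's memory.
MEMORY_BUDGETS = [0.001, 0.002, 0.005, 0.007, 0.01, 0.02, 0.05, 0.07,
                  0.1, 0.2, 0.5, 0.7, 1.0, 2.0, 5.0, 10.0]


def train(model: tf.keras.Model, train_ds: tf.data.Dataset) -> None:
    """Trains a CTR-style model with the quoted optimizer and schedule."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    # 300K optimizer steps at batch size 512, expressed as one long "epoch".
    model.fit(train_ds.batch(BATCH_SIZE).repeat(),
              steps_per_epoch=TRAIN_STEPS, epochs=1)
```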