DreamShard: Generalizable Embedding Table Placement for Recommender Systems

Authors: Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu

NeurIPS 2022

Reproducibility variables, results, and LLM responses:
Research Type: Experimental
LLM Response: "Extensive experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies, with up to 19% speedup over the strongest baseline on large-scale synthetic tables and our production tables. The code is available at https://github.com/daochenzha/dreamshard. ... 1 Introduction ... Figure 1: Visualization of random placement, the existing best human expert strategy, and DreamShard on a task of placing 50 tables on 4 GPUs. ... 4 Experiments: Our experiments aim to answer the following research questions. ... 4.1 Experimental Setup: Datasets. ... Baselines. ... Configurations. ... Implementation Details. ... 4.2 Results and Analysis: Evaluation of DreamShard against baselines (RQ1). ... Table 1: Overall cost comparison in milliseconds. ... Table 2: Generalization performance of DreamShard. ... Figure 5: Performance (...) of DreamShard (...) w.r.t. the numbers of iterations (left) and running time (right). ... Table 3: Ablation study of DreamShard."
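To make the placement task from Figure 1 concrete: the quality of a placement is governed by the slowest device. The Python sketch below is illustrative only; per-table costs are assumed known and additive, whereas in practice fused multi-table kernels make costs non-additive, which is what DreamShard's learned cost model is meant to capture. The greedy heuristic stands in for a simple baseline, not for DreamShard itself.

```python
import random

def bottleneck_cost(placement, table_costs, num_gpus):
    """Latency is dominated by the most loaded GPU (the bottleneck).
    Summing per-table costs is a simplification: fused multi-table kernels
    make real costs non-additive."""
    per_gpu = [0.0] * num_gpus
    for table_id, gpu in enumerate(placement):
        per_gpu[gpu] += table_costs[table_id]
    return max(per_gpu)

def greedy_placement(table_costs, num_gpus):
    """Simple baseline: assign each table to the currently lightest GPU,
    placing the most expensive tables first so they can be balanced."""
    per_gpu = [0.0] * num_gpus
    placement = [0] * len(table_costs)
    for table_id in sorted(range(len(table_costs)), key=lambda t: -table_costs[t]):
        gpu = min(range(num_gpus), key=lambda g: per_gpu[g])
        placement[table_id] = gpu
        per_gpu[gpu] += table_costs[table_id]
    return placement

# Example matching Figure 1's setting: 50 tables on 4 GPUs (random costs for illustration).
costs = [random.uniform(0.1, 2.0) for _ in range(50)]
plan = greedy_placement(costs, num_gpus=4)
print(bottleneck_cost(plan, costs, num_gpus=4))
```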
Researcher Affiliation: Collaboration
LLM Response: "Daochen Zha (1), Louis Feng (2), Qiaoyu Tan (3), Zirui Liu (1), Kwei-Herng Lai (1), Bhargav Bhushanam (2), Yuandong Tian (2), Arun Kejariwal (2), Xia Hu (1); (1) Rice University, (2) Meta Platforms, Inc., (3) Texas A&M University"
Pseudocode: Yes
LLM Response: "Algorithm 1: Training of DreamShard"
Open Source Code: Yes
LLM Response: "The code is available at https://github.com/daochenzha/dreamshard."
Open Datasets: Yes
LLM Response: "DLRM (Footnote 2) is a large-scale synthetic dataset with 856 tables, recently released by Meta. ... For reproducibility, we mainly focus on the DLRM dataset since it is open-sourced. We only report the main results on the Prod dataset for verification purposes. We provide more details in Appendix C." (Footnote 2: https://github.com/facebookresearch/dlrm_datasets)
Dataset Splits: Yes
LLM Response: "To evaluate the generalizability of DreamShard, we randomly divide the tables into a training pool E_train and a testing pool E_test. The two pools have the same number of tables, but they do not overlap. A sharding task T_i is constructed by randomly sampling a subset of |E_i| tables from a pool, where the number of tables |E_i| ∈ {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} for the DLRM dataset, and |E_i| ∈ {20, 40, 80} for the Prod dataset. For all the experiments, we randomly sample 50 training and 50 testing tasks from E_train and E_test, respectively."
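The split protocol quoted above is easy to mirror in code. The sketch below is a hedged illustration under assumed names (split_pools, sample_tasks, and the seeds are ours, not the repository's); it reproduces only the disjoint-pool split and the per-task subset sampling described in the quote.

```python
import random

def split_pools(all_table_ids, seed=0):
    """Randomly split tables into two disjoint, equally sized pools (train/test)."""
    rng = random.Random(seed)
    ids = list(all_table_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]  # E_train, E_test

def sample_tasks(pool, sizes, num_tasks, seed=0):
    """Each sharding task is a random subset of |E_i| tables drawn from one pool."""
    rng = random.Random(seed)
    return [rng.sample(pool, rng.choice(list(sizes))) for _ in range(num_tasks)]

# DLRM setting from the quote: 856 tables, task sizes 10..100,
# 50 training and 50 testing tasks sampled from the respective pools.
e_train, e_test = split_pools(range(856))
train_tasks = sample_tasks(e_train, sizes=range(10, 101, 10), num_tasks=50)
test_tasks = sample_tasks(e_test, sizes=range(10, 101, 10), num_tasks=50, seed=1)
```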
Hardware Specification: Yes
LLM Response: "2080 Ti GPUs and V100 GPUs are used for the DLRM and Prod datasets, respectively (except that we use V100 for the DLRM experiments with 8 GPUs)."
Software Dependencies: No
LLM Response: "The paper does not list version numbers for the software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA) needed to reproduce the experiments. It mentions FBGEMM as an embedding implementation, but not as a versioned dependency of its own code."
Experiment Setup: Yes
LLM Response: "We use the same hyperparameters for all the experiments: N_collect = 10, N_cost = 300, N_batch = 64, N_RL = 10, N_episode = 10, 10 training iterations, and an entropy weight of 0.001 in the policy gradient."
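For orientation, the quoted hyperparameters can be gathered into one configuration and dropped into an alternating loop of the kind Algorithm 1 describes (collect measured costs, refit the cost model, then run policy-gradient updates against it). This is a sketch under assumptions, not the repository's API: collect_cost_data, update_cost_model, and update_policy are hypothetical placeholder names.

```python
# Hyperparameters quoted from the experiment setup above.
CONFIG = {
    "n_collect": 10,      # placements executed per iteration to gather measured costs
    "n_cost": 300,        # cost-model update steps per iteration
    "n_batch": 64,        # batch size
    "n_rl": 10,           # policy-gradient updates per iteration
    "n_episode": 10,      # episodes rolled out per policy update
    "n_iterations": 10,   # outer training iterations
    "entropy_weight": 0.001,
}

def train_dreamshard_like(policy, cost_model, tasks, config=CONFIG):
    """Skeleton of an alternating loop: gather measured costs, refit the cost
    model, then improve the placement policy inside the learned cost model.
    The called helpers are placeholders; see the official repo for the real code."""
    for _ in range(config["n_iterations"]):
        data = collect_cost_data(policy, tasks, n=config["n_collect"])           # hypothetical
        update_cost_model(cost_model, data,
                          steps=config["n_cost"], batch_size=config["n_batch"])  # hypothetical
        for _ in range(config["n_rl"]):
            update_policy(policy, cost_model, tasks,
                          episodes=config["n_episode"],
                          entropy_weight=config["entropy_weight"])               # hypothetical
    return policy
```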