DreamShard: Generalizable Embedding Table Placement for Recommender Systems
Authors: Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, Xia Hu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies with up to 19% speedup over the strongest baseline on large-scale synthetic tables and our production tables. The code is available at https://github.com/daochenzha/dreamshard. 1 Introduction ... Figure 1: Visualization of random placement, the existing best human expert strategy, and DreamShard on a task of placing 50 tables on 4 GPUs. ... 4 Experiments: Our experiments aim to answer the following research questions. ... 4.1 Experimental Setup: Datasets. ... Baselines. ... Configurations. ... Implementation Details. ... 4.2 Results and Analysis: Evaluation of DreamShard against baselines (RQ1). ... Table 1: Overall cost comparison in milliseconds ... Table 2: Generalization performance of DreamShard ... Figure 5: Performance (...) of DreamShard (...) w.r.t. the numbers of iterations (left) and running time (right). ... Table 3: Ablation study of DreamShard. |
| Researcher Affiliation | Collaboration | Daochen Zha¹, Louis Feng², Qiaoyu Tan³, Zirui Liu¹, Kwei-Herng Lai¹, Bhargav Bhushanam², Yuandong Tian², Arun Kejariwal², Xia Hu¹ (¹Rice University, ²Meta Platforms, Inc., ³Texas A&M University) |
| Pseudocode | Yes | Algorithm 1 Training of Dream Shard |
| Open Source Code | Yes | The code is available at https://github.com/daochenzha/dreamshard. |
| Open Datasets | Yes | DLRM2 is a large-scale synthetic dataset with 856 tables, recently released by Meta. ... For reproducibility, we mainly focus on the DLRM dataset since it is open-sourced. We only report the main results on the Prod dataset for verification purposes. We provide more details in Appendix C. (Footnote 2: https://github.com/facebookresearch/dlrm_datasets) |
| Dataset Splits | Yes | To evaluate the generalizability of DreamShard, we randomly divide the tables into a training pool Etrain and a testing pool Etest. The two pools have the same number of tables but they do not overlap. A sharding task Ti is constructed by randomly sampling a subset of \|Ei\| tables from a pool, where the number of tables \|Ei\| ∈ {10, 20, 30, 40, 50, 60, 70, 80, 90, 100} for the DLRM dataset, and \|Ei\| ∈ {20, 40, 80} for the Prod dataset. For all the experiments, we randomly sample 50 training and 50 testing tasks from Etrain and Etest, respectively. (See the task-sampling sketch after the table.) |
| Hardware Specification | Yes | 2080 Ti GPUs are used for the DLRM dataset (except that V100 GPUs are used for the experiments with 8 GPUs), and V100 GPUs are used for the Prod dataset. |
| Software Dependencies | No | The paper does not list specific version numbers for software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA) needed to reproduce the experiments. It mentions FBGEMM as the embedding implementation, but does not specify it as a versioned dependency of the authors' own code. |
| Experiment Setup | Yes | We use the same hyperparameters for all the experiments with Ncollect = 10, Ncost = 300, Nbatch = 64, NRL = 10, Nepisode = 10, 10 training iterations, and an entropy weight of 0.001 in the policy gradient. |
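The pool split and task sampling described in the Dataset Splits row are simple enough to reconstruct. Below is a minimal Python sketch of that procedure, assuming the 856 DLRM tables can be represented by integer IDs 0..855; the function names, random seeds, and loop structure are illustrative assumptions, not the authors' released code.

```python
import random

def split_pools(tables, seed=0):
    """Randomly divide table IDs into equal-sized, disjoint training and
    testing pools, as described in the Dataset Splits row. The equal,
    non-overlapping split follows the paper; the seeding is an assumption."""
    rng = random.Random(seed)
    shuffled = tables[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def sample_tasks(pool, task_sizes, num_tasks=50, seed=0):
    """Sample `num_tasks` sharding tasks, each a random subset of the pool
    whose size is drawn from `task_sizes` (10..100 for DLRM, {20, 40, 80}
    for Prod in the quoted setup)."""
    rng = random.Random(seed)
    return [rng.sample(pool, rng.choice(task_sizes)) for _ in range(num_tasks)]

# Example with the DLRM setup: 856 tables, 50 training and 50 testing tasks.
all_tables = list(range(856))
dlrm_sizes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
train_pool, test_pool = split_pools(all_tables)
train_tasks = sample_tasks(train_pool, dlrm_sizes, seed=0)
test_tasks = sample_tasks(test_pool, dlrm_sizes, seed=1)
```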
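For reference, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object. This is a hedged sketch: only the numeric values come from the paper, while the key names and comments are assumed labels and do not necessarily match the dreamshard repository's configuration interface.

```python
# Training configuration reported in the paper (values only); key names and
# role descriptions are illustrative assumptions.
DREAMSHARD_TRAIN_CONFIG = {
    "N_collect": 10,         # N_collect from the quoted setup
    "N_cost": 300,           # N_cost: cost-model update steps (interpretation assumed)
    "N_batch": 64,           # N_batch: batch size (interpretation assumed)
    "N_RL": 10,              # N_RL: policy-gradient update steps (interpretation assumed)
    "N_episode": 10,         # N_episode: episodes per update (interpretation assumed)
    "num_iterations": 10,    # total training iterations
    "entropy_weight": 0.001, # entropy regularization weight in the policy gradient
}
```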