Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Authors: Shauharda Khadka, Estelle Aflalo, Mattias Marder, Avrech Ben-David, Santiago Miret, Shie Mannor, Tamir Hazan, Hanlin Tang, Somdeb Majumdar
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train and validate our approach directly on the Intel NNP-I chip for inference. EGRL outperforms policy-gradient, evolutionary search and dynamic programming baselines on BERT, ResNet-101 and ResNet-50. We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads. |
| Researcher Affiliation | Collaboration | Shauharda Khadka (Intel Labs), Estelle Aflalo (Intel Israel), Mattias Marder (Intel Israel), Avrech Ben-David (Technion), Santiago Miret (Intel Labs), Shie Mannor (Technion), Tamir Hazan (Technion), Hanlin Tang (Intel Labs), Somdeb Majumdar (Intel Labs) |
| Pseudocode | Yes | Algorithm 1 (Agent's Interaction with the Environment) and Algorithm 2 (EGRL Algorithm) |
| Open Source Code | No | Our code will be open-sourced. |
| Open Datasets | Yes | Workloads Tested: We benchmarked our algorithms on three popular neural network workloads. ResNet-50, with 57 nodes, is widely used for benchmarks such as MLPerf (Reddi et al., 2019). ResNet-101, with 108 nodes, allowed us to test our algorithms at greater scale. Lastly, BERT, with 376 nodes, is a state-of-the-art natural language processing model. |
| Dataset Splits | No | The paper mentions 'We train and validate our approach directly on the Intel NNP-I chip for inference' but does not provide specific details on dataset splits (percentages or counts) for validation, separate from training and testing. |
| Hardware Specification | Yes | We train and validate our approach directly on the Intel NNP-I chip for inference. We demonstrate our solution on the Intel Neural Network Processor for Inference (NNP-I), a deep learning accelerator, to map modern neural networks on one of the three memory hierarchies on the chip. NNP-I includes twelve inference compute engines (ICEs) each having a 4MB Deep SRAM. A 24MB shared memory cache (LLC) is accessible by all ICEs. Additionally, a 32 GB DRAM is also accessible to the ICEs through the LLC. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) used for the experiments. |
| Experiment Setup | Yes | A complete set of hyperparameter details can be found in Appendix B. Table 2 (Hyperparameters) includes GNN hidden layer size [128], GNN depth [4], Number of GNN attention heads [4], Reward for invalid mapping [-1], Discount Rate [0.99], EA population size [20], Replay buffer size [100000], Critic learning rate [1e-3], Actor learning rate [1e-3], Alpha (Entropy Coefficient) [0.05], Batch size for PG [24], among others; a hedged configuration sketch follows the table. |
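
The hyperparameters quoted above from Appendix B (Table 2) can be collected into a single configuration object. The sketch below simply restates the reported values; the dictionary keys are paraphrased names and are assumptions, since the authors' code is not released.

```python
# Hyperparameters reported in Appendix B, Table 2 of the paper, restated as a
# plain config dict. Key names are assumptions; only the values come from the paper.
EGRL_CONFIG = {
    # Graph neural network policy
    "gnn_hidden_size": 128,
    "gnn_depth": 4,
    "gnn_attention_heads": 4,
    # Environment / reward shaping
    "invalid_mapping_reward": -1,
    "discount_rate": 0.99,
    # Evolutionary component
    "ea_population_size": 20,
    # Policy-gradient component
    "replay_buffer_size": 100_000,
    "critic_lr": 1e-3,
    "actor_lr": 1e-3,
    "entropy_alpha": 0.05,
    "pg_batch_size": 24,
}
```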
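
The hardware specification row describes the NNP-I memory hierarchy the placement agent targets: twelve ICEs with 4 MB of Deep SRAM each, a 24 MB shared LLC, and 32 GB of DRAM, with invalid mappings penalized by a reward of -1. Below is a minimal sketch, not the authors' implementation, of how a candidate node-to-memory mapping could be checked against those capacities; the function and type names are illustrative assumptions.

```python
# Hypothetical capacity check for a node-to-memory mapping on NNP-I.
# Capacities come from the paper's hardware description; all identifiers
# (MemoryLevel, check_mapping, etc.) are illustrative assumptions.

from enum import Enum

MB = 1024 ** 2
GB = 1024 ** 3

NUM_ICES = 12          # twelve inference compute engines (ICEs)
SRAM_PER_ICE = 4 * MB  # 4 MB Deep SRAM per ICE
LLC_SIZE = 24 * MB     # 24 MB shared last-level cache
DRAM_SIZE = 32 * GB    # 32 GB DRAM, reached through the LLC

INVALID_MAPPING_REWARD = -1.0  # penalty reported in Table 2


class MemoryLevel(Enum):
    SRAM = "sram"  # per-ICE scratchpad
    LLC = "llc"    # shared cache
    DRAM = "dram"  # off-chip memory


def check_mapping(mapping):
    """mapping: list of (tensor_size_bytes, MemoryLevel, ice_id or None).

    Returns True if every memory level stays within capacity, else False.
    """
    sram_used = [0] * NUM_ICES
    llc_used = 0
    dram_used = 0
    for size, level, ice_id in mapping:
        if level is MemoryLevel.SRAM:
            sram_used[ice_id] += size
        elif level is MemoryLevel.LLC:
            llc_used += size
        else:
            dram_used += size
    return (all(u <= SRAM_PER_ICE for u in sram_used)
            and llc_used <= LLC_SIZE
            and dram_used <= DRAM_SIZE)

# An agent proposing a mapping that overflows any level would receive
# INVALID_MAPPING_REWARD instead of a latency-based reward.
```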