Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

Authors: Shauharda Khadka, Estelle Aflalo, Mattias Marder, Avrech Ben-David, Santiago Miret, Shie Mannor, Tamir Hazan, Hanlin Tang, Somdeb Majumdar

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train and validate our approach directly on the Intel NNP-I chip for inference. EGRL outperforms policy-gradient, evolutionary search and dynamic programming baselines on BERT, ResNet-101 and ResNet-50. We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
Researcher Affiliation | Collaboration | Shauharda Khadka (Intel Labs), Estelle Aflalo (Intel Israel), Mattias Marder (Intel Israel), Avrech Ben-David (Technion), Santiago Miret (Intel Labs), Shie Mannor (Technion), Tamir Hazan (Technion), Hanlin Tang (Intel Labs), Somdeb Majumdar (Intel Labs)
Pseudocode | Yes | Algorithm 1 (Agent's Interaction with the Environment) and Algorithm 2 (EGRL Algorithm). A hedged sketch of an EGRL-style training loop is given after this table.
Open Source Code | No | Our code will be open-sourced.
Open Datasets | Yes | Workloads Tested: We benchmarked our algorithms on three popular neural network workloads. ResNet-50, with 57 nodes, is widely used for benchmarks such as MLPerf (Reddi et al., 2019). ResNet-101, with 108 nodes, allowed us to test our algorithms at greater scale. Lastly, BERT, with 376 nodes, is a state-of-the-art natural language processing model.
Dataset Splits | No | The paper mentions 'We train and validate our approach directly on the Intel NNP-I chip for inference' but does not provide specific details on dataset splits (percentages or counts) for validation, separate from training and testing.
Hardware Specification | Yes | We train and validate our approach directly on the Intel NNP-I chip for inference. We demonstrate our solution on the Intel Neural Network Processor for Inference (NNP-I), a deep learning accelerator, to map modern neural networks on one of the three memory hierarchies on the chip. NNP-I includes twelve inference compute engines (ICEs), each having a 4MB Deep SRAM. A 24MB shared memory cache (LLC) is accessible by all ICEs. Additionally, a 32 GB DRAM is also accessible to the ICEs through the LLC. A capacity sketch of this memory hierarchy follows the table.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) used for the experiments.
Experiment Setup | Yes | A complete set of hyperparameter details can be found in Appendix B. Table 2 (Hyperparameters) includes GNN hidden layer size [128], GNN depth [4], Number of GNN attention heads [4], Reward for invalid mapping [-1], Discount Rate [0.99], EA population size [20], Replay buffer size [100000], Critic learning rate [1e-3], Actor learning rate [1e-3], Alpha (Entropy Coefficient) [0.05], Batch size for PG [24], etc. These values are collected into a configuration sketch after the table.
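The Pseudocode row references the paper's Algorithm 2 (EGRL). As a rough illustration only, the sketch below shows an ERL-style loop of the kind EGRL belongs to: an evolutionary population of policies shares a replay buffer with a gradient-based learner, and the learner's policy is periodically copied back into the population. Every name here (population, pg_agent, replay_buffer, rollout_fn, mutate_fn) is a hypothetical placeholder, not the authors' API; consult Algorithm 2 in the paper for the actual procedure.

```python
import copy
import random

def egrl_style_loop(population, pg_agent, replay_buffer, rollout_fn, mutate_fn,
                    generations=100, elite_frac=0.2, sync_every=5):
    """Hedged sketch of an ERL-style loop: EA population + policy-gradient learner.

    All arguments are placeholders supplied by the caller; the paper's
    Algorithm 2 defines the real EGRL procedure and update rules.
    """
    for gen in range(generations):
        # 1. Evaluate every policy in the population; collect transitions.
        fitness = []
        for policy in population:
            reward, transitions = rollout_fn(policy)
            replay_buffer.extend(transitions)
            fitness.append(reward)

        # 2. Train the gradient-based learner on the shared replay buffer.
        pg_agent.update(replay_buffer)

        # 3. Selection: keep elites, refill by mutating policies from the top half.
        ranked = [p for _, p in sorted(zip(fitness, population),
                                       key=lambda x: x[0], reverse=True)]
        n_elite = max(1, int(elite_frac * len(population)))
        next_pop = ranked[:n_elite]
        while len(next_pop) < len(population):
            parent = random.choice(ranked[:max(2, len(ranked) // 2)])
            next_pop.append(mutate_fn(copy.deepcopy(parent)))

        # 4. Periodically inject a copy of the learner's policy into the population.
        if gen % sync_every == 0:
            next_pop[-1] = copy.deepcopy(pg_agent.policy)
        population = next_pop

    # The best memory mapping corresponds to the top-fitness member.
    return population
```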
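The Hardware Specification row quotes three memory levels on NNP-I: 4 MB of Deep SRAM on each of twelve ICEs, a 24 MB shared LLC, and 32 GB of DRAM. As an illustrative sketch only, assuming a pure capacity-feasibility view of the placement problem, the helper below encodes those sizes and checks whether a candidate tensor-to-memory mapping fits; the real chip and compiler enforce far more than raw capacity.

```python
from enum import Enum

class MemLevel(Enum):
    DRAM = "DRAM"   # 32 GB, reachable through the LLC
    LLC = "LLC"     # 24 MB cache shared by all ICEs
    SRAM = "SRAM"   # 4 MB of Deep SRAM per ICE (12 ICEs)

# Capacities in bytes, mirroring the NNP-I description quoted above.
CAPACITY = {
    MemLevel.DRAM: 32 * 1024**3,
    MemLevel.LLC: 24 * 1024**2,
    MemLevel.SRAM: 4 * 1024**2,   # per ICE
}

def fits_capacity(tensor_sizes, mapping, n_ices=12):
    """Return True if a candidate placement respects raw capacity limits.

    tensor_sizes: dict tensor_id -> size in bytes.
    mapping: dict tensor_id -> (MemLevel, ice_index or None).
    Hypothetical helper for illustration; it ignores bandwidth, scheduling,
    and every other constraint the real compiler handles.
    """
    used_dram, used_llc = 0, 0
    used_sram = [0] * n_ices
    for tensor_id, size in tensor_sizes.items():
        level, ice = mapping[tensor_id]
        if level is MemLevel.DRAM:
            used_dram += size
        elif level is MemLevel.LLC:
            used_llc += size
        else:
            used_sram[ice] += size
    return (used_dram <= CAPACITY[MemLevel.DRAM]
            and used_llc <= CAPACITY[MemLevel.LLC]
            and all(u <= CAPACITY[MemLevel.SRAM] for u in used_sram))
```

For example, a 6 MB tensor mapped to any single ICE's SRAM would fail this check, while the same tensor placed in LLC or DRAM would pass.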
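The Experiment Setup row lists hyperparameters from Appendix B (Table 2). For convenience they are gathered below into a single configuration dict; the key names are our own shorthand, not identifiers from the authors' (unreleased) code.

```python
# Hyperparameters reported in Appendix B, Table 2 of the paper.
EGRL_HPARAMS = {
    "gnn_hidden_size": 128,
    "gnn_depth": 4,
    "gnn_attention_heads": 4,
    "invalid_mapping_reward": -1,
    "discount_rate": 0.99,
    "ea_population_size": 20,
    "replay_buffer_size": 100_000,
    "critic_learning_rate": 1e-3,
    "actor_learning_rate": 1e-3,
    "entropy_alpha": 0.05,
    "pg_batch_size": 24,
}
```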