Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning

Authors: Xiang Ni, Jing Li, Mo Yu, Wang Zhou, Kun-Lung Wu

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed model outperforms both METIS, a state-of-the-art graph partitioning algorithm, and an LSTM-based encoder-decoder model, in about 70% of the test cases.
Researcher Affiliation | Collaboration | Citadel, IBM Research, New Jersey Institute of Technology
Pseudocode | No | The paper describes the model architecture and training process with equations and textual descriptions but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and data set are released at https://github.com/xiangni/DREAM.
Open Datasets | Yes | We create a new benchmark with 3,150 graphs in the data set. ... Our code and data set are released at https://github.com/xiangni/DREAM.
Dataset Splits | No | The paper states 'We randomly select 2,520 graphs for training and the remaining 630 graphs for testing' but does not specify a separate validation split or how it was handled (see the split sketch after these entries).
Hardware Specification | No | The paper describes the simulated environment ('We create a cluster in CEPSim with 5 homogeneous devices. The computing capacity of each device is 2.5E3 million instructions per second (MIPS). The link bandwidth between devices is 1000 Mbps.') but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for training the deep reinforcement learning model itself (see the cluster sketch after these entries).
Software Dependencies | No | The paper mentions using an 'LSTM' and 'Adam optimizer' for training, and that it 'extend[s] CEPSim' as a simulator, but it does not provide specific version numbers for any of these software components, libraries, or CEPSim itself.
Experiment Setup | Yes | The number of hops K in graph embedding is 2, and the length of node embeddings is 512. The network is trained for 40 epochs using the Adam optimizer with learning rate 0.001. At each training step, only one graph is fed to the network. The number of samples N for a training graph varies from 3 to 6 (with 3 on-policy samples and up to 3 samples from memory buffer). These settings are selected via cross-validation (see the configuration sketch after these entries).
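
To make the Dataset Splits entry concrete, the snippet below is a minimal sketch of a 2,520/630 random partition of the 3,150 benchmark graphs. The seed, the index-based graph identifiers, and the variable names are assumptions; the paper and the quoted repository text do not specify how the split was implemented.

```python
# Hypothetical sketch: randomly split the 3,150-graph benchmark into
# 2,520 training graphs and 630 test graphs, as reported in the paper.
# The fixed seed and index-based identifiers are illustrative assumptions.
import random

NUM_GRAPHS = 3150
TRAIN_SIZE = 2520  # the remaining 630 graphs are used for testing

indices = list(range(NUM_GRAPHS))
random.Random(0).shuffle(indices)  # seed chosen only for illustration

train_ids = indices[:TRAIN_SIZE]
test_ids = indices[TRAIN_SIZE:]

assert len(test_ids) == 630
```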
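The simulated environment quoted under Hardware Specification can also be summarized as a small configuration record. This is not the CEPSim API; it is a plain-Python stand-in for the stated parameters (5 homogeneous devices, 2.5E3 MIPS each, 1000 Mbps links), with field names chosen for illustration.

```python
# Plain-Python summary of the simulated cluster described in the paper.
# Field names are illustrative; CEPSim's actual configuration classes differ.
from dataclasses import dataclass

@dataclass
class SimulatedCluster:
    num_devices: int = 5                  # homogeneous devices
    device_mips: float = 2.5e3            # million instructions per second per device
    link_bandwidth_mbps: float = 1000.0   # bandwidth between devices

cluster = SimulatedCluster()
print(cluster)
```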
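Finally, the hyperparameters quoted in the Experiment Setup entry can be gathered into a single configuration. The sketch below is an assumption-laden illustration, not the authors' code: the dictionary keys, the placeholder model, and the PyTorch-style optimizer call are all hypothetical; only the numeric values come from the paper.

```python
# Hyperparameters reported in the paper, collected into a config dict.
# Keys, the placeholder model, and the optimizer call are illustrative assumptions.
import torch

config = {
    "num_hops_K": 2,               # hops in graph embedding
    "node_embedding_dim": 512,     # length of node embeddings
    "epochs": 40,
    "learning_rate": 1e-3,         # Adam optimizer
    "graphs_per_step": 1,          # one graph fed to the network per training step
    "samples_per_graph": (3, 6),   # 3 on-policy samples + up to 3 from the memory buffer
}

# Example only: building an Adam optimizer for a placeholder model.
model = torch.nn.Linear(config["node_embedding_dim"], config["node_embedding_dim"])
optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
```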