Deep Learning for Algorithm Portfolios
Authors: Andrea Loreggia, Yuri Malitsky, Horst Samulowitz, Vijay Saraswat
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the presented approach completely automates the algorithm selection pipeline and is able to achieve significantly better performance than a single best solver across multiple problem domains. The proposed technique was applied across multiple datasets in SAT and CSP domains. We first benchmark the datasets by showing the performance of a state-of-the-art algorithm selection strategy, CSHC (Malitsky et al. 2013), utilizing the established set of features. We subsequently evaluate the performance of the deep neural network to predict the best solver, as well as the quality of the output vector of such a network to act as a new feature vector for an existing selection strategy. For each dataset we performed the prediction task using a 10-fold cross validation approach. Tables 1 and 2 summarize our empirical results. |
| Researcher Affiliation | Collaboration | Andrea Loreggia, University of Padova / IBM Research, NY, USA (andrea.loreggia@gmail.com); Yuri Malitsky, IBM Research, NY, USA (yuri.malitsky@gmail.com); Horst Samulowitz, IBM Research, NY, USA (samulowitz@us.ibm.com); Vijay Saraswat, IBM Research, NY, USA (vijay@saraswat.org) |
| Pseudocode | No | The paper describes the conversion process and the neural network architecture in narrative text and uses a diagram (Figure 3) for the network structure. However, it does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper states, "We implemented the neural network described in Section using Python 2.7 and Lasagne 0.1." but does not provide any links to or statements about the availability of their own source code implementation for the described methodology. |
| Open Datasets | Yes | The SAT datasets are publicly available on the SAT competition website and are usually divided into the following three sub-domains: industrial, random and crafted. The CSP instances come from the CSP Competition (csp 2009) and include non-trivial instances from problem classes such as Timetabling, Frequency Assignment, Job-Shop, Open-Shop, Quasi-group, Costas Array, Golomb Ruler, Latin Square, All Interval Series, Balanced Incomplete Block Design, and many others. This set includes both small and large arity constraints and all of the global constraints used during the CSP solver competitions: all-different, element, weighted sum, and cumulative. (csp 2009) is a reference to "2009. CSP Solver Competition Benchmarks. http://www.cril.univ-artois.fr/~lecoutre/benchmarks.html." |
| Dataset Splits | Yes | For each dataset we performed the prediction task using a 10-fold cross validation approach. Hence, we first split a dataset into a training and test set. The train set is then split further into train and validation splits using a ratio of 75%/25%. |
| Hardware Specification | No | The paper vaguely mentions, "The framework allows exploitation of high performance GPUs," but does not specify any particular GPU models, CPU models, memory sizes, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | Yes | We implemented the neural network described in Section using Python 2.7 and Lasagne 0.1. Lasagne is a framework based on Theano 0.7 (Bastien et al. 2012; Bergstra et al. 2010). |
| Experiment Setup | Yes | The network uses the stochastic gradient descent (SGD) algorithm to speed-up the back-propagation and during the training phase it is updated using Nesterov momentum (Sutskever et al. 2013). The minibatch size is set to 128, learning rate is initially set to 0.03 and momentum is initially set to 0.9. Both are adjusted during the training phase with step size of 0.003 for learning rate and 0.001 for momentum. The non-linearity used is the rectify function φ(x) = max(0, x), while the output layer uses the sigmoid function φ(x) = 1/(1 + e^(-x)). Figure 3 represents the structure of the convolutional neural network implemented and used in all experiments. The figure also reports the number of filters used and their dimensions, as well as the dimensions of each convolutional layer and the probabilities used by the dropout and max-pool layers. We have limited the training of the neural network to 100 epochs. |
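
The "Research Type" row above compares algorithm selectors (CSHC and the proposed deep network) against the single best solver on each dataset. The sketch below illustrates that comparison in the usual algorithm-selection terms; the runtime matrix, the fold indices, and the oracle-style `chosen` predictions are placeholders for illustration, not data or results from the paper.

```python
# Sketch of the "single best solver" baseline that algorithm selectors are
# compared against. `runtimes` is a hypothetical (instances x solvers) matrix
# of solve times; the single best solver is the one with the lowest total
# runtime on the training instances, while a selector picks a solver per instance.
import numpy as np

runtimes = np.random.rand(500, 8) * 100.0   # placeholder runtime matrix (seconds)
train_idx = np.arange(0, 400)
test_idx = np.arange(400, 500)

single_best = runtimes[train_idx].sum(axis=0).argmin()
sbs_time = runtimes[test_idx, single_best].sum()

# A selector (e.g. the deep network or CSHC) outputs one solver index per test
# instance; `chosen` stands in for those predictions (here, oracle choices,
# i.e. the upper bound a perfect selector could reach).
chosen = runtimes[test_idx].argmin(axis=1)
selector_time = runtimes[test_idx, chosen].sum()

print("single best solver time: %.1f, selector time: %.1f" % (sbs_time, selector_time))
```

The "Dataset Splits" row quotes a 10-fold cross-validation protocol in which each training fold is further split 75%/25% into training and validation sets. Below is a minimal sketch of that splitting scheme; scikit-learn is used purely for illustration (the paper does not name a splitting library), and `X` / `y` are placeholder arrays.

```python
# Sketch of the splitting scheme described in the paper: 10-fold cross-validation,
# with each training fold further split 75%/25% into train and validation sets.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.random.rand(1000, 2048)        # placeholder instance representations
y = np.random.randint(0, 10, 1000)    # placeholder "best solver" labels

kfold = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X)):
    X_trainval, X_test = X[train_idx], X[test_idx]
    y_trainval, y_test = y[train_idx], y[test_idx]

    # Further split the training portion 75%/25% into train and validation sets.
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=0.25, random_state=0)

    # ... train the network on (X_train, y_train), monitor (X_val, y_val),
    # and report performance on (X_test, y_test) for this fold.
```

The "Experiment Setup" row names the optimizer and hyperparameters but leaves the layer sizes to the paper's Figure 3, which is not reproduced here. The following Lasagne/Theano sketch wires up only what is quoted above (rectify hidden units, sigmoid output, SGD with Nesterov momentum, learning rate 0.03, momentum 0.9); the input size, filter count, dense-layer width, portfolio size, and the binary cross-entropy loss are assumptions made for illustration, not the authors' exact configuration.

```python
# Sketch of the quoted training configuration using Lasagne / Theano, the
# libraries named in the "Software Dependencies" row. Filter counts, input size,
# layer widths, and the loss function are hypothetical placeholders.
import numpy as np
import theano
import theano.tensor as T
import lasagne

num_solvers = 10                      # hypothetical portfolio size
input_var = T.tensor4('inputs')
target_var = T.matrix('targets')

# Hypothetical architecture: conv -> max-pool -> dropout -> dense -> sigmoid output.
network = lasagne.layers.InputLayer(shape=(None, 1, 128, 128), input_var=input_var)
network = lasagne.layers.Conv2DLayer(network, num_filters=32, filter_size=(5, 5),
                                     nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
network = lasagne.layers.DropoutLayer(network, p=0.5)
network = lasagne.layers.DenseLayer(network, num_units=256,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=num_solvers,
                                    nonlinearity=lasagne.nonlinearities.sigmoid)

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(prediction, target_var).mean()

# Quoted hyperparameters: SGD with Nesterov momentum, learning rate 0.03,
# momentum 0.9. Shared variables allow the per-epoch adjustments the paper
# mentions (step 0.003 for the learning rate, 0.001 for the momentum); the
# quote does not state the direction of adjustment, so none is assumed here.
learning_rate = theano.shared(np.float32(0.03))
momentum = theano.shared(np.float32(0.9))
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params,
                                            learning_rate=learning_rate,
                                            momentum=momentum)

train_fn = theano.function([input_var, target_var], loss, updates=updates)

# Training loop (omitted): minibatches of 128 and at most 100 epochs,
# both quoted in the "Experiment Setup" row.
```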