Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization
Authors: Quentin Cappart, Thierry Moisan, Louis-Martin Rousseau, Isabeau Prémont-Schwarz, Andre A. Cire
AAAI 2021, pp. 3677-3687 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show that our solver is efficient to solve three challenging problems: the traveling salesman problem with time windows, the 4-moments portfolio optimization problem, and the 0-1 knapsack problem. Results obtained show that the framework introduced outperforms the stand-alone RL and CP solutions, while being competitive with industrial solvers. |
| Researcher Affiliation | Collaboration | 1 Ecole Polytechnique de Montréal, Montreal, Canada 2 Element AI, Montreal, Canada 3 University of Toronto Scarborough, Toronto, Canada |
| Pseudocode | Yes | Algorithm 1: BaB-DQN Search Procedure. Algorithm 2: ILDS-DQN Search Procedure. Algorithm 3: RBS-PPO Search Procedure. |
| Open Source Code | Yes | Our detailed contributions are as follows: ... (5) The open-source release of our code and models, in order to ease the future research in this field (footnote 1: https://github.com/qcappart/hybrid-cp-rl-solver). |
| Open Datasets | No | The training is done using randomly generated instances sampled from a similar distribution to those we want to solve. Such an assumption is common in the vast majority of works tackling NP-hard problems using ML (Khalil et al. 2017; Kool, Van Hoof, and Welling 2018; Cappart et al. 2019), and, despite being strong, has nonetheless a practical interest when repeatedly solving similar instances of the same problem (e.g., package shipping by retailers). The paper mentions generating instances but does not provide concrete access to a publicly available dataset or a specific citation for one. |
| Dataset Splits | Yes | A new model is recorded after each 100 episodes of the RL algorithm and the model achieving the best average reward on a validation set of 100 instances generated in the same way as for the training is selected. The final evaluation is done on 100 other instances (still randomly generated in the same manner) using Intel Xeon E5-2650 CPU with 32GB of RAM and a time limit of 60 minutes. |
| Hardware Specification | Yes | Memory consumption is limited to 32 GB and 1 GPU (Tesla V100-SXM2-32GB) is used per model; the final evaluation is done on 100 other instances (still randomly generated in the same manner) using an Intel Xeon E5-2650 CPU with 32GB of RAM and a time limit of 60 minutes. |
| Software Dependencies | No | Algorithms used for training have been implemented in Python and Pytorch (Paszke et al. 2019) is used for designing the neural networks. Library DGL (Wang et al. 2019) is used for implementing graph embedding, and Set Transformer (Lee et al. 2018) for set embedding. The CP solver used is Gecode (Schulte, Lagerkvist, and Tack 2006), which has the benefit to be opensource and to offer a lot of freedom for designing new search procedures. As Gecode is implemented in C++, an operability interface with Python code is required. It is done using Pybind11 (Jakob, Rhinelander, and Moldovan 2017). The paper lists software but not their specific version numbers. (An illustrative snippet of this stack follows the table.) |
| Experiment Setup | No | Training time is limited to 48 hours, memory consumption to 32 GB and 1 GPU (Tesla V100-SXM2-32GB) is used per model. Models are trained with a single run. A new model is recorded after each 100 episodes of the RL algorithm and the model achieving the best average reward on a validation set of 100 instances generated in the same way as for the training is selected. The final evaluation is done on 100 other instances (still randomly generated in the same manner) using Intel Xeon E5-2650 CPU with 32GB of RAM and a time limit of 60 minutes. Detailed information about the hyper-parameters tested and selected are proposed in the supplementary material; the specific hyper-parameters are thus deferred rather than reported in the paper itself. (An illustrative Python sketch of this checkpoint-and-validation protocol follows the table.) |
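
The Software Dependencies row lists the stack the authors report: Python, PyTorch for the neural networks, DGL for graph embedding, Set Transformer for set embedding, and Gecode (C++) bridged to Python via Pybind11. As a rough illustration of how the PyTorch/DGL portion of such a stack fits together, the snippet below runs a single graph-convolution layer on a toy graph. It is not the paper's architecture, only standard DGL/PyTorch usage on made-up data.

```python
import torch
import dgl
from dgl.nn import GraphConv

# Toy 3-node directed cycle; in the paper the graph would encode a problem
# instance (e.g., TSPTW cities and travel times), but this graph is illustrative only.
g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 0])))
g = dgl.add_self_loop(g)          # common safeguard against zero-in-degree nodes

feats = torch.randn(3, 4)         # 4-dimensional node features
conv = GraphConv(in_feats=4, out_feats=8)

h = conv(g, feats)                # one round of message passing
print(h.shape)                    # torch.Size([3, 8]) node embeddings
```

The actual embedding networks and the Gecode/Pybind11 bridge are in the released repository (https://github.com/qcappart/hybrid-cp-rl-solver).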
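
The Dataset Splits and Experiment Setup rows describe the same protocol: instances are randomly generated, a checkpoint is recorded every 100 training episodes, the checkpoint with the best average reward on a 100-instance validation set is kept, and the final evaluation uses 100 further instances. A minimal sketch of that loop is given below; every function and field name here (generate_instance, run_training_episode, average_reward, the toy "model") is a hypothetical placeholder, not the repository's API.

```python
import copy
import random

# Hypothetical placeholders for the paper's components; names are illustrative only.
def generate_instance(rng):
    return {"cities": rng.sample(range(1000), 20)}      # toy stand-in for a TSPTW instance

def run_training_episode(model, rng):
    model["weights"] += rng.uniform(-1.0, 1.0)           # toy stand-in for a DQN/PPO update

def average_reward(model, instances, rng):
    return -abs(model["weights"]) + rng.random()         # toy stand-in for rollout reward

rng = random.Random(0)
validation_set = [generate_instance(rng) for _ in range(100)]   # 100 validation instances
test_set = [generate_instance(rng) for _ in range(100)]         # 100 held-out test instances

model = {"weights": 0.0}
checkpoints = []
for episode in range(1, 2001):
    run_training_episode(model, rng)
    if episode % 100 == 0:                    # a checkpoint is recorded every 100 episodes
        checkpoints.append(copy.deepcopy(model))

# Keep the checkpoint with the best average validation reward;
# the comparison against CP/RL/industrial baselines is then run on test_set.
best = max(checkpoints, key=lambda m: average_reward(m, validation_set, rng))
```

In the paper's setup, the stopping condition is the stated training budget (48 hours, 32 GB, one V100 per model) rather than the fixed episode count used in this sketch.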