Learning to Branch with Tree MDPs
Authors: Lara Scavuzzo, Feng Yang Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, Karen Aardal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs. |
| Researcher Affiliation | Academia | Lara Scavuzzo, Delft University of Technology, l.v.scavuzzomontana@tudelft.nl; Feng Yang Chen, Polytechnique Montréal, feng-yang.chen@polymtl.ca; Didier Chételat, Polytechnique Montréal, didier.chetelat@polymtl.ca; Maxime Gasse, Mila, Polytechnique Montréal, maxime.gasse@polymtl.ca; Andrea Lodi, Jacobs Technion-Cornell Institute, Cornell Tech and Technion IIT, andrea.lodi@cornell.edu; Neil Yorke-Smith, Delft University of Technology, n.yorke-smith@tudelft.nl; Karen Aardal, Delft University of Technology, k.i.aardal@tudelft.nl |
| Pseudocode | Yes | Algorithm 1 REINFORCE training loop |
| Open Source Code | Yes | Code for reproducing all experiments is available online: https://github.com/lascavana/rl2branch |
| Open Datasets | No | For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones. More information about benchmarks and instance sizes can be found in the Supplementary Material (A.5). The paper mentions generating instances but does not provide concrete access information (link, DOI, specific citation for the generated data itself). |
| Dataset Splits | Yes | For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones. |
| Hardware Specification | No | All experiments are run on compute nodes equipped with a GPU. This statement is too general, as it does not specify the model or type of GPU, or any other hardware component. |
| Software Dependencies | No | Our implementation uses PyTorch [31] together with PyTorch Geometric [12], and Ecole [32] for interfacing to the solver SCIP [15]. The paper lists the software used but does not provide specific version numbers for any of these packages. |
| Experiment Setup | No | We use a plain REINFORCE [35] with entropy bonus as our RL algorithm, for simplicity. Our training procedure is summarized in Algorithm 1. We set a maximum of 15,000 epochs and a time limit of six days for training. While Algorithm 1 lists the hyperparameters (entropy bonus λ, learning rate α, sample rate β), their specific values are not provided in the text. |
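For reference, the per-sample objective behind "REINFORCE with entropy bonus" can be sketched as below. This is a minimal illustration of the standard loss L = -log π(a|s) · R - λ · H(π(·|s)), not the authors' implementation; the function name, the pure-Python softmax, and the default λ are assumptions for the sketch (the paper does not report its λ value).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_entropy_loss(logits, action, ret, lam=0.01):
    """REINFORCE loss with entropy bonus for one (state, action, return) sample.

    loss = -log pi(action | state) * return  -  lam * H(pi(. | state))

    Minimizing this maximizes the return-weighted log-likelihood of the taken
    action while the entropy term (weighted by lam) discourages the policy
    from collapsing prematurely onto a single branching decision.
    """
    probs = softmax(logits)
    log_p = math.log(probs[action])
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return -log_p * ret - lam * entropy
```

With a uniform two-action policy, -log π(a|s) and the entropy are both log 2, so setting λ = 1 and return 1 makes the two terms cancel exactly.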