Learning to Branch with Tree MDPs

Authors: Lara Scavuzzo, Feng Yang Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, Karen Aardal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs."
Researcher Affiliation | Academia | Lara Scavuzzo, Delft University of Technology (l.v.scavuzzomontana@tudelft.nl); Feng Yang Chen, Polytechnique Montréal (feng-yang.chen@polymtl.ca); Didier Chételat, Polytechnique Montréal (didier.chetelat@polymtl.ca); Maxime Gasse, Mila, Polytechnique Montréal (maxime.gasse@polymtl.ca); Andrea Lodi, Jacobs Technion-Cornell Institute, Cornell Tech and Technion IIT (andrea.lodi@cornell.edu); Neil Yorke-Smith, Delft University of Technology (n.yorke-smith@tudelft.nl); Karen Aardal, Delft University of Technology (k.i.aardal@tudelft.nl)
Pseudocode | Yes | Algorithm 1: REINFORCE training loop (a hedged sketch follows the table)
Open Source Code | Yes | "Code for reproducing all experiments is available online." https://github.com/lascavana/rl2branch
Open Datasets | No | "For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones. More information about benchmarks and instance sizes can be found in the Supplementary Material (A.5)." The paper describes how instances are generated but gives no concrete access information for the generated data itself (link, DOI, or citation).
Dataset Splits | Yes | "For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones." (A sketch of generating such splits follows the table.)
Hardware Specification | No | "All experiments are run on compute nodes equipped with a GPU." This statement is too general: it does not specify the GPU model or any other hardware component. (A sketch for recording such details follows the table.)
Software Dependencies | No | "Our implementation uses PyTorch [31] together with PyTorch Geometric [12], and Ecole [32] for interfacing to the solver SCIP [15]." The paper lists the software used but provides no version numbers. (A version-recording snippet follows the table.)
Experiment Setup | No | "We use a plain REINFORCE [35] with entropy bonus as our RL algorithm, for simplicity. Our training procedure is summarized in Algorithm 1. We set a maximum of 15,000 epochs and a time limit of six days for training." While Algorithm 1 lists hyperparameters (entropy bonus λ, learning rate α, sample rate β), their specific values are not provided in the text. (A training-loop skeleton follows the table.)
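For the pseudocode entry (Algorithm 1), the following is a minimal PyTorch sketch of a REINFORCE update with an entropy bonus. The function name, tensor shapes, and the use of negative subtree size as the return are illustrative assumptions, not a transcription of the paper's Algorithm 1.

```python
import torch

def reinforce_entropy_loss(log_probs, entropies, returns, entropy_bonus):
    """REINFORCE objective with an entropy bonus for one batch of branching decisions.

    log_probs:     log pi(a | s) of the decisions taken (1-D tensor)
    entropies:     policy entropy at each visited node (1-D tensor)
    returns:       return credited to each decision, e.g. negative subtree size (1-D tensor)
    entropy_bonus: weight (lambda) of the entropy regularizer
    """
    policy_gradient = -(log_probs * returns).mean()   # maximize expected return
    entropy_term = -entropy_bonus * entropies.mean()  # encourage exploration
    return policy_gradient + entropy_term

# Illustrative call with dummy data; in practice these tensors come from the policy network.
log_probs = torch.log(torch.rand(16, requires_grad=True))
entropies = torch.rand(16)
returns = -torch.rand(16) * 100.0  # e.g. negative subtree sizes as tree-MDP returns
loss = reinforce_entropy_loss(log_probs, entropies, returns, entropy_bonus=1e-2)
loss.backward()
```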
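For the dataset-splits entry, this sketch shows how the 10,000/20/40/40 splits could be generated with Ecole's instance generators. The choice of the set-covering generator, its size parameters, the seeds, and the output paths are assumptions for illustration, not the paper's exact configuration.

```python
import pathlib
import ecole

def sample_instances(generator, n, out_dir, seed):
    """Write n instances drawn from an Ecole generator to out_dir as .lp files."""
    pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
    generator.seed(seed)
    for i, instance in zip(range(n), generator):
        instance.write_problem(f"{out_dir}/instance_{i}.lp")

# Hypothetical sizes: the transfer generator produces larger instances than training.
train_gen = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000)
transfer_gen = ecole.instance.SetCoverGenerator(n_rows=1000, n_cols=1000)

sample_instances(train_gen, 10_000, "data/train", seed=0)    # training set
sample_instances(train_gen, 20, "data/valid", seed=1)        # validation set
sample_instances(train_gen, 40, "data/test", seed=2)         # test set (training size)
sample_instances(transfer_gen, 40, "data/transfer", seed=3)  # transfer set (larger)
```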
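Because the hardware entry flags that only "a GPU" is mentioned, this small helper illustrates one way to record the missing hardware details in experiment logs; the field names are arbitrary, and PyTorch is assumed available since it is already a dependency.

```python
import platform
import torch

def hardware_summary():
    # Collect basic platform and accelerator information for reproducibility logs.
    info = {"platform": platform.platform(), "cpu": platform.processor()}
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["cuda"] = torch.version.cuda
    return info

print(hardware_summary())
```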
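For the software-dependencies entry, this snippet records the version numbers of the packages named in the quote. The PyPI package names are assumed and may differ in a given environment; SCIP itself is accessed through Ecole and is not queried directly here.

```python
from importlib.metadata import version

# Record installed versions of the dependencies mentioned in the quoted sentence.
for package in ("torch", "torch-geometric", "ecole"):
    print(package, version(package))
```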
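For the experiment-setup entry, this skeleton reflects the stated 15,000-epoch cap and six-day time limit. The values assigned to the entropy bonus λ, learning rate α, and sample rate β are placeholders (the paper does not report them), and the two callables are hypothetical hooks rather than functions from the released code.

```python
import time

# Placeholder hyperparameters; the paper does not report their values.
ENTROPY_BONUS = 1e-2   # lambda (assumed)
LEARNING_RATE = 1e-4   # alpha (assumed)
SAMPLE_RATE = 0.05     # beta (assumed)

MAX_EPOCHS = 15_000                 # stated maximum number of epochs
TIME_LIMIT_SECONDS = 6 * 24 * 3600  # stated six-day training time limit

def train(policy, optimizer, sample_trajectories, update_policy):
    """Outer training-loop skeleton; sample_trajectories and update_policy are hypothetical hooks."""
    start = time.time()
    for epoch in range(MAX_EPOCHS):
        if time.time() - start > TIME_LIMIT_SECONDS:
            break  # stop when the wall-clock budget is exhausted
        # Sample tree trajectories with the current policy at rate beta, then apply a
        # REINFORCE step with the entropy bonus (see the loss sketch above).
        batch = sample_trajectories(policy, SAMPLE_RATE)
        update_policy(policy, optimizer, batch, ENTROPY_BONUS, LEARNING_RATE)
```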