Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Branch with Tree MDPs

Authors: Lara Scavuzzo, Feng Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, Karen Aardal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs."

Researcher Affiliation | Academia | "Lara Scavuzzo, Delft University of Technology, EMAIL; Feng Yang Chen, Polytechnique Montréal, EMAIL; Didier Chételat, Polytechnique Montréal, EMAIL; Maxime Gasse, Mila, Polytechnique Montréal, EMAIL; Andrea Lodi, Jacobs Technion-Cornell Institute, Cornell Tech and Technion IIT, EMAIL; Neil Yorke-Smith, Delft University of Technology, EMAIL; Karen Aardal, Delft University of Technology, EMAIL"

Pseudocode | Yes | "Algorithm 1 REINFORCE training loop"

Open Source Code | Yes | "Code for reproducing all experiments is available online: https://github.com/lascavana/rl2branch"

Open Datasets | No | "For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones. More information about benchmarks and instance sizes can be found in the Supplementary Material (A.5)." The paper describes how instances are generated but provides no concrete access information (link, DOI, or specific citation) for the generated data itself.

Dataset Splits | Yes | "For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones."

Hardware Specification | No | "All experiments are run on compute nodes equipped with a GPU." This statement is too general: it specifies neither the GPU model nor any other hardware component.

Software Dependencies | No | "Our implementation uses PyTorch [31] together with PyTorch Geometric [12], and Ecole [32] for interfacing to the solver SCIP [15]." The paper lists the software used but provides no version numbers for any of it.

Experiment Setup | No | "We use a plain REINFORCE [35] with entropy bonus as our RL algorithm, for simplicity. Our training procedure is summarized in Algorithm 1. We set a maximum of 15,000 epochs and a time limit of six days for training." While Algorithm 1 lists hyperparameters (entropy bonus λ, learning rate α, sample rate β), their specific values are not provided in the text.
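For context on the "plain REINFORCE with entropy bonus" objective named above, the sketch below shows a minimal, framework-free version of the per-batch loss such a training loop would minimize. The function name, signature, and the default λ value are assumptions for illustration; the paper does not report its λ, α, or β values.

```python
def reinforce_entropy_loss(log_probs, returns, entropies, lam=0.01):
    """Mean REINFORCE loss with an entropy bonus (illustrative sketch).

    log_probs: log pi(a_t | s_t) for each sampled action
    returns:   return R_t observed from each step
    entropies: policy entropy H(pi(. | s_t)) at each step
    lam:       entropy-bonus weight (hypothetical default, not from the paper)
    """
    n = len(log_probs)
    # Policy-gradient term: minimizing this performs gradient ascent on E[R].
    pg = -sum(lp * r for lp, r in zip(log_probs, returns)) / n
    # Entropy bonus: subtracting lam * H encourages exploration.
    ent = -lam * sum(entropies) / n
    return pg + ent
```

In an autodiff framework such as PyTorch, this scalar would be backpropagated through the log-probabilities of the branching policy; here it is plain Python only to make the algebra explicit.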