Learning to Branch with Tree MDPs

Authors: Lara Scavuzzo, Feng Yang Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, Karen Aardal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs."
Researcher Affiliation | Academia | Lara Scavuzzo, Delft University of Technology (l.v.scavuzzomontana@tudelft.nl); Feng Yang Chen, Polytechnique Montréal (feng-yang.chen@polymtl.ca); Didier Chételat, Polytechnique Montréal (didier.chetelat@polymtl.ca); Maxime Gasse, Mila, Polytechnique Montréal (maxime.gasse@polymtl.ca); Andrea Lodi, Jacobs Technion-Cornell Institute, Cornell Tech and Technion IIT (andrea.lodi@cornell.edu); Neil Yorke-Smith, Delft University of Technology (n.yorke-smith@tudelft.nl); Karen Aardal, Delft University of Technology (k.i.aardal@tudelft.nl)
Pseudocode | Yes | Algorithm 1: REINFORCE training loop (a hedged sketch follows the table)
Open Source Code | Yes | "Code for reproducing all experiments is available online." https://github.com/lascavana/rl2branch
Open Datasets | No | "For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones. More information about benchmarks and instance sizes can be found in the Supplementary Material (A.5)." The paper describes how instances are generated but gives no concrete access information for the generated data itself (link, DOI, or citation).
Dataset Splits | Yes | "For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones." (A sketch of generating such splits follows the table.)
Hardware Specification | No | "All experiments are run on compute nodes equipped with a GPU." This statement is too general: it does not specify the GPU model or any other hardware component. (A sketch for recording such details follows the table.)
Software Dependencies | No | "Our implementation uses PyTorch [31] together with PyTorch Geometric [12], and Ecole [32] for interfacing to the solver SCIP [15]." The paper lists the software used but provides no version numbers. (A version-recording snippet follows the table.)
Experiment Setup | No | "We use a plain REINFORCE [35] with entropy bonus as our RL algorithm, for simplicity. Our training procedure is summarized in Algorithm 1. We set a maximum of 15,000 epochs and a time limit of six days for training." While Algorithm 1 lists hyperparameters (entropy bonus λ, learning rate α, sample rate β), their specific values are not provided in the text. (A training-loop skeleton follows the table.)
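For the pseudocode entry (Algorithm 1), the following is a minimal PyTorch sketch of a REINFORCE update with an entropy bonus. The function name, tensor shapes, and the use of negative subtree size as the return are illustrative assumptions, not a transcription of the paper's Algorithm 1.

```python
import torch

def reinforce_entropy_loss(log_probs, entropies, returns, entropy_bonus):
    """REINFORCE objective with an entropy bonus for one batch of branching decisions.

    log_probs:     log pi(a | s) of the decisions taken (1-D tensor)
    entropies:     policy entropy at each visited node (1-D tensor)
    returns:       return credited to each decision, e.g. negative subtree size (1-D tensor)
    entropy_bonus: weight (lambda) of the entropy regularizer
    """
    policy_gradient = -(log_probs * returns).mean()   # maximize expected return
    entropy_term = -entropy_bonus * entropies.mean()  # encourage exploration
    return policy_gradient + entropy_term

# Illustrative call with dummy data; in practice these tensors come from the policy network.
log_probs = torch.log(torch.rand(16, requires_grad=True))
entropies = torch.rand(16)
returns = -torch.rand(16) * 100.0  # e.g. negative subtree sizes as tree-MDP returns
loss = reinforce_entropy_loss(log_probs, entropies, returns, entropy_bonus=1e-2)
loss.backward()
```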
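For the dataset-splits entry, this sketch shows how the 10,000/20/40/40 splits could be generated with Ecole's instance generators. The choice of the set-covering generator, its size parameters, the seeds, and the output paths are assumptions for illustration, not the paper's exact configuration.

```python
import pathlib
import ecole

def sample_instances(generator, n, out_dir, seed):
    """Write n instances drawn from an Ecole generator to out_dir as .lp files."""
    pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
    generator.seed(seed)
    for i, instance in zip(range(n), generator):
        instance.write_problem(f"{out_dir}/instance_{i}.lp")

# Hypothetical sizes: the transfer generator produces larger instances than training.
train_gen = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000)
transfer_gen = ecole.instance.SetCoverGenerator(n_rows=1000, n_cols=1000)

sample_instances(train_gen, 10_000, "data/train", seed=0)    # training set
sample_instances(train_gen, 20, "data/valid", seed=1)        # validation set
sample_instances(train_gen, 40, "data/test", seed=2)         # test set (training size)
sample_instances(transfer_gen, 40, "data/transfer", seed=3)  # transfer set (larger)
```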
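Because the hardware entry flags that only "a GPU" is mentioned, this small helper illustrates one way to record the missing hardware details in experiment logs; the field names are arbitrary, and PyTorch is assumed available since it is already a dependency.

```python
import platform
import torch

def hardware_summary():
    # Collect basic platform and accelerator information for reproducibility logs.
    info = {"platform": platform.platform(), "cpu": platform.processor()}
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["cuda"] = torch.version.cuda
    return info

print(hardware_summary())
```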
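For the software-dependencies entry, this snippet records the version numbers of the packages named in the quote. The PyPI package names are assumed and may differ in a given environment; SCIP itself is accessed through Ecole and is not queried directly here.

```python
from importlib.metadata import version

# Record installed versions of the dependencies mentioned in the quoted sentence.
for package in ("torch", "torch-geometric", "ecole"):
    print(package, version(package))
```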
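For the experiment-setup entry, this skeleton reflects the stated 15,000-epoch cap and six-day time limit. The values assigned to the entropy bonus λ, learning rate α, and sample rate β are placeholders (the paper does not report them), and the two callables are hypothetical hooks rather than functions from the released code.

```python
import time

# Placeholder hyperparameters; the paper does not report their values.
ENTROPY_BONUS = 1e-2   # lambda (assumed)
LEARNING_RATE = 1e-4   # alpha (assumed)
SAMPLE_RATE = 0.05     # beta (assumed)

MAX_EPOCHS = 15_000                 # stated maximum number of epochs
TIME_LIMIT_SECONDS = 6 * 24 * 3600  # stated six-day training time limit

def train(policy, optimizer, sample_trajectories, update_policy):
    """Outer training-loop skeleton; sample_trajectories and update_policy are hypothetical hooks."""
    start = time.time()
    for epoch in range(MAX_EPOCHS):
        if time.time() - start > TIME_LIMIT_SECONDS:
            break  # stop when the wall-clock budget is exhausted
        # Sample tree trajectories with the current policy at rate beta, then apply a
        # REINFORCE step with the entropy bonus (see the loss sketch above).
        batch = sample_trajectories(policy, SAMPLE_RATE)
        update_policy(policy, optimizer, batch, ENTROPY_BONUS, LEARNING_RATE)
```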