Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning to Branch with Tree MDPs
Authors: Lara Scavuzzo, Feng Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, Karen Aardal
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs. |
| Researcher Affiliation | Academia | Lara Scavuzzo (Delft University of Technology); Feng Yang Chen (Polytechnique Montréal); Didier Chételat (Polytechnique Montréal); Maxime Gasse (Mila, Polytechnique Montréal); Andrea Lodi (Jacobs Technion-Cornell Institute, Cornell Tech and Technion IIT); Neil Yorke-Smith (Delft University of Technology); Karen Aardal (Delft University of Technology) |
| Pseudocode | Yes | Algorithm 1 REINFORCE training loop |
| Open Source Code | Yes | Code for reproducing all experiments is available online (footnote 7: https://github.com/lascavana/rl2branch). |
| Open Datasets | No | For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones. More information about benchmarks and instance sizes can be found in the Supplementary Material (A.5). The paper describes generating instances but does not provide concrete access information (a link, DOI, or specific citation) for the generated data itself. |
| Dataset Splits | Yes | For each benchmark we generate a training set of 10,000 instances, along with a small set of 20 validation instances for tracking the RL performance during training. For the final evaluation, we further generate a test set of 40 instances, the same size as the training ones, and also a transfer set of 40 instances, larger and more challenging than the training ones. |
| Hardware Specification | No | All experiments are run on compute nodes equipped with a GPU. This statement is too general: it does not specify the GPU model or any other hardware details. |
| Software Dependencies | No | Our implementation uses PyTorch [31] together with PyTorch Geometric [12], and Ecole [32] for interfacing to the solver SCIP [15]. The paper lists the software used but does not provide version numbers for any of these dependencies. (A hedged sketch of the Ecole/SCIP interface appears after this table.) |
| Experiment Setup | No | We use a plain REINFORCE [35] with entropy bonus as our RL algorithm, for simplicity. Our training procedure is summarized in Algorithm 1. We set a maximum of 15,000 epochs and a time limit of six days for training. While Algorithm 1 lists hyperparameters (entropy bonus λ, learning rate α, sample rate β), their specific values are not provided in the text. (A hedged sketch of such an update step appears after this table.) |
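
The experiment-setup evidence describes a plain REINFORCE update with an entropy bonus (Algorithm 1 in the paper). The following is a minimal sketch of what one such update step could look like in PyTorch, assuming a policy network that outputs logits over the candidate actions. The function and parameter names (`reinforce_update`, `entropy_coef`) are illustrative, not taken from the authors' code, and the hyperparameter values are placeholders since the paper does not report them.

```python
import torch
import torch.nn.functional as F

def reinforce_update(policy, optimizer, states, actions, returns, entropy_coef=1e-2):
    """One REINFORCE step with an entropy bonus (hypothetical sketch).

    states: (T, d) float tensor, actions: (T,) long tensor, returns: (T,) float tensor.
    entropy_coef plays the role of the entropy-bonus weight lambda in Algorithm 1.
    """
    logits = policy(states)                                         # (T, n_actions)
    log_probs = F.log_softmax(logits, dim=-1)                       # log pi(a | s)
    probs = log_probs.exp()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)   # log pi(a_t | s_t)
    entropy = -(probs * log_probs).sum(dim=-1)                      # per-step policy entropy
    # Maximize E[R_t * log pi(a_t | s_t)] + lambda * entropy, i.e. minimize its negative.
    loss = -(returns * chosen).mean() - entropy_coef * entropy.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Smoke test with a toy linear policy over 4 actions; values are arbitrary.
    policy = torch.nn.Linear(8, 4)
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)  # learning rate alpha
    states, actions, returns = torch.randn(16, 8), torch.randint(0, 4, (16,)), torch.randn(16)
    reinforce_update(policy, optimizer, states, actions, returns)
```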
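
The software-dependency evidence mentions Ecole as the interface to the SCIP solver. The loop below is a hedged illustration of Ecole's gym-like branching environment, not the authors' exact configuration; the instance generator, instance sizes, and observation function chosen here are assumptions.

```python
import ecole

# Branching environment wrapping SCIP; NodeBipartite is one of Ecole's
# built-in observation functions (an assumption here, not the paper's choice).
env = ecole.environment.Branching(
    observation_function=ecole.observation.NodeBipartite(),
)
instances = ecole.instance.SetCoverGenerator(n_rows=500, n_cols=1000)

for _ in range(3):
    instance = next(instances)
    # reset() runs SCIP up to the first branching decision on this instance.
    observation, action_set, reward_offset, done, info = env.reset(instance)
    while not done:
        action = action_set[0]  # placeholder policy: branch on the first candidate
        observation, action_set, reward, done, info = env.step(action)
```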