Learning Generalized Linear Programming Value Functions

Authors: Tu Anh-Nguyen, Joey Huchette, Christian Tjandraatmadja

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically show that our method can approximate the GVF well, even when compared to supervised methods that collect training data by solving an LP for each data point. Furthermore, as an application of our framework, we develop a fast heuristic method for large-scale two-stage MILPs with continuous second-stage variables, via a compact reformulation that can be solved faster than the full model's linear relaxation at large scales and orders of magnitude faster than the original model. In this section, we computationally evaluate both the approximation quality of the learning method described in Section 5 and the effectiveness of the heuristic for two-stage problems from Section 6. (A minimal sketch of the LP value function being learned follows the table.)
Researcher Affiliation | Collaboration | Tu Anh-Nguyen (Google Research and Rice University, Houston, TX; tu.na@rice.edu); Joey Huchette (Google Research, Cambridge, MA; jhuchette@google.com); Christian Tjandraatmadja (Google Research, Cambridge, MA; ctjandra@google.com)
Pseudocode | Yes | Algorithm 1: Learning GVF with objective upper bounds; Algorithm 2: Post-processing to guarantee the lower-bounding property; Algorithm 3: GVF-based heuristic for two-stage MILPs with continuous second-stage variables
Open Source Code | Yes | All code for the experiments can be found at https://github.com/google-research/google-research/tree/master/learning_gvf.
Open Datasets | Yes | We evaluate these methods on the uncapacitated facility location (UFL) problem [54]. This is a deterministic two-stage problem, in which we first select nf facilities to open and then allocate each of nc customers to an open facility. We consider two classes of instances, Euclidean and KG, both with nc = nf. In both cases, we take a set of objective and right-hand-side vectors from one instance for training and a second, different set from five instances for testing (more details are provided in Appendix C). The KG instances are the symmetric Koerkel-Ghosh instances from the UFLLIB benchmark set [25]. To generate SCFL instances, we take deterministic instances from the OR-Library [6]. (A toy Euclidean UFL construction is sketched after the table.)
Dataset Splits | No | For DSM, we select, among the T iterations, the model with the lowest training objective (8) that satisfies at least 98% of the constraints (7b) on the training dataset. Details of hyperparameter tuning for DSMs and Dense Nets are provided in Appendix E. (This selection rule is illustrated in the training-loop sketch after the table.)
Hardware Specification | No | We train both of these models on a single GPU.
Software Dependencies | No | For DSM, we select, among the T iterations, the model with the lowest training objective (8) that satisfies at least 98% of the constraints (7b) on the training dataset. Details of hyperparameter tuning for DSMs and Dense Nets are provided in Appendix E. In addition, to avoid vanishing gradients, we use a composition of GeLU and ReLU during training only. The final layer of the γ-stack is then reshaped to 4 × 32 or 4 × 64 to match the shape of the output of the β-stack. Dropout is only added in the last layer of the DSM and its rate is chosen among {0.02, 0.03, 0.04}. With Dense Net, we also train the model with 1, 2, or 3 layers, each with 32 or 64 neurons. Dropout is added in every layer of the Dense Net and its rate is chosen among {0.1, 0.2, 0.3}. The learning rate of Adam for both models is selected from {10^-2, 10^-3, 10^-4}. (The reported search grid is summarized in a sketch after the table.)
Experiment Setup | Yes | To learn each GVF, we run a total of T = 40 iterations of Algorithm 1; at each iteration, we solve (8) by performing 100 steps of the Adam algorithm [31]. For DSM, we select, among the T iterations, the model with the lowest training objective (8) that satisfies at least 98% of the constraints (7b) on the training dataset. Details of hyperparameter tuning for DSMs and Dense Nets are provided in Appendix E. With DSM, for the β-stack we perform training with 1, 2, or 3 layers, each with 32 or 64 neurons. The activation of every layer is max-pooling with a window of size 5. For the γ-stack, we also train with 2 or 3 layers, each with 32 or 64 neurons, except that the last layer has size 128 or 256, respectively. Dropout is only added in the last layer of the DSM and its rate is chosen among {0.02, 0.03, 0.04}. With Dense Net, we also train the model with 1, 2, or 3 layers, each with 32 or 64 neurons. Dropout is added in every layer of the Dense Net and its rate is chosen among {0.1, 0.2, 0.3}. The learning rate of Adam for both models is selected from {10^-2, 10^-3, 10^-4}. For the training dataset, we choose the upper bound of the GVF to be 2, which is an upper bound for the largest possible assignment cost in the training dataset after normalization. (The training loop and selection rule are sketched after the table.)
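The GVF referenced in the Research Type row is the optimal value of a linear program viewed as a function of its objective and right-hand-side vectors. As a minimal sketch (not the paper's code), assuming a generic LP of the form min{c^T y : A y >= b, y >= 0} with a fixed constraint matrix A, the value function can be evaluated pointwise with scipy.optimize.linprog; the matrix, sizes, and random data below are placeholders.

```python
import numpy as np
from scipy.optimize import linprog

def lp_value(c, b, A):
    """Evaluate z(c, b) = min{c^T y : A y >= b, y >= 0}.

    This is the kind of target a GVF approximates as (c, b) varies,
    with A held fixed for a given instance (placeholder data here).
    """
    # SciPy expects inequalities as A_ub @ y <= b_ub, so A y >= b becomes -A y <= -b.
    res = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None), method="highs")
    return res.fun if res.success else None

# Tiny synthetic example: 3 variables, 2 constraints.
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 1.5, size=(2, 3))
c = rng.uniform(0.0, 1.0, size=3)
b = rng.uniform(0.0, 1.0, size=2)
print(lp_value(c, b, A))
```

Evaluating this value function exactly requires one LP solve per (c, b) pair, which is precisely the data-collection cost the paper's supervised baselines incur.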
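For the Open Datasets row, the Euclidean UFL instances pair nf facilities with nc = nf customers; the paper's exact construction is in its Appendix C. The snippet below is only a hypothetical illustration of one common way such data is generated: unit-square coordinates, assignment costs equal to Euclidean distances, and uniformly drawn opening costs are all assumptions, not the paper's generator.

```python
import numpy as np

def make_euclidean_ufl(n, opening_cost_range=(0.5, 1.5), seed=0):
    """Toy Euclidean UFL instance with n facilities and n customers.

    Assignment costs are pairwise Euclidean distances between random
    points in the unit square; opening costs are drawn uniformly.
    (Illustrative only -- not the paper's instance generator.)
    """
    rng = np.random.default_rng(seed)
    facilities = rng.uniform(size=(n, 2))
    customers = rng.uniform(size=(n, 2))
    assign_cost = np.linalg.norm(
        facilities[:, None, :] - customers[None, :, :], axis=-1)  # shape (n, n)
    open_cost = rng.uniform(*opening_cost_range, size=n)
    return open_cost, assign_cost

open_cost, assign_cost = make_euclidean_ufl(10)
print(open_cost.shape, assign_cost.shape)  # (10,) (10, 10)
```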
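For the Software Dependencies and Experiment Setup rows, the reported hyperparameter search space can be summarized compactly. The dictionaries below only restate the ranges quoted in the table; the key names are invented for readability and do not correspond to identifiers in the released code.

```python
import itertools

# Hyperparameter grids as reported in the experiment description.
# Key names are illustrative, not taken from the released code.
DSM_GRID = {
    "beta_stack_layers": [1, 2, 3],
    "beta_stack_width": [32, 64],
    "gamma_stack_layers": [2, 3],
    "gamma_stack_width": [32, 64],          # last gamma layer sized 128 or 256
    "last_layer_dropout": [0.02, 0.03, 0.04],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}
DENSENET_GRID = {
    "layers": [1, 2, 3],
    "width": [32, 64],
    "dropout_every_layer": [0.1, 0.2, 0.3],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

# Example: enumerate all DSM configurations in the grid.
dsm_configs = [dict(zip(DSM_GRID, values))
               for values in itertools.product(*DSM_GRID.values())]
print(len(dsm_configs))
```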
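For the Dataset Splits and Experiment Setup rows: training runs T = 40 outer iterations of Algorithm 1, each performing 100 Adam steps on objective (8), and the selected DSM is the iterate with the lowest training objective among those satisfying at least 98% of constraints (7b). The skeleton below mirrors only that outer loop and selection rule; the model, the surrogate objective, and the constraint check are toy placeholders standing in for objective (8) and constraints (7b), not the paper's formulation.

```python
import copy
import torch

torch.manual_seed(0)

# --- Toy stand-ins (hypothetical; not the paper's objective (8) or constraints (7b)) ---
X = torch.rand(512, 6)                     # placeholder (objective, rhs) features
z_ub = torch.full((512, 1), 2.0)           # GVF upper bounds, set to 2 as in the paper
model = torch.nn.Sequential(
    torch.nn.Linear(6, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr from the reported grid

def train_objective():
    """Placeholder surrogate for objective (8)."""
    pred = model(X)
    return (pred - z_ub).clamp(min=0).mean() - pred.mean()

def frac_constraints_satisfied():
    """Placeholder for the fraction of constraints (7b) satisfied (pred <= bound)."""
    with torch.no_grad():
        return (model(X) <= z_ub).float().mean().item()

# --- Outer loop and model-selection rule described in the table ---
T, ADAM_STEPS, MIN_SAT = 40, 100, 0.98
best_state, best_obj = None, float("inf")
for t in range(T):
    for _ in range(ADAM_STEPS):
        opt.zero_grad()
        loss = train_objective()
        loss.backward()
        opt.step()
    with torch.no_grad():
        obj = train_objective().item()
    sat = frac_constraints_satisfied()
    # Keep the iterate with the lowest objective among sufficiently feasible ones.
    if sat >= MIN_SAT and obj < best_obj:
        best_state, best_obj = copy.deepcopy(model.state_dict()), obj
```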