Learning Generalized Linear Programming Value Functions

Authors: Tu Anh-Nguyen, Joey Huchette, Christian Tjandraatmadja

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically show that our method can approximate the GVF well, even when compared to supervised methods that collect training data by solving an LP for each data point. Furthermore, as an application of our framework, we develop a fast heuristic method for large-scale two-stage MILPs with continuous second-stage variables, via a compact reformulation that can be solved faster than the full model's linear relaxation at large scales and orders of magnitude faster than the original model. In this section, we computationally evaluate both the approximation quality of the learning method described in Section 5 and the effectiveness of the heuristic for two-stage problems from Section 6. (A minimal sketch of the LP value function being learned follows the table.)
Researcher Affiliation | Collaboration | Tu Anh-Nguyen (Google Research and Rice University, Houston, TX; tu.na@rice.edu); Joey Huchette (Google Research, Cambridge, MA; jhuchette@google.com); Christian Tjandraatmadja (Google Research, Cambridge, MA; ctjandra@google.com)
Pseudocode | Yes | Algorithm 1: Learning GVF with objective upper bounds; Algorithm 2: Post-processing to guarantee the lower-bounding property; Algorithm 3: GVF-based heuristic for two-stage MILPs with continuous second-stage variables
Open Source Code | Yes | All code for the experiments can be found at https://github.com/google-research/google-research/tree/master/learning_gvf.
Open Datasets | Yes | We evaluate these methods on the uncapacitated facility location (UFL) problem [54]. This is a deterministic two-stage problem, in which we first select nf facilities to open and then allocate each of nc customers to an open facility. We consider two classes of instances, Euclidean and KG, both with nc = nf. In both cases, we take a set of objective and right-hand-side vectors from one instance for training and a second, different set from five instances for testing (more details are provided in Appendix C). The KG instances are the symmetric Koerkel-Ghosh instances from the UFLLIB benchmark set [25]. To generate SCFL instances, we take deterministic instances from the OR-Library [6]. (A toy Euclidean UFL construction is sketched after the table.)
Dataset Splits | No | For DSM, we select, among the T iterations, the model with the lowest training objective (8) that satisfies at least 98% of the constraints (7b) on the training dataset. Details of hyperparameter tuning for DSMs and Dense Nets are provided in Appendix E. (This selection rule is illustrated in the training-loop sketch after the table.)
Hardware Specification | No | We train both of these models on a single GPU.
Software Dependencies | No | For DSM, we select, among the T iterations, the model with the lowest training objective (8) that satisfies at least 98% of the constraints (7b) on the training dataset. Details of hyperparameter tuning for DSMs and Dense Nets are provided in Appendix E. In addition, to avoid vanishing gradients, we use a composition of GeLU and ReLU during training only. The final layer of the γ-stack is then reshaped to 4 × 32 or 4 × 64 to match the shape of the output of the β-stack. Dropout is only added in the last layer of the DSM and its rate is chosen among {0.02, 0.03, 0.04}. With Dense Net, we also train the model with 1, 2, or 3 layers, each with 32 or 64 neurons. Dropout is added in every layer of the Dense Net and its rate is chosen among {0.1, 0.2, 0.3}. The learning rate of Adam for both models is selected from {10^-2, 10^-3, 10^-4}. (The reported search grid is summarized in a sketch after the table.)
Experiment Setup | Yes | To learn each GVF, we run a total of T = 40 iterations of Algorithm 1; at each iteration, we solve (8) by performing 100 steps of the Adam algorithm [31]. For DSM, we select, among the T iterations, the model with the lowest training objective (8) that satisfies at least 98% of the constraints (7b) on the training dataset. Details of hyperparameter tuning for DSMs and Dense Nets are provided in Appendix E. With DSM, for the β-stack we perform training with 1, 2, or 3 layers, each with 32 or 64 neurons. The activation of every layer is max-pooling with a window of size 5. For the γ-stack, we also train with 2 or 3 layers, each with 32 or 64 neurons, except that the last layer has size 128 or 256, respectively. Dropout is only added in the last layer of the DSM and its rate is chosen among {0.02, 0.03, 0.04}. With Dense Net, we also train the model with 1, 2, or 3 layers, each with 32 or 64 neurons. Dropout is added in every layer of the Dense Net and its rate is chosen among {0.1, 0.2, 0.3}. The learning rate of Adam for both models is selected from {10^-2, 10^-3, 10^-4}. For the training dataset, we choose the upper bound of the GVF to be 2, which is an upper bound for the largest possible assignment cost in the training dataset after normalization. (The training loop and selection rule are sketched after the table.)
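The GVF referenced in the Research Type row is the optimal value of a linear program viewed as a function of its objective and right-hand-side vectors. As a minimal sketch (not the paper's code), assuming a generic LP of the form min{c^T y : A y >= b, y >= 0} with a fixed constraint matrix A, the value function can be evaluated pointwise with scipy.optimize.linprog; the matrix, sizes, and random data below are placeholders.

```python
import numpy as np
from scipy.optimize import linprog

def lp_value(c, b, A):
    """Evaluate z(c, b) = min{c^T y : A y >= b, y >= 0}.

    This is the kind of target a GVF approximates as (c, b) varies,
    with A held fixed for a given instance (placeholder data here).
    """
    # SciPy expects inequalities as A_ub @ y <= b_ub, so A y >= b becomes -A y <= -b.
    res = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None), method="highs")
    return res.fun if res.success else None

# Tiny synthetic example: 3 variables, 2 constraints.
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 1.5, size=(2, 3))
c = rng.uniform(0.0, 1.0, size=3)
b = rng.uniform(0.0, 1.0, size=2)
print(lp_value(c, b, A))
```

Evaluating this value function exactly requires one LP solve per (c, b) pair, which is precisely the data-collection cost the paper's supervised baselines incur.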
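For the Open Datasets row, the Euclidean UFL instances pair nf facilities with nc = nf customers; the paper's exact construction is in its Appendix C. The snippet below is only a hypothetical illustration of one common way such data is generated: unit-square coordinates, assignment costs equal to Euclidean distances, and uniformly drawn opening costs are all assumptions, not the paper's generator.

```python
import numpy as np

def make_euclidean_ufl(n, opening_cost_range=(0.5, 1.5), seed=0):
    """Toy Euclidean UFL instance with n facilities and n customers.

    Assignment costs are pairwise Euclidean distances between random
    points in the unit square; opening costs are drawn uniformly.
    (Illustrative only -- not the paper's instance generator.)
    """
    rng = np.random.default_rng(seed)
    facilities = rng.uniform(size=(n, 2))
    customers = rng.uniform(size=(n, 2))
    assign_cost = np.linalg.norm(
        facilities[:, None, :] - customers[None, :, :], axis=-1)  # shape (n, n)
    open_cost = rng.uniform(*opening_cost_range, size=n)
    return open_cost, assign_cost

open_cost, assign_cost = make_euclidean_ufl(10)
print(open_cost.shape, assign_cost.shape)  # (10,) (10, 10)
```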
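For the Software Dependencies and Experiment Setup rows, the reported hyperparameter search space can be summarized compactly. The dictionaries below only restate the ranges quoted in the table; the key names are invented for readability and do not correspond to identifiers in the released code.

```python
import itertools

# Hyperparameter grids as reported in the experiment description.
# Key names are illustrative, not taken from the released code.
DSM_GRID = {
    "beta_stack_layers": [1, 2, 3],
    "beta_stack_width": [32, 64],
    "gamma_stack_layers": [2, 3],
    "gamma_stack_width": [32, 64],          # last gamma layer sized 128 or 256
    "last_layer_dropout": [0.02, 0.03, 0.04],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}
DENSENET_GRID = {
    "layers": [1, 2, 3],
    "width": [32, 64],
    "dropout_every_layer": [0.1, 0.2, 0.3],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

# Example: enumerate all DSM configurations in the grid.
dsm_configs = [dict(zip(DSM_GRID, values))
               for values in itertools.product(*DSM_GRID.values())]
print(len(dsm_configs))
```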
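For the Dataset Splits and Experiment Setup rows: training runs T = 40 outer iterations of Algorithm 1, each performing 100 Adam steps on objective (8), and the selected DSM is the iterate with the lowest training objective among those satisfying at least 98% of constraints (7b). The skeleton below mirrors only that outer loop and selection rule; the model, the surrogate objective, and the constraint check are toy placeholders standing in for objective (8) and constraints (7b), not the paper's formulation.

```python
import copy
import torch

torch.manual_seed(0)

# --- Toy stand-ins (hypothetical; not the paper's objective (8) or constraints (7b)) ---
X = torch.rand(512, 6)                     # placeholder (objective, rhs) features
z_ub = torch.full((512, 1), 2.0)           # GVF upper bounds, set to 2 as in the paper
model = torch.nn.Sequential(
    torch.nn.Linear(6, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr from the reported grid

def train_objective():
    """Placeholder surrogate for objective (8)."""
    pred = model(X)
    return (pred - z_ub).clamp(min=0).mean() - pred.mean()

def frac_constraints_satisfied():
    """Placeholder for the fraction of constraints (7b) satisfied (pred <= bound)."""
    with torch.no_grad():
        return (model(X) <= z_ub).float().mean().item()

# --- Outer loop and model-selection rule described in the table ---
T, ADAM_STEPS, MIN_SAT = 40, 100, 0.98
best_state, best_obj = None, float("inf")
for t in range(T):
    for _ in range(ADAM_STEPS):
        opt.zero_grad()
        loss = train_objective()
        loss.backward()
        opt.step()
    with torch.no_grad():
        obj = train_objective().item()
    sat = frac_constraints_satisfied()
    # Keep the iterate with the lowest objective among sufficiently feasible ones.
    if sat >= MIN_SAT and obj < best_obj:
        best_state, best_obj = copy.deepcopy(model.state_dict()), obj
```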