DOGE-Train: Discrete Optimization on GPU with End-to-End Training

Authors: Ahmed Abbas, Paul Swoboda

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For training we use an unsupervised loss. We train on smaller problems and test on larger ones, showing strong generalization performance with a GNN comprising only around 10k parameters.
Researcher Affiliation | Academia | Ahmed Abbas¹, Paul Swoboda¹,²,³ — ¹Max Planck Institute for Informatics, Saarland Informatics Campus; ²University of Mannheim; ³Heinrich-Heine University Düsseldorf
Pseudocode | Yes | Algorithm 1: Generalized Min-Marginal Averaging; Algorithm 2: Block Update backpropagation; Algorithm 3: Parameter prediction by GNN
Open Source Code | Yes | Code available at https://github.com/LPMP/BDD
Open Datasets | Yes | Cell tracking (CT): instances of developing flywing tissue from the cell tracking challenge (Ulman et al. 2017), processed by Haller et al. (2020) and obtained from Swoboda et al. (2022). Graph matching (GM): instances of graph matching for matching nuclei in 3D microscopic images (Long et al. 2009), processed by Kainmueller et al. (2014) and made publicly available as ILPs through Swoboda et al. (2022). Independent set (IS): random instances of the independent set problem generated using Ecole (Prouvost et al. 2020). QAPLib: the benchmark dataset of quadratic assignment problems used in the combinatorial optimization community (Burkard, Karisch, and Rendl 1997).
Dataset Splits | No | The paper specifies train and test splits for each dataset (e.g., 'train on the 2 smaller instances and test on the largest one' for Cell Tracking), but it does not explicitly mention a separate validation split or how one was handled.
Hardware Specification | Yes | CPU solvers use an AMD EPYC 7702 CPU with 16 threads; GPU solvers use an RTX 8000 (48 GB) or an A100 (80 GB) GPU depending on problem size.
Software Dependencies | Yes | For training we use PyTorch and implement Alg. 1, 2 in CUDA. Gurobi: the dual simplex algorithm from the commercial solver (Gurobi Optimization, LLC 2021); CPLEX (IBM ILOG 2019); MOSEK (MOSEK ApS 2022).
Experiment Setup | Yes | Due to varying instance sizes we use a separate set of hyperparameters for each dataset, given in Table 5 of the Appendix. The size of the learned embeddings h computed by the GNN in Alg. 3 is set to 16 for nodes and 8 for edges. For computing attention weights in (3) we use only one attention head for efficiency. The predictor Φ in Alg. 3 contains 4 linear layers with ReLU activation. We train the networks using the Adam optimizer (Kingma and Ba 2014). To prevent gradient overflow we use gradient clipping on model parameters by an l2 norm of 50.
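
As a rough illustration of the Experiment Setup row, the following is a minimal PyTorch sketch, not the authors' implementation: the hidden width, output dimension, and the way node and edge embeddings are combined are assumptions for illustration; only the 16/8 embedding sizes, the 4-layer ReLU predictor, the Adam optimizer, and the l2 gradient-norm clip of 50 come from the reported setup.

    # Minimal sketch of the described setup (hypothetical names; not the authors' code).
    import torch
    import torch.nn as nn

    NODE_DIM, EDGE_DIM = 16, 8          # embedding sizes reported in the paper

    class Predictor(nn.Module):
        """Predictor Phi: 4 linear layers with ReLU activations (hidden width is an assumption)."""
        def __init__(self, in_dim=NODE_DIM + EDGE_DIM, hidden=32, out_dim=1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, node_emb, edge_emb):
            # Concatenating node and edge embeddings is an assumption, for illustration only.
            return self.net(torch.cat([node_emb, edge_emb], dim=-1))

    model = Predictor()
    optimizer = torch.optim.Adam(model.parameters())

    def training_step(node_emb, edge_emb, unsupervised_loss_fn):
        optimizer.zero_grad()
        loss = unsupervised_loss_fn(model(node_emb, edge_emb))
        loss.backward()
        # Clip parameter gradients by an l2 norm of 50, as stated in the setup.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=50.0)
        optimizer.step()
        return loss.item()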
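
The Software Dependencies row names Gurobi's dual simplex algorithm as the CPU reference solver. Below is a minimal gurobipy sketch of how such a baseline run could be set up; the instance path is a placeholder and this is not the paper's evaluation script.

    # Hedged sketch: solving the LP relaxation of an ILP instance with Gurobi's dual simplex.
    import gurobipy as gp

    model = gp.read("instance.lp")      # placeholder path to an ILP instance
    relaxed = model.relax()             # drop integrality constraints to obtain the LP relaxation
    relaxed.setParam("Method", 1)       # Method=1 selects the dual simplex algorithm
    relaxed.setParam("Threads", 16)     # 16 CPU threads, matching the reported hardware setup
    relaxed.optimize()
    print("LP relaxation objective:", relaxed.ObjVal)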