LinSATNet: The Positive Linear Satisfiability Neural Networks
Authors: Runzhong Wang, Yunhao Zhang, Ziao Guo, Tianyi Chen, Xiaokang Yang, Junchi Yan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate its wide applicability, we deploy our LinSAT to three scenarios regarding constrained routing, outlier-aware matching, and predictive portfolio allocation. In these cases, an explicit objective function is difficult to define and a satisfiable solution is the purpose. For TSP-PRI, the number of priority steps is set to m = 5. The following baselines are considered with results shown in Table 2: 1) Mixed integer programming (MIP) solvers by directly applying the commercial solver Gurobi (Gurobi Optimization, 2020) to formulations in Sec. 4.2, with the time limit per instance set to 2s/10s; 2) Heuristics, e.g., nearest neighbor and insertion heuristics (Johnson, 1990), which are usually fast approximate algorithms; 3) State-of-the-art solvers for standard TSP such as Concorde and LKH3 (Helsgaun, 2017), and Gurobi (MTZ), i.e., applying Gurobi to the TSP formulation named after Miller-Tucker-Zemlin (MTZ) (Miller et al., 1960); 4) The RL-based neural routing solver Attention Model (Kool et al., 2019). |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University 2Shanghai AI Laboratory. |
| Pseudocode | Yes | Algorithm 1: Sinkhorn for Single-Set Marginals (Classic); Algorithm 2: Sinkhorn for Multi-Set Marginals (Proposed). A minimal sketch of the classic single-set Sinkhorn normalization is given after this table. |
| Open Source Code | Yes | Source code is available at https://github.com/Thinklab-SJTU/LinSATNet. |
| Open Datasets | Yes | We do experiments on Pascal VOC Keypoint dataset (Everingham et al., 2010) with Berkeley annotations (Bourdev & Malik, 2009) under the unfiltered setting following Rolínek et al. (2020) and report the matching F1 scores between graph pairs. Following Kool et al. (2019), for both TSP variants, we generate 10,000 2-D Euclidean TSP instances as the testing set. Each instance consists of n = 20 nodes uniformly sampled in the unit square [0, 1]². The starting, ending, and priority cities are randomly selected. As our model is unsupervised, the training set is generated on the fly using the same process (see the instance-generation sketch after this table). The training set is built on the real prices of 494 assets from the S&P 500 index from 2018-01-01 to 2020-12-30. |
| Dataset Splits | No | Following Kool et al. (2019), for both TSP variants, we generate 10,000 2-D Euclidean TSP instances as the testing set. Each instance consists of n = 20 nodes uniformly sampled in the unit square [0, 1]². The starting, ending, and priority cities are randomly selected. As our model is unsupervised, the training set is generated on the fly using the same process. |
| Hardware Specification | Yes | Our model runs on a single NVIDIA GeForce RTX 2080Ti GPU with 11GB memory. All our experiments are run on our workstation with Intel(R) Core(TM) i7-7820X CPU, NVIDIA GeForce RTX 2080Ti GPU, and 11GB memory. |
| Software Dependencies | No | Mixed integer programming (MIP) solvers by directly applying the commercial solver Gurobi (Gurobi Optimization, 2020). The automatic differentiation feature of modern deep learning frameworks (Paszke et al., 2017) is mentioned as making backpropagation easy to implement. |
| Experiment Setup | Yes | In both TSP-SE and TSP-PRI, we use a 3-layer Transformer to encode the 2-D coordinates into hidden vectors. Then the hidden vectors are projected into the pre-projected matrix using a 3-layer MLP with ReLU activation. Dimensions of hidden states for both the Transformer and the MLP are set to 256, and the head number of multi-head attention in the Transformer is set to 8. We train the model for 100 epochs for both TSP-SE and TSP-PRI. In each epoch, we randomly generate 256,000 instances as the training set of this epoch. The batch size is set to 1,024. Adam optimizer is used for training and the learning rate is set to 1e-4. During inference, we use beam search to post-process the continuous matrix X̃ output by the network to get the binary matrix X. The width of the beam for beam search is set to 2,048. (An encoder-configuration sketch follows after this table.) |
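
The paper's Algorithm 1 refers to the classic Sinkhorn normalization for single-set marginals. Below is a minimal sketch of that classic procedure for intuition only; the temperature `tau`, iteration count, and function name are illustrative assumptions, not taken from the released code, and the paper's Algorithm 2 (multi-set marginals) is the actual contribution and is not reproduced here.

```python
# Minimal sketch of classic Sinkhorn normalization (single-set marginals).
# Assumptions: square score matrix, uniform target marginals, fixed iteration count.
import torch

def sinkhorn_single_set(scores: torch.Tensor, tau: float = 0.05,
                        max_iter: int = 100) -> torch.Tensor:
    """Project an (n x n) score matrix toward a doubly stochastic matrix
    by alternating row and column normalization."""
    s = torch.exp(scores / tau)              # entrywise positive matrix
    for _ in range(max_iter):
        s = s / s.sum(dim=1, keepdim=True)   # normalize rows to sum to 1
        s = s / s.sum(dim=0, keepdim=True)   # normalize columns to sum to 1
    return s
```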
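
The dataset rows describe on-the-fly generation of TSP instances following Kool et al. (2019): nodes sampled uniformly in the unit square with randomly chosen starting, ending, and priority cities. A hedged sketch of that sampling process is below; the function name, batching layout, and the choice of a single priority city per instance are assumptions for illustration.

```python
# Sketch of on-the-fly TSP instance generation (nodes uniform in [0, 1]^2,
# random start/end/priority cities). Not the authors' data pipeline.
import torch

def sample_tsp_instances(batch_size: int = 1024, n: int = 20):
    coords = torch.rand(batch_size, n, 2)    # n nodes uniform in the unit square
    # draw three distinct city indices per instance: start, end, priority
    idx = torch.stack([torch.randperm(n)[:3] for _ in range(batch_size)])
    start, end, priority = idx[:, 0], idx[:, 1], idx[:, 2]
    return coords, start, end, priority
```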
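
The experiment-setup row specifies a 3-layer Transformer over 2-D coordinates followed by a 3-layer ReLU MLP that produces the pre-projected matrix, with hidden size 256 and 8 attention heads. The sketch below wires those stated hyperparameters together; the class name, output shape, and use of PyTorch's stock `nn.TransformerEncoder` are assumptions rather than the authors' implementation.

```python
# Sketch of the encoder configuration described in the setup row:
# 3-layer Transformer (d=256, 8 heads) + 3-layer ReLU MLP.
import torch
import torch.nn as nn

class TSPEncoder(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8,
                 n_layers: int = 3, out_dim: int = 20):   # out_dim assumed = n cities
        super().__init__()
        self.embed = nn.Linear(2, d_model)                # lift 2-D coordinates
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlp = nn.Sequential(                         # 3-layer MLP with ReLU
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, out_dim))

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(coords))              # (B, n, d_model)
        return self.mlp(h)                                # pre-projected scores (B, n, out_dim)
```

Training as described would then run for 100 epochs with Adam at learning rate 1e-4 and batch size 1,024, with beam search (width 2,048) used only at inference time to recover a binary tour matrix.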