Matrix encoding networks for neural combinatorial optimization
Authors: Yeong-Dae Kwon, Jinho Choo, Iljoo Yoon, Minah Park, Duwon Park, Youngjune Gwon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using an end-to-end model based on MatNet, we solve asymmetric traveling salesman (ATSP) and flexible flow shop (FFSP) problems as the earliest neural approach. In particular, for a class of FFSP we have tested MatNet on, we demonstrate a far superior empirical performance to any methods (neural or not) known to date. In Table 4.1, we compare the performance of our trained models with those of other representative baseline algorithms on 10,000 test instances of the tmat-class ATSP. |
| Researcher Affiliation | Industry | Yeong-Dae Kwon, Jinho Choo, Iljoo Yoon, Minah Park, Duwon Park, Youngjune Gwon Samsung SDS {y.d.kwon, jinho12.choo, iljoo.yoon, minah86.park, dw21.park, gyj.gwon}@samsung.com |
| Pseudocode | No | The paper provides architectural diagrams but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The training and testing code for the experiments described in the paper is publicly available. https://github.com/yd-kwon/MatNet |
| Open Datasets | No | For ATSP, the paper mentions using '10,000 randomly generated problem instances' of 'tmat-class ATSP instances'. For FFSP, it uses instances where 'D^(k)_ij is filled with a random integer between 2 and 9'. While the problem types are standard, the specific instances are generated by the authors, and no direct public access link or citation to these specific datasets is provided. (A hedged instance-generation sketch follows the table.) |
| Dataset Splits | No | The paper mentions '10,000 test instances' for ATSP and '1,000 test instances' for FFSP, but it does not provide explicit training/validation/test dataset splits or specify how a validation set was used for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | For N = 20 and 50, they take roughly 6 and 55 hours, respectively, on a single GPU (Nvidia V100). For N = 100, we accumulate gradients from 4 GPUs (each handling a batch of size 50) and the training takes about 110 hours. |
| Software Dependencies | Yes | We have implemented our MatNet model in PyTorch. To solve test instances using MIP, we use one of the well-known polynomial MIP formulations of the ATSP [30] and let CPLEX [31] solve this model with the distance matrices that we provide. CPLEX [31] (V20.1) is one of the popular commercial optimization software packages used by the OR community. |
| Experiment Setup | Yes | The MatNet model is constructed by stacking L = 5 encoding layers. The embedding dimension, d_model, for input/output of each layer is set to 256. Both sub-modules (F_A and F_B) use h = 16 attention heads, where each head processes query, key, and value as 16-dimensional vectors (i.e., d_q = d_k = d_v = 16). The score-mixing MLP in each head has one hidden layer with 16 nodes. ... Feed-forward blocks in Figure 2(a) are implemented with one hidden layer of dimension d_ff = 516. ... We use the Adam optimizer [29] with a learning rate of 4 × 10^-4 without decay and a batch size of 200. (A hedged configuration sketch follows the table.) |
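
To make the ATSP data description above concrete, here is a minimal sketch of how a tmat-class instance (an asymmetric integer distance matrix with zero diagonal satisfying the triangle inequality) could be generated. The value range and the Floyd-Warshall-style tightening are illustrative assumptions, not details taken from the paper or the authors' repository.

```python
import numpy as np

def generate_tmat_atsp_instance(n, low=1, high=1000, rng=None):
    """Sketch of a tmat-class ATSP instance: an asymmetric integer distance
    matrix with zero diagonal, post-processed so the triangle inequality holds.
    The value range (low, high) is an assumption, not taken from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    d = rng.integers(low, high, size=(n, n))
    np.fill_diagonal(d, 0)
    # Floyd-Warshall-style tightening: replace d[i, j] by the shortest detour
    # through any intermediate node k, which enforces d_ij <= d_ik + d_kj.
    for k in range(n):
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

# Example: one 20-node instance, matching the smallest problem size reported.
dist = generate_tmat_atsp_instance(20)
assert (dist.diagonal() == 0).all()
```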
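The hyperparameters quoted in the Experiment Setup row can be collected into a small PyTorch configuration sketch. The dataclass and its field names are my own; only the numeric values (L = 5, d_model = 256, h = 16 heads, d_q = d_k = d_v = 16, score-mixing MLP hidden size 16, d_ff = 516, Adam with learning rate 4 × 10^-4, batch size 200) come from the paper, and the `model` variable is a stand-in placeholder rather than the authors' MatNet implementation.

```python
from dataclasses import dataclass
import torch

@dataclass
class MatNetEncoderConfig:
    # Values taken from the quoted experiment setup; field names are assumptions.
    num_layers: int = 5         # L = 5 stacked encoding layers
    embedding_dim: int = 256    # d_model for each layer's input/output
    num_heads: int = 16         # attention heads in both sub-modules (F_A, F_B)
    qkv_dim: int = 16           # d_q = d_k = d_v per head
    mixer_hidden_dim: int = 16  # hidden width of the score-mixing MLP per head
    ff_hidden_dim: int = 516    # hidden width of the feed-forward block

cfg = MatNetEncoderConfig()

# Optimizer settings as stated in the paper: Adam, constant lr 4e-4, batch 200.
# `model` is only a stand-in module; a real run would use a MatNet model
# (e.g. from the authors' repository linked above).
model = torch.nn.Linear(cfg.embedding_dim, cfg.embedding_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
batch_size = 200
```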