Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization
Authors: Fu Luo, Xi Lin, Fei Liu, Qingfu Zhang, Zhenkun Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically compare our proposed LEHD model with other learning-based and classical solvers on TSP and CVRP instances with various sizes and distributions. |
| Researcher Affiliation | Academia | Fu Luo¹, Xi Lin², Fei Liu², Qingfu Zhang², Zhenkun Wang¹ (¹Southern University of Science and Technology; ²City University of Hong Kong) |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD. |
| Open Datasets | Yes | The training and test datasets can be downloaded from Google Drive or Baidu Cloud. ... Table 13: List of licenses for the codes and datasets we used in this work ... TSPLib Dataset http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ ... CVRPLib Dataset http://vrp.galgos.inf.puc-rio.br/index.php/en/ |
| Dataset Splits | No | The paper specifies training and test sets but does not explicitly provide details about a validation set or its split from the main dataset. |
| Hardware Specification | Yes | In all experiments, we train and test our LEHD models using a single NVIDIA GeForce RTX 3090 GPU with 24GB memory. |
| Software Dependencies | No | The paper mentions 'Adam [26]' as the optimizer but does not specify version numbers for programming languages, machine learning frameworks, or other key software libraries used for the implementation. |
| Experiment Setup | Yes | For our LEHD model, the embedding dimension is set to 128, and the number of attention layers in the decoder is set to 6. In each attention layer, the head number of MHA is set to 8, and the dimension of the feed-forward layer is set to 512. ... The optimizer is Adam [26] with an initial learning rate of 1e-4. The value of the learning rate decay is set to 0.97 per epoch for the TSP model and 0.9 per epoch for the CVRP model. With a batch size of 1024, we train the TSP model for 150 epochs and the CVRP model for 40 epochs since it converges much faster. |
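
To make the reported experiment setup concrete, the following is a minimal PyTorch sketch that wires the stated hyperparameters (embedding dimension 128, 6 decoder attention layers, 8 MHA heads, feed-forward dimension 512, Adam with an initial learning rate of 1e-4 and per-epoch exponential decay, batch size 1024) into a training skeleton. The module and loop names (`decoder`, `train_loader`, `compute_loss`) are hypothetical placeholders, not the authors' implementation; the actual LEHD code is available in the linked repository.

```python
# Hedged sketch of the reported LEHD training configuration (not the official code).
import torch
import torch.nn as nn

EMBED_DIM = 128         # embedding dimension
NUM_DECODER_LAYERS = 6  # attention layers in the heavy decoder
NUM_HEADS = 8           # MHA heads per attention layer
FF_DIM = 512            # feed-forward dimension
BATCH_SIZE = 1024       # training batch size

# Hypothetical stand-in for the heavy decoder, built from standard Transformer
# layers; the real LEHD decoder differs in detail (see the official repository).
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=EMBED_DIM,
        nhead=NUM_HEADS,
        dim_feedforward=FF_DIM,
        batch_first=True,
    ),
    num_layers=NUM_DECODER_LAYERS,
)

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)
# Per-epoch learning-rate decay: 0.97 for the TSP model, 0.9 for the CVRP model.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)

num_epochs = 150  # 150 epochs for TSP; the CVRP model is trained for 40 epochs
for epoch in range(num_epochs):
    # for batch in train_loader:               # hypothetical data loader
    #     loss = compute_loss(decoder, batch)  # hypothetical loss function
    #     optimizer.zero_grad()
    #     loss.backward()
    #     optimizer.step()
    scheduler.step()
```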