Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization

Authors: Fu Luo, Xi Lin, Fei Liu, Qingfu Zhang, Zhenkun Wang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we empirically compare our proposed LEHD model with other learning-based and classical solvers on TSP and CVRP instances with various sizes and distributions."
Researcher Affiliation | Academia | Fu Luo (1), Xi Lin (2), Fei Liu (2), Qingfu Zhang (2), Zhenkun Wang (1); (1) Southern University of Science and Technology, (2) City University of Hong Kong
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.
Open Datasets | Yes | "The training and test datasets can be downloaded from Google Drive or Baidu Cloud." ... "Table 13: List of licenses for the codes and datasets we used in this work" ... TSPLib Dataset: http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ ... CVRPLib Dataset: http://vrp.galgos.inf.puc-rio.br/index.php/en/
Dataset Splits | No | The paper specifies training and test sets but does not explicitly provide details about a validation set or how it is split from the main dataset.
Hardware Specification | Yes | "In all experiments, we train and test our LEHD models using a single NVIDIA GeForce RTX 3090 GPU with 24GB memory."
Software Dependencies | No | The paper mentions 'Adam [26]' as the optimizer but does not specify version numbers for programming languages, machine learning frameworks, or other key software libraries used for the implementation.
Experiment Setup | Yes | "For our LEHD model, the embedding dimension is set to 128, and the number of attention layers in the decoder is set to 6. In each attention layer, the head number of MHA is set to 8, and the dimension of the feed-forward layer is set to 512." ... "The optimizer is Adam [26] with an initial learning rate of 1e-4. The value of the learning rate decay is set to 0.97 per epoch for the TSP model and 0.9 per epoch for the CVRP model. With a batch size of 1024, we train the TSP model for 150 epochs and the CVRP model for 40 epochs since it converges much faster."
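As a rough illustration of the reported experiment setup, the following is a minimal PyTorch sketch, not the authors' implementation: the model stack and the training loop are hypothetical stand-ins, and only the numeric hyperparameters (embedding dimension 128, 6 decoder attention layers, 8 heads, feed-forward dimension 512, Adam with lr 1e-4, per-epoch decay 0.97, batch size 1024, 150 TSP epochs) are taken from the paper.

```python
# Hedged sketch of the reported training configuration. The architecture below
# is a generic Transformer stand-in, NOT the LEHD model; only the hyperparameter
# values come from the paper's experiment setup.
import torch
import torch.nn as nn

EMBED_DIM = 128          # embedding dimension
NUM_DECODER_LAYERS = 6   # attention layers in the decoder
NUM_HEADS = 8            # MHA heads per attention layer
FF_DIM = 512             # feed-forward layer dimension
BATCH_SIZE = 1024
TSP_EPOCHS = 150         # CVRP model: 40 epochs
LR = 1e-4
LR_DECAY = 0.97          # per-epoch decay for TSP (0.9 for CVRP)

# Placeholder model standing in for the LEHD architecture (assumption: a stack
# of standard Transformer layers matching the reported sizes).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=EMBED_DIM, nhead=NUM_HEADS,
        dim_feedforward=FF_DIM, batch_first=True),
    num_layers=NUM_DECODER_LAYERS,
)

optimizer = torch.optim.Adam(model.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=LR_DECAY)

for epoch in range(TSP_EPOCHS):
    # ... one training pass over batches of size BATCH_SIZE goes here ...
    scheduler.step()  # apply the per-epoch learning-rate decay
```

Here `ExponentialLR` with `gamma=0.97` reproduces the stated per-epoch decay for the TSP model; for the CVRP model the paper instead uses a decay of 0.9 and trains for 40 epochs.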