Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation
Authors: Jieyi Bi, Yining Ma, Jiahai Wang, Zhiguang Cao, Jinbiao Chen, Yuan Sun, Yeow Meng Chee
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that, compared with the baseline neural methods, our AMDKD is able to achieve competitive results on both unseen in-distribution and out-of-distribution instances, which are either randomly synthesized or adopted from benchmark datasets (i.e., TSPLIB and CVRPLIB). |
| Researcher Affiliation | Collaboration | 1School of Computer Science and Engineering, Sun Yat-sen University 2National University of Singapore 3Singapore Institute of Manufacturing Technology, A*STAR 4University of Melbourne |
| Pseudocode | Yes | Algorithm 1 Adaptive Multi-Distribution Knowledge Distillation (AMDKD) |
| Open Source Code | Yes | Our implementation in PyTorch is publicly available at https://github.com/jieyibi/AMDKD |
| Open Datasets | Yes | We conduct experiments on TSP and CVRP with n = 20, 50, and 100 nodes similar to [12, 14]. As aforementioned, we adopt Uniform, Cluster and Mixed (mixture of uniform and cluster) as exemplar distributions for training; Expansion, Implosion, Explosion, and Grid as the unseen distributions for testing. For the above 7 distributions, we follow [10, 21, 43] to generate the respective instances (details are presented in Appendix A). ... adopted from benchmark datasets (i.e., TSPLIB and CVRPLIB). |
| Dataset Splits | Yes | The likelihood p_adaptive of selecting distribution d ∈ {U, C, M} is proportional to the exponent value of the gaps to the LKH solver [4]... according to the current performance of the student on a validation dataset. ... on the given validation datasets Z_U, Z_C, Z_M (each with 1,000 instances) for each exemplar distribution. (A minimal sampling sketch follows this table.) |
| Hardware Specification | Yes | All experiments are conducted on a machine with NVIDIA RTX 3090 GPU cards and Intel Xeon Silver 4216 CPU at 2.10GHz. |
| Software Dependencies | No | The paper mentions 'Our implementation in PyTorch are publicly available' and 'The Adam optimizer is used', but does not specify version numbers for PyTorch or any other critical software dependency. |
| Experiment Setup | Yes | For the student distillation phase, we use batch size B = 512, and task-specific hyper-parameters T = 250, E = 500 for AMDKD-AM and T = 20, E = 1 for AMDKD-POMO, respectively. By default, the dimension of the node embeddings in our student networks AMDKD-AM and AMDKD-POMO is reduced from 128 (teacher) to 64 (student)... The Adam optimizer is used with learning rate 1e-4. (A configuration sketch follows this table.) |
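
The adaptive selection rule quoted in the Dataset Splits row (an exemplar distribution is sampled with probability proportional to the exponent of the student's current gap to LKH on that distribution's validation set) can be illustrated with a short sketch. This is a minimal illustration, not the authors' code: the example gap values, the numerically stabilized softmax, and the function name are assumptions.

```python
import numpy as np

def sample_exemplar_distribution(val_gaps, rng=None):
    """Pick an exemplar distribution (Uniform, Cluster, Mixed) with probability
    proportional to exp(gap), where gap is the student's optimality gap to LKH
    on that distribution's validation set; larger gaps are sampled more often.

    `val_gaps` is a dict like {"U": 0.031, "C": 0.052, "M": 0.040} -- these
    numbers are placeholders, not values from the paper."""
    rng = rng or np.random.default_rng()
    names = list(val_gaps.keys())
    gaps = np.array([val_gaps[d] for d in names], dtype=float)
    # p_adaptive(d) is proportional to exp(gap_d); subtracting the max keeps
    # the exponentials numerically stable without changing the proportions.
    weights = np.exp(gaps - gaps.max())
    probs = weights / weights.sum()
    return rng.choice(names, p=probs)

# Example: the student currently performs worst on Cluster instances,
# so "C" is drawn most often for the next distillation round.
print(sample_exemplar_distribution({"U": 0.031, "C": 0.052, "M": 0.040}))
```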
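The hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration block. The sketch below only restates the reported values (batch size 512, task-specific T and E per variant, 64-dimensional student vs. 128-dimensional teacher embeddings, Adam with learning rate 1e-4); the dictionary layout and key names are assumptions for illustration, not the authors' configuration format.

```python
import torch

# Hypothetical configuration layout; the values come from the quoted setup,
# the structure and key names are assumptions for illustration only.
AMDKD_CONFIG = {
    "AMDKD-AM":   {"batch_size": 512, "T": 250, "E": 500},
    "AMDKD-POMO": {"batch_size": 512, "T": 20,  "E": 1},
    "embedding_dim": {"teacher": 128, "student": 64},
    "optimizer": {"name": "Adam", "lr": 1e-4},
}

def make_optimizer(model: torch.nn.Module):
    # Adam with learning rate 1e-4, as stated in the paper.
    return torch.optim.Adam(model.parameters(), lr=AMDKD_CONFIG["optimizer"]["lr"])
```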