Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation
Authors: Jieyi Bi, Yining Ma, Jiahai Wang, Zhiguang Cao, Jinbiao Chen, Yuan Sun, Yeow Meng Chee
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that, compared with the baseline neural methods, our AMDKD is able to achieve competitive results on both unseen in-distribution and out-of-distribution instances, which are either randomly synthesized or adopted from benchmark datasets (i.e., TSPLIB and CVRPLIB). |
| Researcher Affiliation | Collaboration | 1School of Computer Science and Engineering, Sun Yat-sen University 2National University of Singapore 3Singapore Institute of Manufacturing Technology, A*STAR 4University of Melbourne |
| Pseudocode | Yes | Algorithm 1 Adaptive Multi-Distribution Knowledge Distillation (AMDKD) |
| Open Source Code | Yes | Our implementation in PyTorch is publicly available at https://github.com/jieyibi/AMDKD |
| Open Datasets | Yes | We conduct experiments on TSP and CVRP with n = 20, 50, and 100 nodes similar to [12, 14]. As aforementioned, we adopt Uniform, Cluster and Mixed (mixture of uniform and cluster) as exemplar distributions for training; Expansion, Implosion, Explosion, and Grid as the unseen distributions for testing. For the above 7 distributions, we follow [10, 21, 43] to generate the respective instances (details are presented in Appendix A). ... adopted from benchmark datasets (i.e., TSPLIB and CVRPLIB). |
| Dataset Splits | Yes | The likelihood p_adaptive of selecting distribution d ∈ {U, C, M} is proportional to the exponent value of the gaps to the LKH solver [4]... according to the current performance of the student on a validation dataset. ... on the given validation datasets Z_U, Z_C, Z_M (each with 1,000 instances) for each exemplar distribution. (A minimal sampling sketch follows this table.) |
| Hardware Specification | Yes | All experiments are conducted on a machine with NVIDIA RTX 3090 GPU cards and Intel Xeon Silver 4216 CPU at 2.10GHz. |
| Software Dependencies | No | The paper mentions 'Our implementation in PyTorch are publicly available' and 'The Adam optimizer is used', but does not specify version numbers for PyTorch or any other critical software dependency. |
| Experiment Setup | Yes | For the student distillation phase, we use batch size B = 512, and task-specific hyper-parameters T = 250, E = 500 for AMDKD-AM and T = 20, E = 1 for AMDKD-POMO, respectively. By default, the dimension of the node embeddings in our student networks AMDKD-AM and AMDKD-POMO is reduced from 128 (teacher) to 64 (student)... The Adam optimizer is used with learning rate 1e-4. (A configuration sketch follows this table.) |
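
The adaptive selection rule quoted in the Dataset Splits row (an exemplar distribution is sampled with probability proportional to the exponent of the student's current gap to LKH on that distribution's validation set) can be illustrated with a short sketch. This is a minimal illustration, not the authors' code: the example gap values, the numerically stabilized softmax, and the function name are assumptions.

```python
import numpy as np

def sample_exemplar_distribution(val_gaps, rng=None):
    """Pick an exemplar distribution (Uniform, Cluster, Mixed) with probability
    proportional to exp(gap), where gap is the student's optimality gap to LKH
    on that distribution's validation set; larger gaps are sampled more often.

    `val_gaps` is a dict like {"U": 0.031, "C": 0.052, "M": 0.040} -- these
    numbers are placeholders, not values from the paper."""
    rng = rng or np.random.default_rng()
    names = list(val_gaps.keys())
    gaps = np.array([val_gaps[d] for d in names], dtype=float)
    # p_adaptive(d) is proportional to exp(gap_d); subtracting the max keeps
    # the exponentials numerically stable without changing the proportions.
    weights = np.exp(gaps - gaps.max())
    probs = weights / weights.sum()
    return rng.choice(names, p=probs)

# Example: the student currently performs worst on Cluster instances,
# so "C" is drawn most often for the next distillation round.
print(sample_exemplar_distribution({"U": 0.031, "C": 0.052, "M": 0.040}))
```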
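The hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration block. The sketch below only restates the reported values (batch size 512, task-specific T and E per variant, 64-dimensional student vs. 128-dimensional teacher embeddings, Adam with learning rate 1e-4); the dictionary layout and key names are assumptions for illustration, not the authors' configuration format.

```python
import torch

# Hypothetical configuration layout; the values come from the quoted setup,
# the structure and key names are assumptions for illustration only.
AMDKD_CONFIG = {
    "AMDKD-AM":   {"batch_size": 512, "T": 250, "E": 500},
    "AMDKD-POMO": {"batch_size": 512, "T": 20,  "E": 1},
    "embedding_dim": {"teacher": 128, "student": 64},
    "optimizer": {"name": "Adam", "lr": 1e-4},
}

def make_optimizer(model: torch.nn.Module):
    # Adam with learning rate 1e-4, as stated in the paper.
    return torch.optim.Adam(model.parameters(), lr=AMDKD_CONFIG["optimizer"]["lr"])
```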