MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

Authors: Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Xu Chi

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, our method significantly promotes zero-shot generalization performance on 10 unseen VRP variants, and showcases decent results on the few-shot setting and real-world benchmark instances. We further conduct extensive studies on the effect of MoE configurations in solving VRPs, and observe the superiority of hierarchical gating when facing out-of-distribution data.
Researcher Affiliation | Collaboration | 1) College of Computing and Data Science, Nanyang Technological University, Singapore; 2) School of Computing and Information Systems, Singapore Management University, Singapore; 3) Department of Information Systems, Eindhoven University of Technology, The Netherlands; 4) Institute of Marine Science and Technology, Shandong University, China; 5) Singapore Institute of Manufacturing Technology (SIMTech), Agency for Science, Technology and Research (A*STAR), Singapore.
Pseudocode | No | The paper describes its methods but does not contain any structured pseudocode or algorithm blocks (e.g., 'Algorithm 1').
Open Source Code | Yes | The source code is available at: https://github.com/RoyalSkye/Routing-MVMoE.
Open Datasets | Yes | We evaluate all neural solvers on the CVRPLIB benchmark dataset, including CVRP and VRPTW instances with various problem sizes and attribute distributions. We mainly consider the classic Set-X (Uchoa et al., 2017) and Set-Solomon (Solomon, 1987). ... We present more details of VRP variants and the associated data generation process in Appendix A.
Dataset Splits | No | The paper mentions training on '100M training instances', evaluating on a 'test dataset that contains 1K instances', and showing 'validation curves' in Fig. 7. However, it does not provide specific sizes or split percentages for a distinct validation set, which limits reproducibility of the data splits.
Hardware Specification | Yes | All experiments are conducted on a machine with NVIDIA Ampere A100-80GB GPU cards and an AMD EPYC 7513 CPU at 2.6GHz.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and implies a Python-based, PyTorch-style implementation, but it does not specify version numbers for these software components or other libraries. It also mentions 'HGS (Vidal, 2022)', 'LKH3 (Helsgaun, 2017)', and 'OR-Tools (Furnon & Perron, 2023)', but without specific version numbers for these solvers.
Experiment Setup | Yes | Adam optimizer is used with a learning rate of 1e-4, a weight decay of 1e-6, and a batch size of 128. The model is trained for 5000 epochs, each containing 20000 training instances (i.e., 100M training instances in total). The learning rate is decayed by a factor of 10 for the last 10% of training instances. We consider two problem scales n ∈ {50, 100} during training... We employ m = 4 experts with K = β = 2 in each MoE layer, and set the weight α of the auxiliary loss L_b to 0.01.
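
To make the reported setup concrete, below is a minimal PyTorch sketch of such a configuration, assuming a standard top-K gated MoE feed-forward layer with m = 4 experts and K = 2, an Adam optimizer with learning rate 1e-4 and weight decay 1e-6, and a load-balancing auxiliary loss weighted by α = 0.01. All class and variable names (MoEFeedForward, aux_loss, alpha) are illustrative and not taken from the authors' released code, and the gating shown is the plain node-level variant rather than the paper's hierarchical gating.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Generic feed-forward MoE layer with top-K gating and a load-balancing loss (illustrative)."""
    def __init__(self, d_model=128, d_ff=512, num_experts=4, top_k=2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                    # x: (num_tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)                 # (num_tokens, num_experts)
        topk_p, topk_idx = probs.topk(self.top_k, dim=-1)    # route each token to K experts
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)   # renormalize the K gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            hit = (topk_idx == e)                            # (num_tokens, top_k)
            rows = hit.any(dim=-1)
            if rows.any():
                w = (topk_p * hit).sum(dim=-1, keepdim=True)[rows]
                out[rows] = out[rows] + w * expert(x[rows])
        # Load-balancing auxiliary loss (importance x load), encouraging even expert usage.
        importance = probs.mean(dim=0)                                         # (num_experts,)
        load = F.one_hot(topk_idx, self.num_experts).float().mean(dim=(0, 1))  # (num_experts,)
        aux_loss = self.num_experts * (importance * load).sum()
        return out, aux_loss

# Reported hyperparameters: Adam with lr 1e-4, weight decay 1e-6, batch size 128,
# 5000 epochs x 20000 instances, m = 4 experts, K = 2, auxiliary-loss weight alpha = 0.01.
layer = MoEFeedForward(num_experts=4, top_k=2)
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-4, weight_decay=1e-6)
alpha = 0.01

tokens = torch.randn(128, 128)          # dummy batch of token embeddings (batch size 128)
y, aux = layer(tokens)
loss = y.pow(2).mean() + alpha * aux    # stand-in task loss plus weighted auxiliary loss
loss.backward()
optimizer.step()
```

In this sketch the auxiliary loss is added to the task loss with the reported weight of 0.01, which is the usual way such a load-balancing term is combined with the main objective; the paper's actual training loss and hierarchical gating may differ in detail.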