MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Authors: Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Xu Chi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, our method significantly promotes zero-shot generalization performance on 10 unseen VRP variants, and showcases decent results on the few-shot setting and real-world benchmark instances. We further conduct extensive studies on the effect of MoE configurations in solving VRPs, and observe the superiority of hierarchical gating when facing out-of-distribution data. |
| Researcher Affiliation | Collaboration | 1College of Computing and Data Science, Nanyang Technological University, Singapore 2School of Computing and Information Systems, Singapore Management University, Singapore 3Department of Information Systems, Eindhoven University of Technology, The Netherlands 4Institute of Marine Science and Technology, Shandong University, China 5Singapore Institute of Manufacturing Technology (SIMTech), Agency for Science, Technology and Research (A*STAR), Singapore. |
| Pseudocode | No | The paper describes its methods but does not contain any structured pseudocode or algorithm blocks (e.g., 'Algorithm 1'). |
| Open Source Code | Yes | The source code is available at: https://github.com/RoyalSkye/Routing-MVMoE. |
| Open Datasets | Yes | We evaluate all neural solvers on CVRPLIB benchmark dataset, including CVRP and VRPTW instances with various problem sizes and attribute distributions. We mainly consider the classic Set-X (Uchoa et al., 2017) and Set-Solomon (Solomon, 1987). ... We present more details of VRP variants and the associated data generation process in Appendix A. |
| Dataset Splits | No | The paper mentions training on '100M training instances' and evaluating on a 'test dataset that contains 1K instances', and shows 'validation curves' in Fig. 7. However, it does not provide specific split percentages or counts for a distinct validation set from the overall dataset for reproducibility. |
| Hardware Specification | Yes | All experiments are conducted on a machine with NVIDIA Ampere A100-80GB GPU cards and AMD EPYC 7513 CPU at 2.6GHz. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and implies a Python/PyTorch-based Transformer implementation, but it does not specify version numbers for these software components or for other libraries. It also mentions the baseline solvers 'HGS (Vidal, 2022)', 'LKH3 (Helsgaun, 2017)', and 'OR-Tools (Furnon & Perron, 2023)', but without specific version numbers. |
| Experiment Setup | Yes | Adam optimizer is used with the learning rate of 1e-4, the weight decay of 1e-6, and the batch size of 128. The model is trained for 5000 epochs, with each containing 20000 training instances (i.e., 100M training instances in total). The learning rate is decayed by a factor of 10 for the last 10% of training instances. We consider two problem scales n ∈ {50, 100} during training... We employ m = 4 experts with K = β = 2 in each MoE layer, and set the weight α of the auxiliary loss L_b as 0.01. (A hedged code sketch of this configuration follows the table.) |
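
The experiment-setup row reports the optimizer and MoE hyperparameters but no code. Below is a minimal PyTorch sketch, not the authors' implementation, assuming a standard top-K gated MoE feed-forward layer with a Shazeer-style load-balancing auxiliary loss; the class name, hidden sizes (`d_model=128`, `d_ff=512`), and the 6-layer stand-in model are illustrative assumptions, while `num_experts=4`, `top_k=2`, `alpha=0.01`, the learning rate of 1e-4, the weight decay of 1e-6, and the batch size of 128 follow the table.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Transformer FFN replaced by m experts behind a top-K gate (illustrative sizes)."""

    def __init__(self, d_model=128, d_ff=512, num_experts=4, top_k=2):
        super().__init__()
        self.num_experts, self.top_k = num_experts, top_k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (num_tokens, d_model); gate scores over experts
        probs = F.softmax(self.gate(x), dim=-1)              # (tokens, experts)
        topk_p, topk_idx = probs.topk(self.top_k, dim=-1)
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)   # renormalize kept gate weights

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_p[mask, slot].unsqueeze(-1) * expert(x[mask])

        # Auxiliary load-balancing loss L_b (Shazeer-style assumption): mean gate
        # probability per expert times the fraction of tokens whose top-1 choice
        # is that expert.
        importance = probs.mean(dim=0)
        load = F.one_hot(topk_idx[:, 0], self.num_experts).float().mean(dim=0)
        aux_loss = self.num_experts * (importance * load).sum()
        return out, aux_loss


# Optimizer and schedule as reported: Adam, lr 1e-4, weight decay 1e-6, batch 128,
# 5000 epochs x 20000 instances; the 6-layer stack here is only a stand-in.
model = nn.ModuleList(MoELayer() for _ in range(6))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-6)
alpha, batch_size, epochs, instances_per_epoch = 0.01, 128, 5000, 20_000
```

In an actual training step, the auxiliary terms from all MoE layers would be summed and added to the main objective as `loss = main_loss + alpha * sum(aux_losses)`; the node/problem-level and hierarchical gating variants studied in the paper are not reproduced in this sketch.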