Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems
Authors: Yi Zhang, Yushen Long, Yun Ni, Liping Huang, Xiaohong Wang, Jun Liu
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments based on scenarios derived from both the New York and Chicago taxi datasets demonstrate the effectiveness of our approach, achieving an average improvement of 16% compared to state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | (1) Agency for Science, Technology and Research, Singapore; (2) Morgan Stanley Asia Pte.; (3) Onto Innovation Inc.; (4) School of Computing and Communications, Lancaster University, UK |
| Pseudocode | Yes | Algorithm 1: Generate New Individual; Algorithm 2: LLM-Optimizer Interaction Protocol |
| Open Source Code | Yes | The source code can be found at: https://github.com/yizhangele/llm-guided-mod-optimization. |
| Open Datasets | Yes | The 9 testing scenarios in Table 2 are constructed using the New York taxi dataset [39]. We also test on the Chicago taxi dataset [40]... [39] New York City Taxi and Limousine Commission. TLC Trip Record Data. https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page. [40] Chicago Data Portal. Taxi Trips. https://data.cityofchicago.org/Transportation/Taxi-Trips-2013-2023-/wrvz-psew/about_data. |
| Dataset Splits | No | While our work does not involve training machine learning models in the traditional sense, it integrates a well-established pretrained LLM with a mathematical optimization framework. As such, there are no training/test data splits or model training procedures to report. |
| Hardware Specification | Yes | For all optimizer-based methods, whether they use manual objectives or our adaptive-objective method, the optimization solver Gurobi [42] is adopted to solve the problem, running on a PC with a 13th Gen Intel Core i9-13900KF (32 CPUs, up to 5.80 GHz) and 32 GB of RAM. |
| Software Dependencies | Yes | In our experimental setup, we utilize the DeepSeek-R1-Distill-Qwen-32B [41] model through the Hugging Face platform API as the default large language model for all LLM-based methods... The optimization solver Gurobi [42] is adopted to solve the problem, running on a PC with a 13th Gen Intel Core i9-13900KF (32 CPUs, up to 5.80 GHz) and 32 GB of RAM. |
| Experiment Setup | Yes | In our experimental setup, we utilize the DeepSeek-R1-Distill-Qwen-32B [41] model through the Hugging Face platform API as the default large language model for all LLM-based methods, which allows us to evaluate the adaptability of our method on smaller LLMs and thereby highlights its potential applications. The temperature parameter is set to 0.9. All LLM-based methods are executed 3 times for each scenario, and the mean of these three runs is reported in Tables 2 and 3. FunSearch is run for 20 iterations. EoH and our method both employ 10 iterations with a population size of 5. |
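The reported evaluation protocol for EoH and the proposed method (temperature 0.9, 10 iterations, population size 5, mean of 3 runs per scenario) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the elitist truncation scheme, the "one offspring per parent" rule, and the names `ExperimentConfig`, `evolve`, `run_scenario`, `toy_propose` and `CountingEvaluate` are all hypothetical. The toy proposal and scoring functions merely stand in for the LLM call and the Gurobi solve, and FunSearch's 20-iteration setting is not modeled.

```python
"""Hypothetical sketch of the reported experiment protocol (not the authors' code)."""
import random
import statistics
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """Values reported for EoH and the proposed method; field names are hypothetical."""
    temperature: float = 0.9       # LLM sampling temperature
    runs_per_scenario: int = 3     # independent runs; the mean is reported
    iterations: int = 10           # evolutionary iterations
    population_size: int = 5       # individuals kept per iteration


# Hypothetical interfaces: in the real system, `propose` would prompt the LLM
# (at the configured temperature) for a new objective, and `evaluate` would
# solve the dispatch problem with that objective and return a score.
Propose = Callable[[Optional[float], float, random.Random], float]
Evaluate = Callable[[float], float]


def evolve(cfg: ExperimentConfig, propose: Propose, evaluate: Evaluate,
           rng: random.Random) -> float:
    """Elitist population search; returns the best score found (higher is better)."""
    population: List[Tuple[float, float]] = []
    for _ in range(cfg.population_size):
        cand = propose(None, cfg.temperature, rng)
        population.append((evaluate(cand), cand))
    for _ in range(cfg.iterations):
        # Assumption: one offspring per surviving parent.
        children = [propose(parent, cfg.temperature, rng) for _, parent in population]
        population += [(evaluate(c), c) for c in children]
        # Keep only the best `population_size` individuals (elitist truncation).
        population.sort(key=lambda pair: pair[0], reverse=True)
        population = population[: cfg.population_size]
    return population[0][0]


def run_scenario(cfg: ExperimentConfig, propose: Propose, evaluate: Evaluate,
                 seed: int = 0) -> float:
    """Repeat the search `runs_per_scenario` times and report the mean best score."""
    scores = [evolve(cfg, propose, evaluate, random.Random(seed + r))
              for r in range(cfg.runs_per_scenario)]
    return statistics.mean(scores)


# Toy stand-ins so the sketch runs end to end (NOT the paper's components).
def toy_propose(parent: Optional[float], temperature: float,
                rng: random.Random) -> float:
    if parent is None:
        return rng.uniform(-1.0, 1.0)
    # Higher temperature -> larger perturbation, loosely mimicking LLM sampling.
    return parent + rng.gauss(0.0, 0.1 * temperature)


class CountingEvaluate:
    """Toy objective with a call counter: peak score 0 at x = 0.3."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: float) -> float:
        self.calls += 1
        return -(x - 0.3) ** 2


if __name__ == "__main__":
    cfg = ExperimentConfig()
    ev = CountingEvaluate()
    print(run_scenario(cfg, toy_propose, ev), ev.calls)
```

In this sketch each run performs 5 + 10 × 5 = 55 objective evaluations, so one scenario costs 165 solver calls; the paper's actual number of LLM samples per iteration may differ.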