Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems

Authors: Yi Zhang, Yushen Long, Yun Ni, Liping Huang, Xiaohong Wang, Jun Liu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments based on scenarios derived from both the New York and Chicago taxi datasets demonstrate the effectiveness of our approach, achieving an average improvement of 16% compared to state-of-the-art baselines.
Researcher Affiliation | Collaboration | 1 Agency for Science, Technology and Research, Singapore; 2 Morgan Stanley Asia Pte.; 3 Onto Innovation Inc.; 4 School of Computing and Communications, Lancaster University, UK
Pseudocode | Yes | Algorithm 1: Generate New Individual; Algorithm 2: LLM-Optimizer Interaction Protocol
Open Source Code | Yes | The source code can be found in: https://github.com/yizhangele/llm-guided-mod-optimization.
Open Datasets | Yes | The 9 testing scenarios in Table 2 are constructed using the New York taxi dataset [39]. We also test on Chicago taxi dataset [40]... [39] New York City Taxi and Limousine Commission. TLC trip record data. https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page. [40] Chicago Data Portal. Taxi trips. https://data.cityofchicago.org/Transportation/Taxi-Trips-2013-2023-/wrvz-psew/about_data.
Dataset Splits | No | While our work does not involve training machine learning models in the traditional sense, it integrates a well-established pretrained LLM with a mathematical optimization framework. As such, there are no training/test data splits or model training procedures to report.
Hardware Specification | Yes | All optimizer-based methods, either manual objectives or our adaptive-objective method, optimization solver Gurobi [42] is adopted to solve the problem running on a PC with 13th Gen Intel Core i9-13900KF 32 CPU up to 5.80 GHz and RAM 32GB.
Software Dependencies | Yes | In our experimental setup, we utilize the DeepSeek-R1-Distill-Qwen-32B [41] model through the Hugging Face platform API as the default large language model for all LLM-based methods... optimization solver Gurobi [42] is adopted to solve the problem running on a PC with 13th Gen Intel Core i9-13900KF 32 CPU up to 5.80 GHz and RAM 32GB.
Experiment Setup | Yes | In our experimental setup, we utilize the DeepSeek-R1-Distill-Qwen-32B [41] model through the Hugging Face platform API as the default large language model for all LLM-based methods, which allow us to evaluate the adaptability of our method on smaller LLMs, thereby highlighting its potential applications. The temperature parameter is configured to 0.9. LLM-based methods all executed 3 times for each scenario, and the mean value of these three runs is reported in Tables 2 and 3. FunSearch is performed under 20 iterations. EoH and our method all employ 10 iterations with a population size of 5.
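
The settings quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help when trying to rerun the comparison. This is an illustration only and is not taken from the authors' repository: the names `MethodConfig`, `CONFIGS`, and `mean_over_runs` are hypothetical, the label "Ours" stands in for the paper's method, and only the numeric values (temperature 0.9, 3 runs per scenario, 20 iterations for FunSearch, 10 iterations with population size 5 for EoH and the proposed method) come from the text above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class MethodConfig:
    """Search budget for one LLM-based method (hypothetical container)."""
    iterations: int
    population_size: Optional[int] = None  # not stated for FunSearch in the source


# Values below are quoted from the Experiment Setup row; the names are assumptions.
LLM_MODEL = "DeepSeek-R1-Distill-Qwen-32B"
TEMPERATURE = 0.9
RUNS_PER_SCENARIO = 3

CONFIGS: Dict[str, MethodConfig] = {
    "FunSearch": MethodConfig(iterations=20),
    "EoH": MethodConfig(iterations=10, population_size=5),
    "Ours": MethodConfig(iterations=10, population_size=5),
}


def mean_over_runs(run_once: Callable[[int], float],
                   runs: int = RUNS_PER_SCENARIO) -> float:
    """Call run_once(run_index) `runs` times and return the mean of the results."""
    return mean(run_once(i) for i in range(runs))


if __name__ == "__main__":
    # Placeholder objective values, NOT results from the paper.
    placeholder_costs = [10.0, 12.0, 14.0]
    print(mean_over_runs(lambda i: placeholder_costs[i]))
```

Here `run_once` is a stand-in for whatever executes one full run of a method on one scenario; averaging its three return values mirrors how the row says the figures in Tables 2 and 3 were produced.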