NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching

Authors: Hongbo Zhang, Guang Wang, Xu Wang, Zhengyang Zhou, Chen Zhang, Zheng Dong, Yang Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.
Researcher Affiliation | Academia | 1 University of Science and Technology of China; 2 Florida State University; 3 Wayne State University
Pseudocode | Yes | Algorithm 1: NondBCQ
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository for the methodology described.
Open Datasets | No | We evaluate our algorithm using real-world ride-hailing data from a large city over a period of eight weeks. Two types of data are utilized: vehicle GPS data and more than 20 million order records from over 50K vehicles. The dataset spans 09/2021 to 11/2021. The paper mentions that "Data for Social Good initiatives also make them available for research" but does not provide specific access information (link, DOI, or formal citation) for the dataset used in this study.
Dataset Splits | No | We use the data from the first 6 weeks to train the model, and the data from the last 2 weeks are loaded into the simulator for performance evaluation. The paper specifies training and test data but does not mention a distinct validation set or its split.
Hardware Specification | Yes | Our experiment is implemented in Python with TensorFlow 1.15, and executed in an environment with an Intel(R) Xeon(R) E5-2620 v4 @ 2.10 GHz CPU and one Nvidia Tesla V100 16 GB GPU.
Software Dependencies | Yes | Our experiment is implemented in Python with TensorFlow 1.15.
Experiment Setup | Yes | The tuned hyperparameters are set as follows: γ = 0.95, τ = 1, λ = 0.75, β = min(max(n/n, 0.9), 1).
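
The β expression quoted in the experiment setup appears to be a standard clamp, min(max(x, 0.9), 1), applied to some data-dependent ratio; the numerator's subscript is lost in the extracted text, so the exact quantity being clamped is unclear. A minimal sketch of the clamp itself, with the ratio left as a generic argument:

```python
def beta_schedule(ratio: float) -> float:
    """Clamp a data-dependent ratio into the interval [0.9, 1.0].

    Interpreted from the paper's 'beta = min(max(n/n, 0.9), 1)';
    the exact numerator is garbled in the extracted text, so
    `ratio` stands in for whatever quantity the paper uses.
    """
    return min(max(ratio, 0.9), 1.0)
```

Under this reading, β stays at 0.9 until the ratio exceeds 0.9, tracks it up to 1.0, and never exceeds 1.0.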