Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method

Authors: Zhiguang Cao, Hongliang Guo, Jie Zhang, Frans Oliehoek, Ulrich Fastenrath

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on real road networks demonstrate the significant advantages of our method over other counterparts."
Researcher Affiliation | Collaboration | 1 Energy Research Institute @ NTU, Nanyang Technological University, Singapore; 2 School of Automation Engineering, University of Electronic Science and Technology of China, China; 3 School of Computer Science and Engineering, Nanyang Technological University, Singapore; 4 Department of Computer Science, University of Liverpool, UK; 5 Informatics Institute, University of Amsterdam, Netherlands; 6 Department of Traffic Information Management and Routing Optimization, BMW Group, Germany
Pseudocode | Yes | Algorithm 1: Value Function Training
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "On Beijing network, we directly use the travel time data collected from real travel trajectories of taxi from September to October 2013 (Wang, Zheng, and Xue 2014)."
Dataset Splits | No | The paper evaluates its method on 200 o-d pairs and discusses accuracy, but it does not explicitly define or mention standard training, validation, or test dataset splits for the Q-learning model or the neural-network training process. It mentions "dynamically train the neural network" and a "new label to train f_{k+1}(z)", but no distinct validation set.
Hardware Specification | Yes | "All experiments are performed on a typical PC with Intel Core i7-3540M processor and 16GB RAM."
Software Dependencies | No | The paper mentions using a "dynamic neural network" but does not specify any software libraries or their version numbers (e.g., TensorFlow, PyTorch, scikit-learn).
Experiment Setup | Yes | "Specifically, Lines 1-2 initialize parameters and prepare feature values. Lines 3-20 dynamically learn value functions for each intersection using the neural network. In particular, for each intersection, the trained neural network from the preceding iteration computes the value functions of the succeeding intersections, which are then used to update the value function of the current intersection (Lines 5-14). In Lines 15-16, convergence errors of all intersections are accumulated. In Lines 17-18, the average convergence error is updated to determine loop termination. In Lines 19-20, the neural network is trained using feature values and updated labels. To find the optimal path before departure (which is preferred in a real application), the best action at a certain intersection with a given deadline is determined by:

a*(s) = argmax_a Σ_{s'} P^a_{s,s'} [V(s') + R(s')],   (6)

where P^a_{s,s'} is represented by travel time data on the corresponding road link. The next state s' (which consists of v' and τ') is decided as follows: v' is the succeeding intersection, and τ' = τ − E(t_{v,v'}), where E(t_{v,v'}) is the expected travel time over the previously chosen road link. Then we iteratively use the same equation to compute the next action until the complete path is achieved. ... Initialize V_0(s) via warm start; and set ε1 = 1; ε2 = 0; k = 0; N = 1000; ζ = 0.025; ... To approximate V_k(·), we adopt a two-layer dynamic neural network to learn a function f_k(·) in each iteration as the probability of arriving on time for a given intersection with a given deadline. Specifically,

f_k(z) = g2(g1(z ω1 + u1) ω2 + u2),   (4)

where z is the feature vector; ω1, u1, ω2, and u2 are parameters of the neural network; g1(·) and g2(·) are activation functions.
The feature vector z used to represent s is defined as z(s) = [τ, μ(v), σ(v)], where τ and v are the time-to-deadline and location in s; μ and σ are the means and standard deviations of travel time for the K-shortest paths from the current location to the destination."
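The quoted setup can be sketched in code. This is a minimal illustration of the feature vector z(s), the two-layer network of Eq. (4), and the greedy action rule of Eq. (6); the paper releases no code, so the activation choices (tanh and sigmoid), the network size, and the toy transition/value functions below are all assumptions, not the authors' implementation.

```python
import numpy as np

def features(tau, mu_v, sigma_v):
    """Feature vector z(s) = [tau, mu(v), sigma(v)]: time-to-deadline,
    plus mean/std of travel time over the K-shortest paths from v."""
    return np.array([tau, mu_v, sigma_v], dtype=float)

def f_k(z, w1, u1, w2, u2):
    """Two-layer network of Eq. (4): f_k(z) = g2(g1(z w1 + u1) w2 + u2).
    Assumed activations: g1 = tanh, g2 = sigmoid (output is a probability)."""
    h = np.tanh(z @ w1 + u1)                      # g1(.)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + u2)))   # g2(.)

def best_action(s, actions, P, V, R):
    """Greedy rule of Eq. (6): a*(s) = argmax_a sum_{s'} P^a_{s,s'} [V(s') + R(s')].
    P(s, a) yields (s_next, prob) pairs, estimated from travel-time data."""
    def expected_value(a):
        return sum(p * (V(s2) + R(s2)) for s2, p in P(s, a))
    return max(actions, key=expected_value)
```

After choosing an action, the next state would follow the quoted update: v' is the succeeding intersection and τ' = τ − E(t_{v,v'}); iterating `best_action` from the origin until the destination is reached yields the complete path.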