Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method

Authors: Zhiguang Cao, Hongliang Guo, Jie Zhang, Frans Oliehoek, Ulrich Fastenrath

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on real road networks demonstrate the significant advantages of our method over other counterparts."
Researcher Affiliation | Collaboration | 1 Energy Research Institute @ NTU, Nanyang Technological University, Singapore; 2 School of Automation Engineering, University of Electronic Science and Technology of China, China; 3 School of Computer Science and Engineering, Nanyang Technological University, Singapore; 4 Department of Computer Science, University of Liverpool, UK; 5 Informatics Institute, University of Amsterdam, Netherlands; 6 Department of Traffic Information Management and Routing Optimization, BMW Group, Germany
Pseudocode | Yes | Algorithm 1: Value Function Training
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "On Beijing network, we directly use the travel time data collected from real travel trajectories of taxi from September to October 2013 (Wang, Zheng, and Xue 2014)."
Dataset Splits | No | The paper evaluates its method on 200 o-d pairs and discusses accuracy, but it does not explicitly define or mention standard training, validation, or test dataset splits for the Q-learning model or the neural-network training process. It mentions "dynamically train the neural network" and a "new label to train f_{k+1}(z)", but no distinct validation set.
Hardware Specification | Yes | "All experiments are performed on a typical PC with Intel Core i7-3540M processor and 16GB RAM."
Software Dependencies | No | The paper mentions using a "dynamic neural network" but does not specify any software libraries or their version numbers (e.g., TensorFlow, PyTorch, scikit-learn).
Experiment Setup | Yes | "Specifically, Lines 1-2 initialize parameters and prepare feature values. Lines 3-20 dynamically learn value functions for each intersection using the neural network. In particular, for each intersection, the trained neural network from the preceding iteration computes the value functions of the succeeding intersections, which are then used to update the value function of the current intersection (Lines 5-14). In Lines 15-16, convergence errors of all intersections are accumulated. In Lines 17-18, the average convergence error is updated to determine loop termination. In Lines 19-20, the neural network is trained using feature values and updated labels. To find the optimal path before departure (which is preferred in a real application), the best action at a certain intersection with a given deadline is determined by:

a*(s) = argmax_a Σ_{s'} P^a_{s,s'} [V(s') + R(s')],   (6)

where P^a_{s,s'} is represented by travel time data on the corresponding road link. The next state s' (which consists of v' and τ') is decided as follows: v' is the succeeding intersection, and τ' = τ − E(t_{v,v'}), where E(t_{v,v'}) is the expected travel time over the previously chosen road link. Then we iteratively use the same equation to compute the next action until the complete path is achieved. ... Initialize V_0(s) via warm start; and set ε1 = 1; ε2 = 0; k = 0; N = 1000; ζ = 0.025; ... To approximate V_k(·), we adopt a two-layer dynamic neural network to learn a function f_k(·) in each iteration as the probability of arriving on time for a given intersection with a given deadline. Specifically,

f_k(z) = g2(g1(z ω1 + u1) ω2 + u2),   (4)

where z is the feature vector; ω1, u1, ω2, and u2 are parameters of the neural network; g1(·) and g2(·) are activation functions.
The feature vector z used to represent s is defined as z(s) = [τ, μ(v), σ(v)], where τ and v are the time-to-deadline and location in s; μ and σ are the means and standard deviations of travel time for the K-shortest paths from the current location to the destination."
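The quoted setup can be sketched in code. This is a minimal illustration of the feature vector z(s), the two-layer network of Eq. (4), and the greedy action rule of Eq. (6); the paper releases no code, so the activation choices (tanh and sigmoid), the network size, and the toy transition/value functions below are all assumptions, not the authors' implementation.

```python
import numpy as np

def features(tau, mu_v, sigma_v):
    """Feature vector z(s) = [tau, mu(v), sigma(v)]: time-to-deadline,
    plus mean/std of travel time over the K-shortest paths from v."""
    return np.array([tau, mu_v, sigma_v], dtype=float)

def f_k(z, w1, u1, w2, u2):
    """Two-layer network of Eq. (4): f_k(z) = g2(g1(z w1 + u1) w2 + u2).
    Assumed activations: g1 = tanh, g2 = sigmoid (output is a probability)."""
    h = np.tanh(z @ w1 + u1)                      # g1(.)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + u2)))   # g2(.)

def best_action(s, actions, P, V, R):
    """Greedy rule of Eq. (6): a*(s) = argmax_a sum_{s'} P^a_{s,s'} [V(s') + R(s')].
    P(s, a) yields (s_next, prob) pairs, estimated from travel-time data."""
    def expected_value(a):
        return sum(p * (V(s2) + R(s2)) for s2, p in P(s, a))
    return max(actions, key=expected_value)
```

After choosing an action, the next state would follow the quoted update: v' is the succeeding intersection and τ' = τ − E(t_{v,v'}); iterating `best_action` from the origin until the destination is reached yields the complete path.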