Asynchronous Distributed Variational Gaussian Process for Regression

Authors: Hao Peng, Shandian Zhe, Xiao Zhang, Yuan Qi

ICML 2017

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "6. Experiments"; Table 1 ("Root mean square errors (RMSEs) for 700K/100K US Flight data."); Figure 1 ("Root mean square errors (RMSEs) for US flight data as a function of training time.")
Researcher Affiliation | Collaboration | Purdue University, West Lafayette, IN, USA and Ant Financial Service Group.
Pseudocode | Yes | "Algorithm 1: Delayed Proximal Gradient for ADVGP"
Open Source Code | No | The paper provides no links to source code and no explicit statement about releasing code for the described methodology.
Open Datasets | Yes | "We used the US Flight data (Hensman et al., 2013), which recorded the arrival and departure times of US commercial flights between January and April in 2008." (http://stat-computing.org/dataexpo/2009/); "We used the New York City yellow taxi trip dataset, which consists of 1.21 billion trip records from January 2009 to December 2015." (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml)
Dataset Splits | Yes | "In the first group, we randomly chose 700K samples for training; in the second group, we randomly selected 2M training samples. Both groups used 100K samples for testing. We ensured that the training and testing data are non-overlapping. To choose an appropriate delay τ, we sampled another set of training and test data, based on which we tuned τ from {0, 8, 16, 24, 32, 40}. These tuning datasets do not overlap the test data in the evaluation."
Hardware Specification | Yes | "We ran all the methods on a computer node with 16 CPU cores and 64 GB memory."; "We conducted two experiments on 4 c4.8xlarge instances of the Amazon EC2 cloud."; "We used the Amazon EC2 cloud, and ran ADVGP on multiple Amazon c4.8xlarge instances, each with 36 vCPUs and 60 GB memory."
Software Dependencies | No | The paper mentions software such as Vowpal Wabbit, PARAMETERSERVER, and ADADELTA, but does not give version numbers for these dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | "For ADVGP, we initialized µ = 0, U = I, and used ADADELTA (Zeiler, 2012) to adjust the step size for the gradient descent before the proximal operation."; "We tuned τ from {0, 8, 16, 24, 32, 40}. We chose τ = 32 as it produced the best performance on the tuning datasets."; "We set m = 50 and initialized the inducing points as the K-means cluster centers from a subset of the 2M training samples. The delay limit τ was selected as 20."
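The evaluation metric cited in the Research Type row (RMSE, as in Table 1 and Figure 1) can be computed as below. This is just the standard definition, not code from the paper:

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over paired targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# A perfect predictor gives 0.0; constant error e gives e.
```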
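The split protocol in the Dataset Splits row (random, non-overlapping train/test draws) can be sketched as follows. The function name, seed, and pool size are illustrative assumptions, not details from the paper:

```python
import random

def make_split(n_total, n_train, n_test, seed=0):
    """Draw disjoint train/test index sets uniformly at random.
    Sizes mirror the paper's 700K/100K and 2M/100K splits; the seed
    and function name are illustrative, not from the paper."""
    rng = random.Random(seed)
    # random.sample returns distinct indices, so the two sets
    # are non-overlapping by construction.
    idx = rng.sample(range(n_total), n_train + n_test)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = make_split(n_total=1_000_000, n_train=700_000, n_test=100_000)
```

The separate tuning set for selecting τ would be drawn the same way from indices outside `test_idx`, matching the statement that the tuning data do not overlap the evaluation test data.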
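The Experiment Setup row describes a gradient step with an ADADELTA-adjusted size followed by a proximal operation. A minimal scalar sketch of that pattern is below; the proximal operator here is a simple stand-in (shrinkage toward zero), since ADVGP's actual proximal operation is defined in the paper and differs:

```python
import math

def adadelta_step(grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA (Zeiler, 2012) step for a scalar parameter.
    `state` carries the running averages E[g^2] and E[dx^2]."""
    state["Eg2"] = rho * state["Eg2"] + (1 - rho) * grad ** 2
    dx = -(math.sqrt(state["Edx2"] + eps) / math.sqrt(state["Eg2"] + eps)) * grad
    state["Edx2"] = rho * state["Edx2"] + (1 - rho) * dx ** 2
    return dx

def prox(x, lam=1e-3):
    """Stand-in proximal operator (shrinkage toward 0); NOT the
    proximal operation used by ADVGP."""
    return x / (1.0 + lam)

# Gradient descent with ADADELTA step sizes, then the proximal
# operation, on a toy objective (x - 1)^2.
x, state = 5.0, {"Eg2": 0.0, "Edx2": 0.0}
for _ in range(500):
    g = 2.0 * (x - 1.0)
    x = prox(x + adadelta_step(g, state))
```

In the distributed algorithm (Algorithm 1), each worker performs such updates asynchronously, with staleness bounded by the delay limit τ.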