Large-Scale Cox Process Inference using Variational Fourier Features
Authors: ST John, James Hensman
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approximate Bayesian method can fit over 100 000 events with complex spatiotemporal patterns in three dimensions on a single GPU. We demonstrate this on the Porto taxi data set (Moreira-Matias et al., 2013). ... We compare VFF with the inducing point approach using the Gaussian kernel (denoted Gauß+IP). We use synthetic 1D examples to show that Fourier features have the same expressive power as Gauß+IP and to demonstrate our methods: uncertainty prediction, MCMC, and sampled vs. mean test set likelihood. |
| Researcher Affiliation | Industry | PROWLER.io, 66-68 Hills Road, Cambridge CB2 1LA, United Kingdom. Correspondence to: ST John <st@prowler.io>, James Hensman <james@prowler.io>. |
| Pseudocode | No | The paper contains mathematical derivations and equations but no explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We demonstrate this on the Porto taxi data set (Moreira-Matias et al., 2013). This contains 1.7 × 10^6 trajectories covering the entire year from 1 July 2013 to 30 June 2014. As the training set we took 100 weekdays, containing 113 020 events in total. |
| Dataset Splits | No | The paper mentions training and test sets but does not specify explicit dataset splits (e.g., percentages or exact counts for training, validation, and test portions) needed for reproducibility. It uses several training sets (e.g., 'single day', '100 days') and reports test set likelihoods, but never defines a consistent split ratio. |
| Hardware Specification | Yes | Optimizing the model for 20 000 gradient steps took ca. 24–28 h on a Tesla P100 GPU. |
| Software Dependencies | No | The paper discusses various techniques and models (e.g., Gaussian processes, MCMC, Fourier features) but does not provide specific version numbers for any software libraries, programming languages, or tools used. |
| Experiment Setup | Yes | For variational inference, hyperparameters are initialized as in section 4.4. Constant offset β. We can initialize β = λ from the mean rate... Kernel lengthscale. We can obtain a good starting point for the lengthscale hyperparameters... Kernel variance. ...we can initialize the variance σ^2 ≈ β^2λ. For MCMC, we put Gamma priors on the hyperparameters... Variational approximate distribution. ...initialize the mean m and covariance S... We chose 35 × 35 frequencies for the spatial dimensions, and a periodic kernel based on the Matérn-5/2 spectrum as described in section 4.1 with 25 frequencies for the time dimension. In each case, we consider multiple optimizations from different initial values for kernel variance, σ^2 ∈ {√λ, ½√λ}, and constant offset, β ∈ {√λ, ⅔√λ}. |
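The restart scheme quoted in the Experiment Setup row can be sketched as below. This is a minimal illustration, not the authors' code (the paper releases none). It assumes the mean rate λ is the event count divided by the volume of the observation domain, reads λ^1/2 as √λ, and uses the hypothetical helpers `mean_rate` and `restart_grid`; fitting the VFF model from each starting point is out of scope.

```python
import itertools
import math


def mean_rate(num_events: int, domain_volume: float) -> float:
    """Hypothetical helper: mean intensity as events per unit volume of the domain (assumed definition)."""
    if domain_volume <= 0:
        raise ValueError("domain_volume must be positive")
    return num_events / domain_volume


def restart_grid(lam: float) -> list[dict[str, float]]:
    """Hypothetical helper: all (sigma^2, beta) starting pairs from the quoted grid."""
    root = math.sqrt(lam)
    variances = [root, 0.5 * root]        # sigma^2 in {sqrt(lam), 1/2 sqrt(lam)}
    offsets = [root, (2.0 / 3.0) * root]  # beta in {sqrt(lam), 2/3 sqrt(lam)}
    # Cartesian product: every variance paired with every offset.
    return [
        {"variance": v, "offset": b}
        for v, b in itertools.product(variances, offsets)
    ]


if __name__ == "__main__":
    # Hypothetical example: 113 020 events on a domain rescaled to unit volume.
    lam = mean_rate(113_020, 1.0)
    for init in restart_grid(lam):
        print(init)
```

Two variance values times two offset values give four starting points, each of which would seed a separate optimization. How the runs are then compared (for example, by keeping the one with the highest ELBO) is an assumption; the excerpt does not say.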