Multi-task Learning for Aggregated Data using Gaussian Processes
Authors: Fariba Yousefi, Michael T. Smith, Mauricio Álvarez
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show examples of the model in a synthetic example, a fertility dataset and an air pollution prediction application. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Sheffield {f.yousefi, m.t.smith, mauricio.alvarez}@sheffield.ac.uk |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not contain structured pseudocode or algorithm blocks with clear labels. |
| Open Source Code | Yes | The implementation is based on the GPy framework and is available on Github: https://github.com/frb-yousefi/aggregated-multitask-gp. |
| Open Datasets | Yes | a subset of the Canadian fertility dataset is used from the Human Fertility Database (HFD, https://www.humanfertility.org). The dataset consists of live births statistics by year, age of mother and birth order. |
| Dataset Splits | Yes | For training the multi-task model, we select N1 = 200 from the 250 observations for task 1 and use all N2 = 125 for the second task. The other 50 data points for task 1 correspond to a gap in the interval [130, 180] that we use as the test set. ... The dataset was randomly split into 1640 training points and 1000 test points. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the "LBFGS-B algorithm" and "the Adam optimiser, included in climin library", and states that "The implementation is based on the GPy framework", but it does not provide version numbers for any of these software components. |
| Experiment Setup | Yes | In these examples, we use k-means clustering over the input data, with k = M, to initialise the values of the inducing inputs, Z, which are also kept fixed during optimisation. ... We used 100 fixed inducing variables and mini-batches of size 50 samples. ... We used 2000 iterations of the variational EM algorithm, with 200 evenly spaced inducing points and a fixed lengthscale of 0.75 hours. We only optimise the parameters of the coregionalisation matrix B1 ∈ R^{2×2} and the variance of the noise of each Gaussian likelihood. |
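The inducing-point initialisation quoted in the setup row above (k-means over the inputs with k = M, then Z held fixed) can be sketched in plain NumPy. This is an illustrative sketch, not the authors' GPy-based implementation; the function name `kmeans_init_inducing` and the toy data are assumptions for the example.

```python
import numpy as np

def kmeans_init_inducing(X, M, n_iters=20, seed=0):
    """Initialise M inducing inputs Z via k-means (Lloyd's algorithm) on X.

    A minimal sketch of the initialisation the paper describes
    (k-means with k = M over the input data); the paper's actual
    implementation uses the GPy framework.
    """
    rng = np.random.default_rng(seed)
    # Start centroids from M distinct points sampled from X.
    Z = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(n_iters):
        # Assign each input to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned inputs.
        for m in range(M):
            members = X[assign == m]
            if len(members):
                Z[m] = members.mean(axis=0)
    return Z

# Toy example: 250 one-dimensional inputs, M = 10 inducing points.
X = np.linspace(0.0, 1.0, 250)[:, None]
Z = kmeans_init_inducing(X, M=10)
print(Z.shape)  # (10, 1)
```

In a sparse-GP setting, the resulting `Z` would then be passed to the model as fixed inducing inputs, matching the paper's note that Z is "kept fixed during optimisation".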