A PAC-Bayesian Bound for Lifelong Learning

Authors: Anastasia Pentina, Christoph Lampert

ICML 2014

Reproducibility assessment (each row lists the variable, the result, and the supporting LLM response):
Research Type: Experimental
  LLM Response: "In this section, we demonstrate how learning prior distributions by minimizing the bounds (16), (17) and (21) can improve prediction performance in real prediction tasks. To position our results with respect to previous work on parameter and representation transfer, we compare to adaptive ridge regression (ARR), i.e. Equation (10) with the prior w_pr set to the average of the weight vectors from the observed tasks, and with the ELLA algorithm (Ruvolo & Eaton, 2013) that learns a subspace representation using structured sparsity constraints, also with squared loss." (A code sketch of the ARR baseline follows the table.)
Researcher Affiliation: Academia
  LLM Response: Anastasia Pentina (APENTINA@IST.AC.AT) and Christoph H. Lampert (CHL@IST.AC.AT), IST Austria (Institute of Science and Technology Austria), Am Campus 1, 3400 Klosterneuburg, Austria.
Pseudocode: No
  LLM Response: The paper describes its algorithms mathematically and textually within the main body, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: No
  LLM Response: The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets: Yes
  LLM Response: "We perform experiments on three public datasets": Land Mine Detection (Xue et al., 2007), the London School Data (Argyriou et al., 2008; Kumar & Daumé III, 2012; Ruvolo & Eaton, 2013), and the Animals with Attributes dataset (Lampert et al., 2013).
Dataset Splits: Yes
  LLM Response: "We split the data of each task into three parts: we use the first third of all tasks jointly to learn a prior. To evaluate this prior, we then train individual predictors using the second part of the data, and test their quality on the third part. ... For the baseline, we set the regularization using ordinary 3-fold cross-validation."
Hardware Specification: No
  LLM Response: The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., specific GPU or CPU models).
Software Dependencies: No
  LLM Response: The paper does not provide version numbers for any software dependencies or libraries used in the implementation or experiments.
Experiment Setup: Yes
  LLM Response: "The algorithms described in Sections 3.1 and 3.2 and ARR have one free parameter, the regularization strength C ∈ {10^-3, ..., 10^3}. We choose this using 3-fold cross-validation in the following way. We split the data of each task into three parts: we use the first third of all tasks jointly to learn a prior. To evaluate this prior, we then train individual predictors using the second part of the data, and test their quality on the third part." (A sketch of this protocol follows the table.)
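
The ARR baseline quoted in the Research Type row admits a closed-form solution. Below is a minimal Python sketch, assuming squared loss and the Equation (10) objective min_w ||Xw - y||^2 + C||w - w_pr||^2; the function names and the use of NumPy are illustrative choices, not taken from the paper.

```python
import numpy as np

def ridge_with_prior(X, y, C, w_prior):
    """Ridge regression biased toward a prior weight vector:
    argmin_w ||X w - y||^2 + C ||w - w_prior||^2.
    Setting the gradient to zero gives (X^T X + C I) w = X^T y + C w_prior."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + C * np.eye(d), X.T @ y + C * w_prior)

def average_prior(tasks, C):
    """ARR prior: the average of the weight vectors learned on the observed
    tasks (each trained here with an uninformative zero prior)."""
    d = tasks[0][0].shape[1]
    w0 = np.zeros(d)
    return np.mean([ridge_with_prior(X, y, C, w0) for X, y in tasks], axis=0)
```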
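The split-and-evaluate protocol quoted in the Dataset Splits and Experiment Setup rows can likewise be sketched. The per-task three-way split, the grid C ∈ {10^-3, ..., 10^3}, and the 3-fold cross-validation come from the paper; the mean-squared-error metric, the choice of the ARR-style average prior, and all helper names are assumptions. The sketch reuses ridge_with_prior and average_prior from the block above.

```python
import numpy as np

C_GRID = [10.0 ** k for k in range(-3, 4)]  # C in {10^-3, ..., 10^3}

def split_in_thirds(X, y):
    """Per-task split: first third for prior learning, second third for
    training individual predictors, final third for testing."""
    i, j = len(y) // 3, 2 * len(y) // 3
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

def select_C(X, y, w_prior, n_folds=3):
    """Pick the regularization strength by n-fold cross-validation."""
    folds = np.array_split(np.arange(len(y)), n_folds)

    def cv_error(C):
        errs = []
        for fold in folds:
            train = np.ones(len(y), dtype=bool)
            train[fold] = False
            w = ridge_with_prior(X[train], y[train], C, w_prior)
            errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
        return np.mean(errs)

    return min(C_GRID, key=cv_error)

def evaluate(tasks, C_prior=1.0):
    """Learn a prior on the first thirds of all tasks jointly, then train a
    predictor per task on the second third and report test error on the last."""
    parts = [split_in_thirds(X, y) for X, y in tasks]
    w_prior = average_prior([first for first, _, _ in parts], C_prior)
    errors = []
    for _, (X_tr, y_tr), (X_te, y_te) in parts:
        C = select_C(X_tr, y_tr, w_prior)
        w = ridge_with_prior(X_tr, y_tr, C, w_prior)
        errors.append(np.mean((X_te @ w - y_te) ** 2))
    return float(np.mean(errors))
```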