A PAC-Bayesian Bound for Lifelong Learning
Authors: Anastasia Pentina, Christoph H. Lampert
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate how learning prior distributions by minimizing the bounds (16), (17) and (21) can improve prediction performance in real prediction tasks. To position our results with respect to previous work on parameter and representation transfer, we compare to adaptive ridge regression (ARR), i.e. Equation (10) with the prior w_pr set to the average of the weight vectors from the observed tasks, and to the ELLA algorithm (Ruvolo & Eaton, 2013), which learns a subspace representation using structured sparsity constraints, also with squared loss. (See the ARR sketch below the table.) |
| Researcher Affiliation | Academia | Anastasia Pentina APENTINA@IST.AC.AT IST Austria (Institute of Science and Technology Austria), Am Campus 1, 3400 Klosterneuburg, Austria Christoph H. Lampert CHL@IST.AC.AT IST Austria (Institute of Science and Technology Austria), Am Campus 1, 3400 Klosterneuburg, Austria |
| Pseudocode | No | The paper describes its algorithms mathematically and textually within the main body, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We perform experiments on three public datasets: Land Mine Detection (Xue et al., 2007); London School Data (Argyriou et al., 2008; Kumar & Daumé III, 2012; Ruvolo & Eaton, 2013); and the Animals with Attributes dataset (Lampert et al., 2013). |
| Dataset Splits | Yes | We split the data of each task into three parts: we use the first third of all tasks jointly to learn a prior. To evaluate this prior, we then train individual predictors using the second part of the data, and test their quality on the third part. ... For the baseline, we set the regularization using ordinary 3-fold cross-validation. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation or experimentation. |
| Experiment Setup | Yes | The algorithms described in Sections 3.1 and 3.2 and ARR have one free parameter, the regularization strength C ∈ {10^-3, ..., 10^3}. We choose this using 3-fold cross-validation in the following way. We split the data of each task into three parts: we use the first third of all tasks jointly to learn a prior. To evaluate this prior, we then train individual predictors using the second part of the data, and test their quality on the third part. (See the cross-validation sketch below the table.) |
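
For readers cross-checking the ARR baseline quoted in the Research Type row, here is a minimal sketch of adaptive ridge regression with a transferred prior. It is an illustration only: the paper releases no code (see the Open Source Code row), and the function names `adaptive_ridge_regression` and `prior_from_tasks` are hypothetical. The sketch solves min_w ||Xw - y||^2 + C * ||w - w_pr||^2 in closed form, with w_pr set to the average of the weight vectors from previously observed tasks, as the quote describes.

```python
import numpy as np

def prior_from_tasks(weight_vectors):
    # Transferred prior: the average of the weight vectors learned
    # on the previously observed tasks (per the quoted description).
    return np.mean(weight_vectors, axis=0)

def adaptive_ridge_regression(X, y, w_prior, C):
    # Closed-form minimizer of ||Xw - y||^2 + C * ||w - w_prior||^2,
    # obtained from the normal equations:
    #   (X^T X + C I) w = X^T y + C w_prior
    d = X.shape[1]
    A = X.T @ X + C * np.eye(d)
    b = X.T @ y + C * w_prior
    return np.linalg.solve(A, b)
```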
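The Dataset Splits and Experiment Setup rows describe the same protocol: each task's data is split into thirds (prior learning / training / testing), and the regularization strength C is chosen from {10^-3, ..., 10^3} by ordinary 3-fold cross-validation. The sketch below mirrors that protocol under stated assumptions: equal-size thirds, and generic `train_fn`/`eval_fn` callables (hypothetical names) standing in for the squared-loss learner and its evaluation.

```python
import numpy as np

def split_task_in_thirds(X, y):
    # Per-task split: the first third goes into joint prior learning,
    # the second third trains the task-specific predictor, and the
    # last third is held out for testing (assumed equal-size parts).
    parts = np.array_split(np.arange(len(y)), 3)
    return [(X[p], y[p]) for p in parts]

def select_C(train_fn, eval_fn, X, y):
    # Ordinary 3-fold cross-validation over C in {10^-3, ..., 10^3}.
    grid = 10.0 ** np.arange(-3, 4)
    folds = np.array_split(np.arange(len(y)), 3)
    best_C, best_err = None, np.inf
    for C in grid:
        err = 0.0
        for k in range(3):
            val = folds[k]
            trn = np.concatenate([folds[j] for j in range(3) if j != k])
            w = train_fn(X[trn], y[trn], C)
            err += eval_fn(w, X[val], y[val])
        if err < best_err:
            best_C, best_err = C, err
    return best_C
```

With squared loss, `train_fn` could be the `adaptive_ridge_regression` solver above and `eval_fn` a mean-squared-error computation; both bindings are assumptions, not details given in the paper.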