Lifelong Learning with Non-i.i.d. Tasks
Authors: Anastasia Pentina, Christoph H. Lampert
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the second scenario we propose to learn an inductive bias in the form of a transfer procedure. We present a generalization bound and show on a toy example how it can be used to identify a beneficial transfer algorithm. Numeric experiments confirm that by optimizing J(θ) with respect to θ one can obtain an advantageous angle: using n = 2, …, 11 tasks, each with m = 10 samples, we obtain an average test error of 14.2% for the (n+1)th task. (See the empirical sketch after the table.) |
| Researcher Affiliation | Academia | Anastasia Pentina, IST Austria, Klosterneuburg, Austria (apentina@ist.ac.at); Christoph H. Lampert, IST Austria, Klosterneuburg, Austria (chl@ist.ac.at) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper describes a 'toy example' with generated data ('sampled from a non-stationary environment') but does not specify a public dataset or provide access information (link, DOI, citation) for the data used in the numerical experiments. |
| Dataset Splits | No | The paper describes a sequential task-based setup in which n observed tasks are used to evaluate performance on the (n+1)th future task, but it does not provide specific train/validation/test splits (e.g., percentages, counts, or stratification methods) for any dataset, within or across tasks. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers). |
| Experiment Setup | Yes | We use a one-parameter family of transfer algorithms, A_α for α ∈ ℝ. Given sample sets S_prev and S_cur, any algorithm A_α first rotates S_prev by the angle α, and then trains a linear support vector machine on the union of both sets. [...] For that we set Q_i = N(w_i, I_2), i.e. unit-variance Gaussian distributions with means w_i. Similarly, we choose all reference prior distributions as unit-variance Gaussians with zero mean, P_i = N(0, I_2). Analogously, we set the hyper-prior P to be N(0, 10), a zero-mean normal distribution with enlarged variance, in order to make all reasonable rotations α lie within one standard deviation of the mean. As hyper-posteriors Q we choose N(θ, 1), and the goal of the learning is to identify the best θ. (Runnable sketches of this setup follow the table.) |
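The Research Type and Experiment Setup rows together describe the paper's toy pipeline: a one-parameter family of transfer algorithms A_α (rotate the previous task's sample set by α, train a linear SVM on the union with the current set) evaluated over a sequence of tasks with m = 10 samples each. Below is a minimal Python sketch of that pipeline. Only the definition of A_α and the n tasks / m = 10 protocol come from the quoted text; everything else is an assumption, including the data-generating model (Gaussian inputs labeled by a rotating linear predictor), the per-task drift angle, and the grid search over α, which stands in for the paper's optimization of the PAC-Bayes objective J(θ).

```python
# Sketch of the toy transfer setup. Assumed details are marked as such;
# only A_alpha (rotate previous sample set, train a linear SVM on the
# union) and the n tasks / m = 10 samples protocol come from the table.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def sample_task(w, m):
    # Assumed data model: standard Gaussian inputs in R^2, labeled by
    # the sign of <w, x> for the task's weight vector w.
    X = rng.standard_normal((m, 2))
    return X, np.sign(X @ w)

def transfer_algorithm(alpha, S_prev, S_cur):
    # A_alpha as quoted: rotate S_prev by alpha, then train a linear SVM
    # on the union of the rotated previous set and the current set.
    (X_prev, y_prev), (X_cur, y_cur) = S_prev, S_cur
    X = np.vstack([X_prev @ rotation(alpha).T, X_cur])
    return LinearSVC(C=1.0).fit(X, np.concatenate([y_prev, y_cur]))

# Assumed non-stationary environment: the task weight vector rotates by
# a fixed drift angle from one task to the next.
drift, n, m = np.pi / 8, 11, 10
tasks = [sample_task(rotation(t * drift) @ np.array([1.0, 0.0]), m)
         for t in range(n + 1)]

def avg_next_task_error(alpha):
    # Train A_alpha on consecutive task pairs and test on the following
    # task; this empirical proxy replaces the bound-based objective J(θ).
    errs = [np.mean(transfer_algorithm(alpha, tasks[t - 1], tasks[t])
                    .predict(tasks[t + 1][0]) != tasks[t + 1][1])
            for t in range(1, n)]
    return float(np.mean(errs))

best = min(np.linspace(-np.pi, np.pi, 65), key=avg_next_task_error)
print(f"best alpha: {best:.2f} rad, error: {avg_next_task_error(best):.1%}")
```

The grid search over α is a stand-in design choice: the paper selects the transfer procedure by optimizing its generalization bound, whereas this sketch minimizes a held-out empirical error, so the numbers it prints should not be read as a reproduction of the reported 14.2%.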
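The distributional choices in the Experiment Setup row enter the paper's PAC-Bayes objective through KL-divergence terms. The sketch below computes just those KL terms for the quoted Gaussians; how the paper weights them against the empirical errors inside J(θ) is not reproduced here, and the candidate value of θ is hypothetical.

```python
import numpy as np

def kl_gauss_1d(mu1, var1, mu2, var2):
    # KL( N(mu1, var1) || N(mu2, var2) ) for univariate Gaussians.
    return 0.5 * (np.log(var2 / var1)
                  + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def kl_gauss_identity_cov(m1, m2):
    # KL( N(m1, I) || N(m2, I) ): with identity covariances this reduces
    # to half the squared Euclidean distance between the means.
    return 0.5 * float(np.sum((np.asarray(m1) - np.asarray(m2)) ** 2))

theta = 0.3  # hypothetical hyper-posterior mean, Q = N(theta, 1)
# KL(Q || P) with the quoted hyper-prior P = N(0, 10) (variance 10):
print(kl_gauss_1d(theta, 1.0, 0.0, 10.0))
# KL(Q_i || P_i) for a task posterior Q_i = N(w_i, I_2) against the
# reference prior P_i = N(0, I_2), for an example weight vector w_i:
print(kl_gauss_identity_cov([1.0, 0.0], [0.0, 0.0]))
```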