Using Task Features for Zero-Shot Knowledge Transfer in Lifelong Learning
Authors: David Isele, Mohammad Rostami, Eric Eaton
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of dynamical control problems. |
| Researcher Affiliation | Academia | University of Pennsylvania, Philadelphia, PA, USA |
| Pseudocode | Yes | Algorithm 1: TaDeLL(k, λ, µ); Algorithm 2: Zero-Shot Transfer to a New Task Z(t_new) (a hedged sketch of this step appears below the table) |
| Open Source Code | Yes | The complete implementation of our approach is available on the third author's website. |
| Open Datasets | No | In each domain we generated 40 tasks, each with different dynamics, by varying the system parameters. |
| Dataset Splits | Yes | We chose k and the regularization parameters independently for each domain to optimize the combined performance of all methods on 20 held-out tasks, and set ρ = mean(diag(Γ(t))) to balance the fit to the descriptors and the policies. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Natural Actor Critic [Peters & Schaal, 2008]' and 'episodic REINFORCE [Williams, 1992]' as base learners, but these are algorithms/methods and not specific software packages with version numbers. |
| Experiment Setup | Yes | The learners sampled trajectories of 100 steps, and the learning session during each task presentation was limited to 30 iterations. We chose k and the regularization parameters independently for each domain to optimize the combined performance of all methods on 20 held-out tasks, and set ρ = mean(diag(Γ(t))) to balance the fit to the descriptors and the policies. We measured learning curves based on the final policies for each of the 40 tasks, averaging results over seven trials. (A hedged sketch of this tuning protocol also appears below the table.) |
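
The pseudocode row above references TaDeLL's zero-shot step (Algorithm 2): sparse-code a new task's descriptor against the learned descriptor dictionary, then push that code through the policy dictionary. The following is a minimal sketch of that idea, not the authors' released implementation; the function name `zero_shot_policy`, the array shapes, and the use of scikit-learn's `Lasso` for the L1-regularized coding step are all illustrative assumptions.

```python
# Hedged sketch of zero-shot transfer, assuming the coupled dictionaries
# L (policy) and D (descriptor) were already learned by TaDeLL.
import numpy as np
from sklearn.linear_model import Lasso

def zero_shot_policy(phi_new, D, L, mu=1e-3):
    """Predict policy parameters for an unseen task from its descriptor.

    phi_new : (m,)   descriptor features phi(m_new) of the new task
    D       : (m, k) learned descriptor dictionary
    L       : (d, k) learned policy dictionary (shared latent basis)
    mu      : assumed sparsity level; note sklearn's Lasso objective is
              (1/(2m))||y - Xw||^2 + alpha*||w||_1, so alpha matches the
              paper's mu only up to a scaling by the descriptor dimension.
    """
    # Sparse-code the descriptor: s = argmin_s ||phi_new - D s||^2 + mu ||s||_1
    lasso = Lasso(alpha=mu, fit_intercept=False, max_iter=10_000)
    lasso.fit(D, phi_new)
    s = lasso.coef_
    # The same sparse code through the policy dictionary gives theta = L s
    return L @ s

# Toy usage with random dictionaries (shape-checking only)
rng = np.random.default_rng(0)
m, d, k = 5, 8, 4
D, L = rng.normal(size=(m, k)), rng.normal(size=(d, k))
print(zero_shot_policy(rng.normal(size=m), D, L).shape)  # (8,)
```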
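
The experiment-setup row quotes two tuning details: hyperparameters chosen to optimize combined performance on 20 held-out tasks, and the coupling weight set to ρ = mean(diag(Γ(t))). Below is a hedged sketch of what such a harness could look like; `train_tadell` and `evaluate_on_holdout` are hypothetical placeholders (the paper does not publish this harness), and the grid values are made up for illustration.

```python
# Hedged sketch of the quoted tuning protocol; helper functions are
# hypothetical placeholders, not from the paper's released code.
import itertools
import numpy as np

def balance_coupling(Gamma_t: np.ndarray) -> float:
    """Set the coupling weight as quoted in the setup: rho = mean(diag(Gamma(t))),
    putting the descriptor-fit term on the same scale as the policy-fit term.
    Example: balance_coupling(np.diag([1.0, 2.0, 3.0])) -> 2.0
    """
    return float(np.mean(np.diag(Gamma_t)))

def tune_hyperparameters(train_tadell, evaluate_on_holdout,
                         train_tasks, holdout_tasks):
    """Grid-search k and the regularization strengths, keeping the setting
    that maximizes combined performance on the held-out tasks."""
    grid = itertools.product([2, 4, 6],            # latent dimension k
                             [1e-4, 1e-3, 1e-2],   # lambda (dictionary reg.)
                             [1e-4, 1e-3, 1e-2])   # mu (sparsity)
    return max(grid, key=lambda params: evaluate_on_holdout(
        train_tadell(train_tasks, *params), holdout_tasks))
```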