Using Task Features for Zero-Shot Knowledge Transfer in Lifelong Learning

Authors: David Isele, Mohammad Rostami, Eric Eaton

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of dynamical control problems.
Researcher Affiliation Academia University of Pennsylvania, Philadelphia, PA, USA
Pseudocode Yes Algorithm 1 Ta De LL (k, λ, µ); Algorithm 2 Zero-Shot Transfer to a New Task Z(tnew)
Open Source Code Yes The complete implementation of our approach is available on the third author s website.
Open Datasets No In each domain we generated 40 tasks, each with different dynamics, by varying the system parameters.
Dataset Splits Yes We chose k and the regularization parameters independently for each domain to optimize the combined performance of all methods on 20 held-out tasks, and set = mean(diag(Γ(t))) to balance the fit to the descriptors and the policies.
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies No The paper mentions 'Natural Actor Critic [Peters & Schaal, 2008]' and 'episodic REINFORCE [Williams, 1992]' as base learners, but these are algorithms/methods and not specific software packages with version numbers.
Experiment Setup Yes The learners sampled trajectories of 100 steps, and the learning session during each task presentation was limited to 30 iterations. We chose k and the regularization parameters independently for each domain to optimize the combined performance of all methods on 20 held-out tasks, and set = mean(diag(Γ(t))) to balance the fit to the descriptors and the policies. We measured learning curves based on the final policies for each of the 40 tasks, averaging results over seven trials.