Online Multi-Task Gradient Temporal-Difference Learning

Authors: Vishnu Sreenivasan, Haitham Bou Ammar, Eric Eaton

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our preliminary results on controlling different mountain car tasks demonstrate that GTD-ELLA significantly improves learning over standard GTD RL. We evaluated GTD-ELLA on multiple tasks from the mountain car (MC) domain. Figure 1 shows that GTD-ELLA significantly improves RL performance when training on new tasks.
Researcher Affiliation | Academia | Vishnu Purushothaman Sreenivasan, Haitham Bou Ammar, and Eric Eaton; University of Pennsylvania, Computer and Information Science Department; {visp, haithamb, eeaton}@seas.upenn.edu
Pseudocode | No | The paper provides mathematical formulations but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any information or links regarding the availability of its source code.
Open Datasets | No | The paper describes generating 75 tasks by randomizing the valley's slope within the mountain car domain, but does not provide concrete access information (link, DOI, formal citation, or repository) for these specific generated tasks or the dataset used.
Dataset Splits | No | The paper mentions training on a certain number of tasks (10, 30, or 50) and evaluating on 25 unobserved tasks, but it does not explicitly describe a separate validation set or split (see the task-split sketch after the table).
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computer specifications used for experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks with versions) used for the experiments.
Experiment Setup | Yes | The state is given by the position and the velocity of the car, which was represented by 6 radial basis functions that were linearly spaced across both dimensions. The position was bounded between -1.2 and 0.6, while the velocity was bounded between -0.07 and 0.07. Rewards of -1 were given in all states, with the exception of the goal, which gave a reward of 0. We trained GTD-ELLA on different numbers of tasks to learn L, observing either 10, 30, or 50 tasks to learn the latent bases.
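
To make the Experiment Setup row concrete, here is a minimal sketch of the described state representation. It assumes 6 Gaussian RBF centers per dimension (a 6x6 grid of 36 features); the paper's wording could also mean 6 features in total, and the kernel widths and the function names below are placeholders, since neither is reported.

```python
import numpy as np

# Bounds reported in the paper: position in [-1.2, 0.6], velocity in [-0.07, 0.07].
POS_BOUNDS = (-1.2, 0.6)
VEL_BOUNDS = (-0.07, 0.07)

# 6 linearly spaced RBF centers per dimension (interpretation assumed).
# The widths are placeholders; the paper does not report them.
POS_CENTERS = np.linspace(*POS_BOUNDS, 6)
VEL_CENTERS = np.linspace(*VEL_BOUNDS, 6)
POS_WIDTH = (POS_BOUNDS[1] - POS_BOUNDS[0]) / 6
VEL_WIDTH = (VEL_BOUNDS[1] - VEL_BOUNDS[0]) / 6

def rbf_features(position, velocity):
    """Gaussian RBF feature vector for a mountain-car state (36 features)."""
    phi = np.exp(
        -((position - POS_CENTERS[:, None]) / POS_WIDTH) ** 2 / 2.0
        - ((velocity - VEL_CENTERS[None, :]) / VEL_WIDTH) ** 2 / 2.0
    )
    return phi.ravel()

def step_reward(at_goal):
    """Step cost of -1 in every state except the goal, which yields 0."""
    return 0.0 if at_goal else -1.0
```

For example, `rbf_features(-0.5, 0.0)` returns a 36-dimensional feature vector that could serve as the linear value-function features for one mountain-car state.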
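
The Open Datasets and Dataset Splits rows note that the 75 tasks are generated rather than downloaded. A minimal sketch of that protocol, assuming a uniform randomization of the valley slope around the standard mountain-car value of 0.0025 (the actual sampling range is not given in the paper) and the reported split of 10, 30, or 50 observed training tasks plus 25 held-out evaluation tasks; the variable names and the fixed seed are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# 75 mountain-car task variants, each defined by a randomized valley slope.
# The sampling distribution is assumed; the paper only says the slope is randomized.
NUM_TASKS = 75
slopes = rng.uniform(0.001, 0.004, size=NUM_TASKS)
tasks = [{"task_id": i, "slope": s} for i, s in enumerate(slopes)]

# 25 tasks are held out and never observed during training; the latent
# basis L is learned from 10, 30, or 50 of the remaining tasks.
perm = rng.permutation(NUM_TASKS)
eval_tasks = [tasks[i] for i in perm[:25]]
train_pool = [tasks[i] for i in perm[25:]]

for num_train in (10, 30, 50):
    train_tasks = train_pool[:num_train]
    # ... train GTD-ELLA on train_tasks, then evaluate on eval_tasks ...
```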