Online Multi-Task Gradient Temporal-Difference Learning
Authors: Vishnu Purushothaman Sreenivasan, Haitham Bou Ammar, Eric Eaton
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our preliminary results on controlling different mountain car tasks demonstrate that GTD-ELLA significantly improves learning over standard GTD RL. We evaluated GTD-ELLA on multiple tasks from the mountain car (MC) domain. Figure 1 shows that GTD-ELLA significantly improves RL performance when training on new tasks. |
| Researcher Affiliation | Academia | Vishnu Purushothaman Sreenivasan, Haitham Bou Ammar, and Eric Eaton; University of Pennsylvania, Computer and Information Science Department; {visp, haithamb, eeaton}@seas.upenn.edu |
| Pseudocode | No | The paper provides mathematical formulations but no structured pseudocode or algorithm blocks. (For orientation, a hedged sketch of the standard GTD update that GTD-ELLA builds on appears after this table.) |
| Open Source Code | No | The paper does not provide any information or links regarding the availability of its source code. |
| Open Datasets | No | The paper describes generating 75 tasks by randomizing the valley's slope within the mountain car domain, but does not provide concrete access information (link, DOI, formal citation, or repository) for these specific generated tasks or the dataset used. |
| Dataset Splits | No | The paper mentions training on a certain number of tasks (10, 30, or 50) and evaluating on 25 unobserved tasks, but it does not explicitly describe a separate validation set or split. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computer specifications used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, frameworks with versions) used for the experiments. |
| Experiment Setup | Yes | The state is given by the position and the velocity of the car, which was represented by 6 radial basis functions that were linearly spaced across both dimensions. The position was bounded between −1.2 and 0.6, while the velocity was bounded between −0.07 and 0.07. Rewards of −1 were given in all states, with the exception of the goal, which gave a reward of 0. We trained GTD-ELLA on different numbers of tasks to learn L, observing either 10, 30, or 50 tasks to learn the latent bases. (A hedged reconstruction of this feature map appears after the table.) |
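
Since the paper presents its updates only as equations, the following is a minimal sketch of the standard GTD2 update (Sutton, Maei, and Szepesvári, 2009), the base learner on which GTD-ELLA builds. It is a reference point rather than the paper's multi-task algorithm: the choice of the GTD2 variant, the step sizes `alpha` and `beta`, and the toy usage values are all assumptions.

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, alpha=0.05, beta=0.1, gamma=0.99):
    """One GTD2 update (Sutton et al., 2009) for linear value estimation.

    theta: value-function weights; w: auxiliary weights used for the
    gradient correction. Step sizes here are illustrative, not the paper's.
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta   # TD error
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)  # corrected gradient step
    w = w + beta * (delta - phi @ w) * phi  # auxiliary weights track the expected TD error
    return theta, w

# Toy usage with random features (illustrative only).
d = 36
theta, w = np.zeros(d), np.zeros(d)
phi, phi_next = np.random.rand(d), np.random.rand(d)
theta, w = gtd2_step(theta, w, phi, phi_next, reward=-1.0)
```

GTD-ELLA, per the paper, additionally factors each task's weights through shared latent bases L; that multi-task coupling is not shown here because the paper provides no pseudocode for it.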
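For concreteness, here is a hedged reconstruction of the experiment's feature map. The paper states only that the state was "represented by 6 radial basis functions that were linearly spaced across both dimensions"; whether that means 6 centers per dimension (36 features, as assumed below) or 6 in total is ambiguous, and the Gaussian kernel widths are likewise assumptions.

```python
import numpy as np

POS_RANGE = (-1.2, 0.6)    # position bounds reported in the paper
VEL_RANGE = (-0.07, 0.07)  # velocity bounds reported in the paper
N_RBF = 6                  # "6 radial basis functions ... linearly spaced"

pos_centers = np.linspace(POS_RANGE[0], POS_RANGE[1], N_RBF)
vel_centers = np.linspace(VEL_RANGE[0], VEL_RANGE[1], N_RBF)
# Assumed kernel widths: the spacing between adjacent centers (not stated in the paper).
pos_width = pos_centers[1] - pos_centers[0]
vel_width = vel_centers[1] - vel_centers[0]

def mc_features(position, velocity):
    """Gaussian RBF activations over a 6x6 grid of (position, velocity) centers."""
    p = np.exp(-0.5 * ((position - pos_centers) / pos_width) ** 2)
    v = np.exp(-0.5 * ((velocity - vel_centers) / vel_width) ** 2)
    return np.outer(p, v).ravel()  # 36-dimensional feature vector

phi = mc_features(-0.5, 0.0)  # features at the classic mountain car start state
```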