Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
Authors: Shani Gamrian, Yoav Goldberg
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate this approach on Breakout and Road Fighter in Section 5, and present the results comparing to different baselines. |
| Researcher Affiliation | Collaboration | 1Computer Science Department, Bar-Ilan University, Ramat Gan, Israel 2Allen Institute for Artificial Intelligence. |
| Pseudocode | Yes | Algorithm 1 Imitation Learning |
| Open Source Code | Yes | The code is available at https://github.com/ShaniGam/RL-GAN. |
| Open Datasets | Yes | In this work, we first focus on the Atari game Breakout, in which the main concept is moving the paddle towards the ball in order to maximize the score of the game. We explore the Nintendo game Road Fighter, a car racing game where the goal is to finish the track before the time runs out without crashing. |
| Dataset Splits | No | The paper describes collecting images and training for a number of iterations but does not specify explicit training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not specify any particular GPU, CPU, or other hardware models used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms (A3C, A2C) and frameworks (UNIT, CycleGAN) but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We train each one of the tasks (before and after the transformation) for 60 million frames, and our evaluation metric is the total reward the agents collect in an episode averaged by the number of episodes... For our experiments we use the same architecture and hyperparameters proposed in the UNIT paper. We initialize the weights with Xavier initialization (Glorot & Bengio, 2010), set the batch size to 1 and train the network for a different number of iterations on each task. |
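The experiment-setup row cites Xavier initialization (Glorot & Bengio, 2010) for the GAN weights. As a minimal sketch of what that initializer computes (not the authors' actual code, which follows the UNIT implementation), the layer sizes below are hypothetical:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot & Bengio (2010): sample uniformly from
    # [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)),
    # keeping activation variance roughly constant across layers.
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

# Hypothetical layer shape for illustration only.
W = xavier_uniform(fan_in=256, fan_out=128)
```

Every sampled weight lies within `±sqrt(6 / (256 + 128)) ≈ ±0.125`; frameworks such as PyTorch expose the same rule as `torch.nn.init.xavier_uniform_`.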