Teaching a Machine to Read Maps With Deep Reinforcement Learning
Authors: Gino Brunner, Oliver Richter, Yuyi Wang, Roger Wattenhofer
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our architecture we created a training and test set of mazes with the corresponding black and white maps in the DeepMind Lab environment. The mazes are quadratic grid mazes with each maze cell being either a wall, an open space, the target or the spawn position. The training set consists of 100 mazes of different sizes; 20 mazes each in the sizes 5x5, 7x7, 9x9, 11x11 and 13x13 maze cells. The test set consists of 900 mazes; 100 in each of the sizes 5x5, 7x7, 9x9, 11x11, 13x13, 15x15, 17x17, 19x19 and 21x21. Figure 4 shows the training performance of 8 actor threads. The trained agent is tested on the 900 test set mazes; the number of required steps per maze size is plotted in Figure 5. See Table 1 for the percentage of exits found in all maze sizes. |
| Researcher Affiliation | Academia | Gino Brunner, Oliver Richter, Yuyi Wang, Roger Wattenhofer ETH Zurich {brunnegi, richtero, yuwang, wattenhofer}@ethz.ch |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the architecture and processes using text, diagrams, and equations, but not formatted pseudocode. |
| Open Source Code | Yes | Our code can be found here: https://github.com/OliverRichter/map-reader.git |
| Open Datasets | No | The paper states: "To evaluate our architecture we created a training and test set of mazes with the corresponding black and white maps in the DeepMind Lab environment." While they used the DeepMind Lab environment, they created their own mazes/dataset within it and do not provide concrete access information (link, DOI, citation) for *their specific dataset*. |
| Dataset Splits | Yes | The training set consists of 100 mazes of different sizes; 20 mazes each in the sizes 5x5, 7x7, 9x9, 11x11 and 13x13 maze cells. The test set consists of 900 mazes; 100 in each of the sizes 5x5, 7x7, 9x9, 11x11, 13x13, 15x15, 17x17, 19x19 and 21x21. (A configuration sketch of this split appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or processor types used for running its experiments. It only vaguely mentions "GPU frameworks" in the related work section. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It mentions using the "DeepMind Lab environment" and training methods like "A3C" and "RMSprop gradient descent", but no version numbers for these or other libraries. |
| Experiment Setup | Yes | We use 16 asynchronous agent training threads from which we start 8 on the smallest (5x5) training mazes while the other training threads are started 2 each on the other sizes (7x7, 9x9, 11x11 and 13x13). If the agent does not find the exit in 4500 steps, the episode ends as not successful. We calculate the moving average over the last 50 episodes and use 60, 100, 140, 180 and 220 steps as thresholds for the maze sizes 5x5, 7x7, 9x9, 11x11 and 13x13, respectively. We also adopted an action repeat of 4 and a frame rate of 15 fps. The whole network is trained by RMSprop gradient descent with gradient back propagation stopped at module boundaries. (A hedged sketch of this training configuration follows the table.) |
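
For clarity, the dataset split reported above can be summarized as a small configuration sketch. This is not the authors' code; the names `TRAIN_SPLIT` and `TEST_SPLIT` are illustrative, and only the maze sizes and counts come from the paper.

```python
# Illustrative sketch of the maze dataset split described in the paper.

# Training set: 20 mazes per size for the five smallest sizes (100 mazes total).
TRAIN_SPLIT = {size: 20 for size in (5, 7, 9, 11, 13)}

# Test set: 100 mazes per size, including four sizes never seen in training
# (900 mazes total), which probes generalization to larger mazes.
TEST_SPLIT = {size: 100 for size in (5, 7, 9, 11, 13, 15, 17, 19, 21)}

assert sum(TRAIN_SPLIT.values()) == 100
assert sum(TEST_SPLIT.values()) == 900
```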
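Similarly, the reported experiment setup can be collected into one hedged configuration sketch. All identifiers below are our own naming, not the authors' code; only the numeric values (thread counts, step limits, thresholds, action repeat, frame rate, optimizer) are taken from the paper's description.

```python
# Illustrative reconstruction of the reported A3C-style training configuration.
TRAIN_CONFIG = {
    "num_actor_threads": 16,          # asynchronous actor-learner threads
    "threads_on_5x5": 8,              # 8 threads start on the smallest mazes
    "threads_per_other_size": 2,      # 2 threads each on 7x7, 9x9, 11x11, 13x13
    "max_episode_steps": 4500,        # episode ends unsuccessfully after this
    "moving_average_window": 50,      # episodes in the running performance average
    "step_thresholds": {5: 60, 7: 100, 9: 140, 11: 180, 13: 220},
    "action_repeat": 4,
    "frame_rate_fps": 15,
    "optimizer": "RMSprop",           # gradients stopped at module boundaries
}

def initial_thread_assignment(cfg=TRAIN_CONFIG):
    """Return the maze size each actor thread starts on (our reconstruction)."""
    sizes = [5] * cfg["threads_on_5x5"]
    for size in (7, 9, 11, 13):
        sizes += [size] * cfg["threads_per_other_size"]
    assert len(sizes) == cfg["num_actor_threads"]
    return sizes
```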