Learning Exploration Policies for Navigation
Authors: Tao Chen, Saurabh Gupta, Abhinav Gupta
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted our experiments on the House3D simulation environment Wu et al. (2018). House3D is based on realistic apartment layouts from the SUNCG dataset Song et al. (2017) and simulates first-person observations and actions of a robotic agent embodied in these apartments. We use 20 houses each for training and testing. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Facebook AI Research |
| Pseudocode | No | The paper describes the methods in text and figures (like Figure 1 for policy architecture) but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We conducted our experiments on the House3D simulation environment Wu et al. (2018). House3D is based on realistic apartment layouts from the SUNCG dataset Song et al. (2017) |
| Dataset Splits | No | The paper states 'We use 20 houses each for training and testing' and discusses training on 'training houses' and testing on 'testing houses', but it does not specify a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like 'ResNet18 CNN', 'ImageNet', 'PPO', and 'Adam optimizer', but does not provide specific version numbers for any of them. (An illustrative encoder sketch follows the table.) |
| Experiment Setup | Yes | Each episode is run for 500 time-steps. We run a total of 6400 episodes which amounts to a total of 3.2M steps of experience... Coefficient α for the coverage reward $R^{cov}_{int}(t)$ is 0.0005, coefficient β for $R^{coll}_{int}(t)$ is 0.006. PPO entropy loss coefficient is 0.01. Network is optimized via Adam optimizer with a learning rate of 0.00001. (These values are gathered into a configuration sketch below the table.) |
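The Software Dependencies row names a ResNet18 CNN and ImageNet but no versions. Below is a minimal PyTorch/torchvision sketch of an ImageNet-pretrained ResNet18 featurizer; the module name, the 512-d feature head, and the exact wiring into the policy are our assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn
from torchvision import models


class ImageEncoder(nn.Module):
    """Illustrative ResNet18 featurizer (components named in the paper).

    Dropping the final classification layer to expose 512-d features is a
    common pattern; the paper does not specify this exact head.
    """

    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        # Keep everything except the final fully connected layer.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        # rgb: (batch, 3, H, W) first-person observation
        feats = self.backbone(rgb)           # (batch, 512, 1, 1)
        return feats.flatten(start_dim=1)    # (batch, 512)
```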
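For readability, the hyperparameters quoted in the Experiment Setup row can be gathered in one place. This is a hedged sketch, not the authors' released code: the container and function names are ours, the additive combination of the two intrinsic reward terms is an assumption, and only the numeric values come from the paper.

```python
from dataclasses import dataclass


@dataclass
class ExplorationConfig:
    """Hyperparameters quoted from the paper; field names are illustrative."""
    episode_length: int = 500        # time-steps per episode
    num_episodes: int = 6400         # 6400 * 500 = 3.2M steps of experience
    coverage_coef: float = 0.0005    # alpha for R^cov_int(t)
    collision_coef: float = 0.006    # beta for R^coll_int(t)
    ppo_entropy_coef: float = 0.01   # PPO entropy loss coefficient
    learning_rate: float = 0.00001   # Adam learning rate


def intrinsic_reward(cfg: ExplorationConfig, r_cov: float, r_coll: float) -> float:
    # Weighted sum of the coverage and collision terms, as implied by the
    # quoted coefficients (the exact combination rule is an assumption).
    return cfg.coverage_coef * r_cov + cfg.collision_coef * r_coll
```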