Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning
Authors: Arvi Jonnarth, Jie Zhao, Michael Felsberg
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that our approach surpasses the performance of both previous RL-based approaches and highly specialized methods across multiple CPP variations. |
| Researcher Affiliation | Collaboration | Arvi Jonnarth (1,2), Jie Zhao (3), Michael Felsberg (1,4). 1: Linköping University. 2: Husqvarna Group. 3: Dalian University of Technology. 4: Co-affiliation: University of KwaZulu-Natal. Correspondence to: Arvi Jonnarth <arvi.jonnarth@liu.se>. |
| Pseudocode | No | The paper describes the agent architecture and reward function components but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code implementation can be found online¹, and our contributions are summarized as follows. ¹Code: https://github.com/arvijj/rl-cpp |
| Open Datasets | Yes | To evaluate our method on this CPP variation, we use Explore-Bench (Xu et al., 2022), which includes six environments: loop, corridor, corner, rooms, combination 1 (rooms with corridors), and combination 2 (complex rooms with tight spaces), which can be found in Appendix B.7. |
| Dataset Splits | No | We use separate maps for evaluation that are not seen during training; see Appendix B.7 for a full list. The paper does not explicitly mention a 'validation' dataset split with specific percentages or counts for data partitioning. |
| Hardware Specification | Yes | The training time for one agent varied between 100 and 150 hours on a T4 GPU and a 6226R CPU. Cluster node: 6226R CPU, T4 GPU. Laptop: i5-520M CPU, no GPU. |
| Software Dependencies | No | The paper mentions using 'soft actor-critic (SAC) RL' and provides a GitHub link, but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, or specific library versions). |
| Experiment Setup | Yes | We train for 8M iterations with learning rate 10⁻⁵, batch size 256, replay buffer size 5×10⁵, discount factor γ = 0.99, and a minimal noise level. For the multi-scale maps, we use m = 4 scales with 32×32 pixel resolution, a scale factor of s = 4, and 0.0375 meters per pixel for the finest scale. We set the maximum coverage reward λ_area = 1, the incremental TV reward scale λ_TV^I = 0.2 for exploration and λ_TV^I = 1 for lawn mowing, the collision reward R_coll = −10, and the constant reward R_const = −0.1. (A hedged configuration sketch of these settings follows the table.) |
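
The training and reward settings quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below is a minimal, illustrative summary in Python; the class and field names are assumptions and do not correspond to identifiers in the authors' repository (https://github.com/arvijj/rl-cpp), and the negative signs on the collision and constant rewards follow the usual penalty convention rather than being copied from the code.

```python
# Minimal sketch: hyperparameters reported in the paper collected into one
# config object. All names here (TrainConfig, field names) are illustrative
# assumptions, not identifiers from https://github.com/arvijj/rl-cpp.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # SAC training settings
    iterations: int = 8_000_000          # 8M training iterations
    learning_rate: float = 1e-5
    batch_size: int = 256
    replay_buffer_size: int = 500_000    # 5 x 10^5 transitions
    discount_gamma: float = 0.99

    # Multi-scale map observation
    num_scales: int = 4                  # m = 4 scales
    scale_resolution: int = 32           # 32 x 32 pixels per scale
    scale_factor: int = 4                # s = 4 between consecutive scales
    meters_per_pixel: float = 0.0375     # ground resolution of the finest scale

    # Reward weights (exploration variant; lawn mowing uses lambda_tv_incremental = 1.0)
    lambda_area: float = 1.0             # maximum coverage reward
    lambda_tv_incremental: float = 0.2   # incremental total-variation reward scale
    reward_collision: float = -10.0      # collision penalty (sign assumed)
    reward_constant: float = -0.1        # constant per-step reward (sign assumed)


# Example: side length covered by each scale of the multi-scale map.
cfg = TrainConfig()
for i in range(cfg.num_scales):
    side_m = cfg.scale_resolution * cfg.meters_per_pixel * cfg.scale_factor ** i
    print(f"scale {i}: {side_m:.1f} m x {side_m:.1f} m")
```

With these numbers, the four 32×32 maps cover side lengths of roughly 1.2 m, 4.8 m, 19.2 m, and 76.8 m, which illustrates how the multi-scale representation keeps the observation size fixed while still covering large environments.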