Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning

Authors: Arvi Jonnarth, Jie Zhao, Michael Felsberg

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we show that our approach surpasses the performance of both previous RL-based approaches and highly specialized methods across multiple CPP variations.
Researcher Affiliation | Collaboration | Arvi Jonnarth (1,2), Jie Zhao (3), Michael Felsberg (1,4). 1: Linköping University. 2: Husqvarna Group. 3: Dalian University of Technology. 4: Co-affiliation: University of KwaZulu-Natal. Correspondence to: Arvi Jonnarth <arvi.jonnarth@liu.se>.
Pseudocode | No | The paper describes the agent architecture and reward function components but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code implementation can be found online, and our contributions are summarized as follows: Code: https://github.com/arvijj/rl-cpp
Open Datasets | Yes | To evaluate our method on this CPP variation, we use Explore-Bench (Xu et al., 2022), which includes six environments: loop, corridor, corner, rooms, combination 1 (rooms with corridors), and combination 2 (complex rooms with tight spaces); these can be found in Appendix B.7.
Dataset Splits | No | We use separate maps for evaluation that are not seen during training; see Appendix B.7 for a full list. The paper does not explicitly mention a 'validation' dataset split with specific percentages or counts for data partitioning.
Hardware Specification | Yes | The training time for one agent varied between 100 and 150 hours on a T4 GPU and a 6226R CPU. Cluster node: 6226R CPU, T4 GPU. Laptop: i5-520M CPU, no GPU.
Software Dependencies | No | The paper mentions using 'soft actor-critic (SAC) RL' and provides a GitHub link, but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, or specific library versions).
Experiment Setup | Yes | We train for 8M iterations with learning rate 10^-5, batch size 256, replay buffer size 5×10^5, discount factor γ = 0.99, and a minimal noise level. For the multi-scale maps, we use m = 4 scales with 32×32 pixel resolution, a scale factor of s = 4, and 0.0375 meters per pixel for the finest scale. We set the maximum coverage reward λ_area = 1, the incremental TV reward scale λ_TV^I = 0.2 for exploration and λ_TV^I = 1 for lawn mowing, the collision reward R_coll = −10, and the constant reward R_const = −0.1.
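
For concreteness, the quoted hyperparameters can be collected into a single configuration, together with the per-scale resolution implied by the multi-scale map description and a simplified reading of the reward components. The sketch below is illustrative only: the names (Config, meters_per_pixel, step_reward) are not taken from the authors' repository, and the exact form and sign conventions of the area and TV terms follow the paper, not this code.

from dataclasses import dataclass

@dataclass
class Config:
    # SAC training settings quoted above
    iterations: int = 8_000_000
    learning_rate: float = 1e-5
    batch_size: int = 256
    replay_buffer_size: int = 500_000
    gamma: float = 0.99
    # Multi-scale map settings
    num_scales: int = 4                 # m = 4
    map_size: int = 32                  # 32x32 pixels per scale
    scale_factor: int = 4               # s = 4
    finest_resolution: float = 0.0375   # meters per pixel at the finest scale
    # Reward weights (exploration variant; lawn mowing uses lambda_tv = 1)
    lambda_area: float = 1.0
    lambda_tv: float = 0.2
    r_collision: float = -10.0
    r_constant: float = -0.1

def meters_per_pixel(cfg: Config) -> list[float]:
    # Each coarser scale covers scale_factor times more ground per pixel.
    return [cfg.finest_resolution * cfg.scale_factor ** i for i in range(cfg.num_scales)]

def step_reward(new_area: float, tv_increment: float, collided: bool, cfg: Config) -> float:
    # Simplified per-step reward: coverage gain, incremental TV term,
    # a constant negative reward, and a collision penalty when triggered.
    r = cfg.lambda_area * new_area + cfg.lambda_tv * tv_increment + cfg.r_constant
    if collided:
        r += cfg.r_collision
    return r

print(meters_per_pixel(Config()))  # [0.0375, 0.15, 0.6, 2.4]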