Learning Simultaneous Navigation and Construction in Grid Worlds

Authors: Wenyu Han, Haoran Wu, Eisuke Hirota, Alexander Gao, Lerrel Pinto, Ludovic Righetti, Chen Feng

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments show that pre-training this position estimation module before Q-learning can significantly improve the construction performance measured by the intersection-over-union score, achieving the best results in our benchmark of various baselines including model-free and model-based RL, a handcrafted SLAM-based policy, and human players. Our code is available at: https://ai4ce.github.io/SNAC/.
Researcher Affiliation | Academia | Wenyu Han, Haoran Wu, Eisuke Hirota, Alexander Gao, Lerrel Pinto, Ludovic Righetti, Chen Feng (New York University)
Pseudocode | Yes | Algorithm 1: Handcrafted Policy
Open Source Code | Yes | Our code is available at: https://ai4ce.github.io/SNAC/.
Open Datasets | No | The paper states that the authors created their own dataset by collecting …
Dataset Splits | Yes | For the variable design tasks, we randomly generated 500 ground-truth designs and split them to 8/1/1 for training/validation/testing. (A split sketch follows the table.)
Hardware Specification | Yes | We test each simulation environment for 500 episodes of games on Intel(R) Core(TM) i9-9920X CPU @ 3.50GHz using a single thread.
Software Dependencies | No | The paper mentions software components like "Stable Baselines" and various RL algorithms (DQN, DRQN, PPO, Rainbow, SAC) but does not provide specific version numbers for any of these software dependencies or programming languages/frameworks used.
Experiment Setup | Yes | To validate the proposed framework and its robustness, all baselines are trained with the same set of 4 random seeds and averaged results are reported. ... For the constant design tasks in 1D/2D/3D, we test the trained agent for 500 times for each task... For the variable design tasks, we randomly generated 500 ground-truth designs and split them to 8/1/1 for training/validation/testing. ... DQN. ... We train DQN on each task for 3,000 episodes. Batch size is 2,000, and replay buffer size 50,000. ... DRQN. We train it for 10,000 episodes with batch size of 64 and replay memory size of 1,000. ... PPO. ... We train PPO for 10 million time steps... we chose the following values: 1e5 for the batch size, 1e2 for the number of minibatches, 2.5e-4 for the learning rate and 0.1 for the clipping threshold. (A hedged configuration sketch follows the table.)
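
The Dataset Splits row reports 500 randomly generated ground-truth designs divided 8/1/1 into training/validation/testing sets. A minimal sketch of how such a split could be reproduced is shown below; the function name, shuffling seed, and use of integer design IDs are assumptions for illustration, not details taken from the paper.

```python
import random


def split_designs(designs, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and split a list of generated designs into train/val/test.

    The 8/1/1 ratio follows the paper; the shuffling seed is an assumption.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)
    shuffled = list(designs)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Example with 500 hypothetical design IDs, as in the variable-design tasks.
train, val, test = split_designs(range(500))
print(len(train), len(val), len(test))  # 400 50 50
```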
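
The Experiment Setup row lists per-algorithm hyperparameters for DQN, DRQN, and PPO. The sketch below gathers the quoted values into a single configuration dictionary for reference; the dictionary layout and key names are assumptions of this write-up, and only the numeric values come from the quoted text.

```python
# Hyperparameters quoted in the Experiment Setup row, collected in one place.
# Key names and structure are illustrative; values follow the quoted text.
EXPERIMENT_CONFIG = {
    "random_seeds": 4,            # all baselines trained with the same 4 seeds
    "eval_episodes": 500,         # constant-design tasks: 500 test rollouts per task
    "variable_designs": {"total": 500, "split": (0.8, 0.1, 0.1)},
    "dqn": {
        "episodes": 3_000,
        "batch_size": 2_000,
        "replay_buffer_size": 50_000,
    },
    "drqn": {
        "episodes": 10_000,
        "batch_size": 64,
        "replay_memory_size": 1_000,
    },
    "ppo": {
        "total_timesteps": 10_000_000,
        "batch_size": int(1e5),       # quoted as 1e5
        "num_minibatches": int(1e2),  # quoted as 1e2
        "learning_rate": 2.5e-4,
        "clip_threshold": 0.1,
    },
}
```

A dictionary like this could be logged alongside results or passed to a training script, which makes it easier to check that a rerun used the reported settings.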