Embodied Visual Active Learning for Semantic Segmentation
Authors: David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu
AAAI 2021, pp. 2373-2383 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate the proposed models using the photorealistic Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts, even when requesting fewer annotations. We develop a battery of methods, ranging from pre-specified ones to a fully trainable deep reinforcement learning-based agent, which we evaluate extensively in the photorealistic Matterport3D environment. We perform extensive evaluation in a photorealistic 3D environment and show that a fully learnt method outperforms comparable pre-specified ones. |
| Researcher Affiliation | Collaboration | David Nilsson1,2, Aleksis Pirinen1, Erik Gärtner1,2, Cristian Sminchisescu1,2 1Department of Mathematics, Faculty of Engineering, Lund University 2Google Research {david.nilsson, aleksis.pirinen, erik.gartner, cristian.sminchisescu}@math.lth.se |
| Pseudocode | No | The paper describes the methods in text but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using third-party tools like the 'RLlib reinforcement learning package', 'OpenAI Gym', 'TensorFlow', and 'PWC-Net', but it does not provide an explicit statement or link to the source code for the authors' own methodology described in the paper. |
| Open Datasets | Yes | We evaluate the methods on the Matterport3D dataset (Chang et al. 2017) using the embodied agent framework Habitat (Savva et al. 2019). |
| Dataset Splits | Yes | We use the same 61, 11 and 18 houses for training, validation and testing as Chang et al. (2017). For validation and testing we use 3 and 4 starting positions per scene, respectively, so each agent is tested for a total of 33 episodes in validation and 72 episodes in testing. Hyperparameters of the learnt and pre-specified agents are tuned on the validation set. |
| Hardware Specification | Yes | Our system is implemented in TensorFlow (Abadi et al. 2016), and it takes about 3 days to train an agent using 4 Nvidia Titan X GPUs. |
| Software Dependencies | No | The paper mentions several software components, including 'PPO', 'RLlib', 'OpenAI Gym', 'TensorFlow', and 'PWC-Net', but it does not provide specific version numbers for these dependencies, which would be required to reproduce the software environment. |
| Experiment Setup | Yes | Mini-batches of size 8, which always include the latest added labeled image, are used in training. The network is refined either until it has trained for 1,000 iterations or until the accuracy of a mini-batch exceeds 95%. We use a standard cross-entropy loss averaged over all pixels. The segmentation network is trained using stochastic gradient descent with learning rate 0.01, weight decay 10^-5 and momentum 0.9. For optimization we use Adam (Kingma and Ba 2014) with batch size 512, learning rate 10^-4 and discount rate 0.99. During training, each episode consists of 256 actions. The agent is trained for 4k episodes, which totals 1024k steps. The ResNet-50 feature extractor is pre-trained on ImageNet (Deng et al. 2009) with weights frozen during policy training. |
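The refinement schedule quoted above (train until 1,000 iterations or until a mini-batch reaches 95% accuracy) can be sketched as a simple stopping rule. This is a minimal illustration, not the authors' code: `train_step` is a hypothetical callable standing in for one SGD update on a mini-batch of 8 that returns that batch's pixel accuracy.

```python
MAX_ITERS = 1000       # iteration cap from the paper's setup
ACC_THRESHOLD = 0.95   # per-mini-batch accuracy threshold

def refine(train_step, max_iters=MAX_ITERS, acc_threshold=ACC_THRESHOLD):
    """Run train_step until the iteration cap or the accuracy threshold.

    Returns the number of iterations performed and the last accuracy.
    """
    acc = 0.0
    for it in range(1, max_iters + 1):
        acc = train_step()          # one update on a mini-batch of 8
        if acc >= acc_threshold:    # early stop once the batch is fit
            return it, acc
    return max_iters, acc

# Toy usage with a fake accuracy curve that improves linearly per step.
steps = iter(range(1, 2001))
iters, acc = refine(lambda: next(steps) / 1000.0)
print(iters, acc)  # stops at iteration 950, accuracy 0.95
```

Either exit condition alone would be fragile: the cap bounds per-image cost during an episode, while the accuracy test avoids over-fitting the network to a single newly annotated view.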