Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics
Authors: Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our AAP on two challenging visual navigation tasks in the AI2-THOR and Habitat environments and show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces. Moreover, we observe significant improvement in robustness against these actions when evaluating in real-world scenarios. |
| Researcher Affiliation | Collaboration | (1) Paul G. Allen School of Computer Science & Engineering, University of Washington; (2) PRIOR @ Allen Institute for AI (prior.allenai.org/projects/action-adaptive-policy) |
| Pseudocode | No | The paper presents diagrams for the model architecture but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We will release the code for this modified environment and experiments as well.' in Section F, which is a promise of a future release rather than available code. It also links to the open-sourced code of a baseline ('EmbCLIP') but not to an implementation of its own AAP method. |
| Open Datasets | Yes | To evaluate AAP, we train agents to complete two challenging visual navigation tasks within the AI2-THOR environment (Kolve et al., 2017): Point Navigation (PointNav) (Deitke et al., 2020) and Object Navigation (ObjectNav) (Deitke et al., 2020). ... We train our AAP and EmbCLIP on ProcTHOR-10k (Deitke et al., 2022) for ObjectNav with training drifts d_m and d_r, then evaluate in a real-world scene from RoboTHOR (Deitke et al., 2020). (A dataset-loading sketch follows the table.) |
| Dataset Splits | Yes | SR is the proportion of successful episodes over the validation set. ... The framework (w/ our AAP) spends 35 minutes evaluating 1.8k val episodes with 5 parallel processes on the Point Navigation task. |
| Hardware Specification | Yes | At runtime, we evaluate models on a personal desktop with an Intel i9-9900K CPU, 64G DDR4-3200 RAM, and 2 Nvidia RTX 2080 Ti GPUs. ... During the training phase, we used an AWS machine with 48 vCPUs, 187G RAM, and 4 Nvidia Tesla T4 GPUs to train the policy. |
| Software Dependencies | Yes | To conduct the experiments in Habitat (Savva et al., 2019), we made a small change to Habitat's Move Forward action, where every Move Forward only moves the agent by 0.01m. ... The simulator is Habitat-Lab v0.2.1. (A configuration sketch follows the table.) |
| Experiment Setup | Yes | During training, we use the Adam optimizer and an initial learning rate of 3e-4 that linearly decays to 0 over 75M and 300M steps for the two tasks, respectively. We set the standard RL reward discounting parameter γ to 0.99, λ_GAE to 0.95, and the number of update steps to 128 for L_PPO. The α for L_forward is set to 1. ... The meta-update learning rate was set to 10^-4. (A training-setup sketch follows the table.) |
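
For the ProcTHOR-10k houses cited in the Open Datasets row, the snippet below is a minimal loading sketch using the publicly documented `prior` package; it is an assumption about how the dataset can be fetched today, not the authors' (unreleased) AllenAct training pipeline.

```python
# Minimal sketch: fetching the ProcTHOR-10k houses referenced for ObjectNav training.
# Assumes the `prior` package's documented API; the authors' actual data pipeline is not released.
import prior

dataset = prior.load_dataset("procthor-10k")  # procedurally generated AI2-THOR houses
print(dataset)  # exposes the train/val/test splits of house specifications
```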
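
The Habitat modification quoted in the Software Dependencies row (each Move Forward advancing only 0.01m) can be approximated through Habitat-Lab v0.2.1's forward-step configuration. The sketch below rests on that assumption; the stock `configs/tasks/pointnav.yaml` path stands in for the authors' exact setup, since their modified environment code is not released.

```python
# Sketch: reducing Habitat's Move Forward step to 0.01 m, per the paper's description.
# Assumes Habitat-Lab v0.2.1 (YACS-style config); the authors' actual modification may differ.
import habitat

config = habitat.get_config("configs/tasks/pointnav.yaml")  # stock PointNav task config
config.defrost()
config.SIMULATOR.FORWARD_STEP_SIZE = 0.01  # default is 0.25 m; the paper uses 0.01 m per step
config.freeze()

env = habitat.Env(config=config)  # environment now applies the shortened forward step
```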
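
The optimizer and PPO hyperparameters in the Experiment Setup row map onto standard PyTorch components. The sketch below only illustrates the reported values; `policy` and `total_steps` are placeholders, and this is not the authors' AllenAct training loop.

```python
# Sketch of the reported optimization setup: Adam at 3e-4, linearly decayed to 0,
# gamma = 0.99, lambda_GAE = 0.95, 128 update steps for L_PPO, alpha = 1 for L_forward.
# `policy` and `total_steps` are placeholders, not the paper's actual model or schedule code.
import torch

policy = torch.nn.Linear(512, 6)   # stand-in for the AAP policy network
total_steps = 75_000_000           # 75M steps for PointNav (300M for ObjectNav)

optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps)
)

ppo_hparams = {
    "gamma": 0.99,         # RL reward discount
    "gae_lambda": 0.95,    # lambda_GAE for advantage estimation
    "update_steps": 128,   # number of update steps for L_PPO
    "alpha_forward": 1.0,  # weight on the auxiliary L_forward loss
    "meta_update_lr": 1e-4,
}
```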