Multi-skill Mobile Manipulation for Object Rearrangement
Authors: Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our multi-skill mobile manipulation method M3 on 3 challenging long-horizon mobile manipulation tasks in the Home Assistant Benchmark (HAB), and show superior performance as compared to the baselines. |
| Researcher Affiliation | Collaboration | Jiayuan Gu¹, Devendra Singh Chaplot², Hao Su¹, Jitendra Malik²˒³ (¹UC San Diego, ²Meta AI Research, ³UC Berkeley) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes: https://github.com/Jiayuan-Gu/hab-mobile-manipulation |
| Open Datasets | Yes | We use the ReplicaCAD dataset and the Habitat 2.0 simulator (Szot et al., 2021) for our experiments. |
| Dataset Splits | Yes | For the remaining 4 macro variations, we split 84 scenes into 64 scenes for training and 20 scenes to evaluate generalization to unseen configurations (object and goal positions). For each task, we generate 6400 episodes (64 scenes) for training, 100 episodes (20 scenes) to evaluate cross-configuration generalization, and another 100 episodes (the hold-out macro variation) to evaluate cross-layout generalization. These counts are condensed into a split sketch after the table. |
| Hardware Specification | No | The paper mentions: "The robot is a Fetch (Fetch Robotics, 2022) mobile manipulator with a 7-DoF arm and a parallel-jaw gripper." However, it does not specify the computing hardware (e.g., GPU or CPU models, RAM) used to run the experiments. |
| Software Dependencies | No | The paper mentions: "Our PPO implementation is based on the habitat-lab." and "The visual observations are encoded by a 3-layer CNN as in Szot et al. (2021).". While it mentions software components, it does not provide specific version numbers for these software dependencies (e.g., habitat-lab version, Python version, specific deep learning framework versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | Hyper-parameters: We train each skill with the PPO (Schulman et al., 2017) algorithm. The visual observations are encoded by a 3-layer CNN as in Szot et al. (2021). The visual features are concatenated with the state observations and the previous action, followed by a 1-layer GRU and linear layers that output the action and value. Each skill is trained with 3 different seeds. See Appendix C.1 for details. Metrics: Each HAB task consists of a sequence of subtasks, as illustrated in Sec 3.3. The completion of a subtask is conditioned on the completion of its preceding subtask. We report progressive completion rates of subtasks; the completion rate of the last subtask is thus the success rate of the full task. For each evaluation episode, the robot is initialized at a collision-free random base position and orientation, and its arm is initialized at the resting position. The completion rate is averaged over 9 different runs. Appendix C.1 provides more specific details: "The coefficients of value and entropy losses are 0.5 and 0 respectively. We use 64 parallel environments and collect 128 transitions per environment to update the networks. We use 2 mini-batches, 2 epochs per update, and a clipping parameter of 0.2 for both policy and value. The gradient norm is clipped at 0.5. We use the Adam optimizer with a learning rate of 0.0003. The linear learning rate decay is enabled. The mean of the Gaussian action predicted by the policy network is activated by tanh. The (log) standard deviation of the Gaussian action, which is an input-independent parameter, is initialized as 1.0." A hedged sketch of this architecture and PPO configuration follows the table. |
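
The Dataset Splits row can be condensed into a small configuration sketch. This is an illustrative summary, not a structure from the released code: the dictionary name and keys are assumptions, while the scene and episode counts are taken from the quoted passage.

```python
# Hypothetical summary of the episode splits quoted in the Dataset Splits row.
# Key names are illustrative assumptions; the counts come from the paper.
DATASET_SPLITS = {
    "train": {"scenes": 64, "episodes_per_task": 6400},
    "eval_cross_configuration": {"scenes": 20, "episodes_per_task": 100},
    "eval_cross_layout": {"scenes": "hold-out macro variation", "episodes_per_task": 100},
}
```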
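
The Experiment Setup row describes the skill policy architecture and PPO hyper-parameters in prose. Below is a minimal PyTorch sketch of that description: the class name, layer sizes, observation/action dimensions, and config field names are assumptions, while the recurrent actor-critic structure and the PPO values (64 environments, 128-step rollouts, 2 mini-batches, 2 epochs, 0.2 clipping, 0.5 value coefficient, 0 entropy coefficient, 0.5 gradient-norm clip, Adam at 3e-4 with linear decay, tanh-activated Gaussian mean, log-std initialized to 1.0) mirror the quoted Appendix C.1 settings. It is a sketch under those assumptions, not the authors' implementation (which is based on habitat-lab).

```python
import torch
import torch.nn as nn


class SkillPolicy(nn.Module):
    """Recurrent actor-critic sketch: 3-layer CNN -> concat(state, prev action) -> 1-layer GRU -> heads."""

    def __init__(self, in_channels=1, state_dim=10, action_dim=11, hidden_dim=512):
        super().__init__()
        # 3-layer CNN visual encoder (channel and kernel sizes are assumptions).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.visual_fc = nn.LazyLinear(hidden_dim)  # project flattened CNN features
        # 1-layer GRU over visual features, state observations, and the previous action.
        self.gru = nn.GRU(hidden_dim + state_dim + action_dim, hidden_dim, num_layers=1)
        self.actor_mean = nn.Linear(hidden_dim, action_dim)
        # Input-independent log standard deviation, initialized to 1.0 as in Appendix C.1.
        self.log_std = nn.Parameter(torch.full((action_dim,), 1.0))
        self.critic = nn.Linear(hidden_dim, 1)

    def forward(self, image, state, prev_action, rnn_hidden=None):
        feat = torch.relu(self.visual_fc(self.cnn(image)))
        x = torch.cat([feat, state, prev_action], dim=-1).unsqueeze(0)  # (T=1, N, D)
        x, rnn_hidden = self.gru(x, rnn_hidden)
        x = x.squeeze(0)
        mean = torch.tanh(self.actor_mean(x))      # tanh-activated Gaussian mean
        std = self.log_std.exp().expand_as(mean)   # input-independent standard deviation
        return mean, std, self.critic(x), rnn_hidden


# PPO settings quoted from Appendix C.1 (field names are assumptions).
PPO_CONFIG = dict(
    num_envs=64,             # parallel environments
    rollout_length=128,      # transitions collected per environment per update
    num_mini_batches=2,
    epochs_per_update=2,
    clip_param=0.2,          # clipping for both policy and value
    value_loss_coef=0.5,
    entropy_coef=0.0,
    max_grad_norm=0.5,
    learning_rate=3e-4,      # Adam optimizer
    linear_lr_decay=True,
)
```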