Robot Fleet Learning via Policy Merging
Authors: Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with good performance on nearly all training tasks at test time. Moreover, we introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks, to validate the efficacy of FLEET-MERGE on the benchmark. |
| Researcher Affiliation | Academia | Lirui Wang1, Kaiqing Zhang2, Allan Zhou3, Max Simchowitz1, Russ Tedrake1 1MIT CSAIL 2University of Maryland, College Park 3Stanford |
| Pseudocode | Yes | Algorithm 1 FLEET-MERGE: Fleet Learning of Policies via Weight Merging |
| Open Source Code | Yes | Code is available at https://github.com/liruiw/Fleet-Tools. |
| Open Datasets | Yes | Meta-World (Yu et al., 2020), which has 50 distinct robotic manipulation tasks, and linear control. ... Specifically, we split the MNIST dataset into N local datasets, with Dirichlet parameters α to create data non-IIDness (see ?? for more details), and we use L-layer MLPs to parameterize the models. ... as well as CNN on CIFAR-10 dataset (Krizhevsky et al., 2009) (sub-figure c) ... and RNN on Shakespeare dataset from LEAF (Caldas et al., 2018). |
| Dataset Splits | No | The paper discusses dataset heterogeneity and how local datasets are created (e.g., using Dirichlet distributions for non-IID data), and refers to "test time", but it does not explicitly specify train/validation/test splits with percentages, sample counts, or a clear splitting methodology. |
| Hardware Specification | No | The paper only mentions "MIT Supercloud for providing computing cluster resources for running the experiments," which is too general and lacks specific hardware details like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions software like "Drake" and "Ray library", and solvers like "SNOPT", but it does not provide specific version numbers for any of these components, which is required for reproducibility. |
| Experiment Setup | Yes | We train the policy using Adam optimizer (Kingma & Ba, 2014) with 200 epochs. The dataset contains 50000 data points for each tool instance, and the batch size is 512. We use 5 users across the merging experiments. ... We use Adam optimizer with a learning rate of 1e-3 and batch size of 256. We use 3, 5, and 10 users respectively for the single-shot merging, merging-while-training, and merging-with-participation-ratio experiments in Figure 10. |
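The Dirichlet-based non-IID split quoted in the Open Datasets row is a standard federated-learning construction; a minimal sketch is below. The function name `dirichlet_partition`, its parameters, and the toy labels are illustrative assumptions, not the paper's code. Smaller `alpha` yields more skewed (more heterogeneous) client shards.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split example indices into n_clients non-IID shards.

    For each class, a Dirichlet(alpha) draw decides what fraction of that
    class each client receives; small alpha concentrates a class on few clients.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    shards = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # per-client proportions of class c, summing to 1
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for shard, part in zip(shards, np.split(idx, cuts)):
            shard.extend(part.tolist())
    return [np.array(s) for s in shards]

# toy stand-in for MNIST labels: 1000 examples over 10 classes
labels = np.repeat(np.arange(10), 100)
shards = dirichlet_partition(labels, n_clients=5, alpha=0.5)
```

Every example lands in exactly one shard, so the shard sizes always sum to the dataset size; heterogeneity shows up in the per-class distribution within each shard.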