Robot Fleet Learning via Policy Merging

Authors: Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental
  "We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with good performance on nearly all training tasks at test time. Moreover, we introduce a novel robotic tool-use benchmark, FLEET-TOOLS, for fleet policy learning in compositional and contact-rich robot manipulation tasks, to validate the efficacy of FLEET-MERGE on the benchmark."
Researcher Affiliation | Academia
  Lirui Wang (MIT CSAIL), Kaiqing Zhang (University of Maryland, College Park), Allan Zhou (Stanford), Max Simchowitz (MIT CSAIL), Russ Tedrake (MIT CSAIL)
Pseudocode | Yes
  "Algorithm 1 FLEET-MERGE: Fleet Learning of Policies via Weight Merging" (a minimal weight-averaging sketch appears after the table)
Open Source Code | Yes
  "Code is available at https://github.com/liruiw/Fleet-Tools."
Open Datasets | Yes
  "Meta-World (Yu et al., 2020), which has 50 distinct robotic manipulation tasks, and linear control." ... "Specifically, we split the MNIST dataset into N local datasets, with Dirichlet parameters α to create data non-IIDness (see ?? for more details), and we use L-layer MLPs to parameterize the models." ... "as well as CNN on CIFAR-10 dataset (Krizhevsky et al., 2009) (sub-figure c)" ... "RNN on Shakespeare dataset from LEAF (Caldas et al., 2018)" (a Dirichlet split sketch appears after the table)
Dataset Splits | No
  The paper discusses dataset heterogeneity and how local datasets are created (e.g., using Dirichlet distributions for non-IID data), and refers to "test time", but it does not explicitly provide train/validation/test splits with percentages or sample counts, nor a clear methodology for constructing a validation split.
Hardware Specification | No
  The paper only mentions "MIT Supercloud for providing computing cluster resources for running the experiments," which is too general and lacks specific hardware details such as CPU/GPU models or memory.
Software Dependencies | No
  The paper mentions software such as Drake and the Ray library, and solvers such as SNOPT, but it does not provide specific version numbers for any of these components, which is required for reproducibility.
Experiment Setup | Yes
  "We train the policy using Adam optimizer (Kingma & Ba, 2014) with 200 epochs. The dataset contains 50000 data points for each tool instance, and the batch size is 512. We use 5 users across the merging experiments." ... "We use Adam optimizer with a learning rate of 1e-3 and batch size of 256. We use 3, 5, and 10 users respectively for the single-shot merging, merging while training, and merging with participation ratio experiments in Figure 10." (a generic training-loop sketch appears after the table)
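
The paper's Algorithm 1 merges policy weights with permutation-aware alignment; the sketch below shows only the naive parameter-averaging baseline that such methods improve on, not the authors' implementation. The function name and the assumption that all policies share one architecture are illustrative.

```python
# Minimal sketch: naive weight averaging across a fleet of policies.
# Assumes every policy is a torch.nn.Module with an identical architecture.
import copy
import torch

def average_policies(policies):
    """Return a new policy whose parameters are the element-wise mean of the inputs."""
    merged = copy.deepcopy(policies[0])
    with torch.no_grad():
        for name, param in merged.named_parameters():
            # Stack the matching parameter tensor from every policy, then average.
            stacked = torch.stack([dict(p.named_parameters())[name] for p in policies])
            param.copy_(stacked.mean(dim=0))
    return merged
```

Naive averaging fails when networks are permutation-misaligned, which is precisely the failure mode that weight-merging methods like FLEET-MERGE address by aligning hidden units before averaging.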
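The Open Datasets row quotes a Dirichlet-based non-IID split of MNIST into N local datasets. The following is a hedged sketch of the common recipe that wording alludes to (smaller α gives more skewed per-client label distributions); the exact procedure in the paper may differ, and all names are illustrative.

```python
# Sketch: partition example indices across clients with a Dirichlet(alpha) prior
# over per-class client proportions, a standard way to create non-IID splits.
import numpy as np

def dirichlet_split(labels, n_clients, alpha, seed=0):
    """Return a list of index lists, one per client; smaller alpha => more skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Sample this class's allocation across clients from Dirichlet(alpha).
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```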
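The Experiment Setup row reports concrete hyperparameters (Adam, learning rate 1e-3, batch size 256, 200 epochs). The skeleton below wires those numbers into a generic behavior-cloning loop; the loss, data format, and policy architecture are placeholders assumed for illustration, not the paper's setup.

```python
# Sketch: training loop using the reported hyperparameters.
# Assumes `dataset` yields (observation, action) pairs and `policy` maps obs -> action.
import torch
from torch.utils.data import DataLoader

def train(policy, dataset, epochs=200, lr=1e-3, batch_size=256):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, action in loader:
            loss = torch.nn.functional.mse_loss(policy(obs), action)  # placeholder BC loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```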