Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Authors: Fanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. |
| Researcher Affiliation | Academia | Fanqi Lin (1,2,3), Yingdong Hu (1,2,3), Pingyue Sheng (1), Chuan Wen (1,2,3), Jiacheng You (1), Yang Gao (1,2,3). Affiliations: (1) Tsinghua University, (2) Shanghai Qi Zhi Institute, (3) Shanghai Artificial Intelligence Laboratory |
| Pseudocode | No | The paper describes the methodologies in prose and does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://data-scaling-laws.github.io/. To further support researchers in this endeavor, we release our code, data, and models, with the hope of inspiring further efforts in this direction and ultimately leading to general-purpose robots capable of solving complex, open-world problems. |
| Open Datasets | Yes | Existing robotic manipulation datasets do not provide enough environments and objects for a single task to meet our requirements. Therefore, we opt to use the Universal Manipulation Interface (UMI) (Chi et al., 2024), a hand-held gripper, to independently collect a substantial number of demonstrations. ... we release our code, data, and models, with the hope of inspiring further efforts in this direction and ultimately leading to general-purpose robots capable of solving complex, open-world problems. |
| Dataset Splits | Yes | To evaluate the generalization performance of the policy, we exclusively test it in unseen environments or with unseen objects. ... In total, 21 policies are trained, and each is evaluated using 8 unseen objects in the same environment as the training data, with 5 trials per object. ... Each policy is evaluated in 8 unseen environments using the same object as in training, with 5 trials per environment. ... Each policy is evaluated in 8 unseen environments, using two unseen objects per environment, with 5 trials per environment. ... To calculate MSE, we collect 30 human demonstrations for each evaluation environment or object, forming the validation set. |
| Hardware Specification | Yes | Policy inference is performed on a workstation equipped with an NVIDIA 4090 GPU (24 GB VRAM). ... it takes 75 hours to complete on 8 A800 GPUs. |
| Software Dependencies | No | The paper mentions several techniques and models like Diffusion Policy, U-Net, DDIM, DINOv2, ImageNet, ResNet, CLIP ViT, ACT, and LoRA, but it does not specify any software libraries or frameworks with explicit version numbers. |
| Experiment Setup | Yes | Specifically, the policy trained on the smallest dataset undergoes 800 epochs, totaling 5.3 × 10^4 training steps. The policy trained on the largest dataset undergoes 75 epochs, totaling 5 × 10^5 training steps, which takes 75 hours to complete on 8 A800 GPUs. ... Config: Value. Image observation horizon: 3 (Pour Water, Unplug Charger), 2 (other tasks); Proprioception observation horizon: 3 (Pour Water, Unplug Charger), 2 (other tasks); Action horizon: 16; Observation resolution: 224 × 224; Environment frequency: 5; Optimizer: AdamW; Optimizer momentum: β1, β2 = 0.95, 0.999; Learning rate for action diffusion model: 3e-4; Learning rate for visual encoder: 3e-5; Learning rate schedule: cosine decay; Batch size: 256; Inference denoising iterations: 16; Temporal ensemble steps: 8; Temporal ensemble adaptation rate: -0.01 |
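
The Experiment Setup row lists two learning rates under a cosine-decay schedule. The sketch below shows one way those values might be organized and how the schedule could evolve over the 5 × 10^5 steps reported for the largest dataset. It is an illustration only: the `TrainConfig` class, its field names, and the `cosine_lr` helper are hypothetical, and the absence of warmup and the final learning rate of zero are assumptions that the row does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values are taken from the Experiment Setup row; field names are illustrative.
    action_horizon: int = 16
    image_size: tuple = (224, 224)
    batch_size: int = 256
    lr_action_model: float = 3e-4
    lr_visual_encoder: float = 3e-5
    betas: tuple = (0.95, 0.999)
    inference_denoising_iters: int = 16


def cosine_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Cosine decay from base_lr down to 0.

    Assumptions (not stated in the row): no warmup phase, final rate of 0.
    """
    # Clamp the step so that out-of-range values stay on the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


cfg = TrainConfig()
total_steps = 500_000  # largest-dataset run: 5 x 10^5 training steps

for step in (0, total_steps // 2, total_steps):
    lr_policy = cosine_lr(cfg.lr_action_model, step, total_steps)
    lr_encoder = cosine_lr(cfg.lr_visual_encoder, step, total_steps)
    print(f"step={step:>7d}  action-model lr={lr_policy:.2e}  encoder lr={lr_encoder:.2e}")
```

Under these assumptions the visual encoder's rate stays a factor of ten below the action diffusion model's rate for the whole run, since both follow the same decay curve.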