Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks
Authors: Albert Yu, Ray Mooney
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS |
| Researcher Affiliation | Academia | Albert Yu, UT Austin; Raymond J. Mooney, UT Austin |
| Pseudocode | Yes | Algorithm 1 DeL-TaCo: Training |
| Open Source Code | Yes | We link to our open-sourced codebase on our project website, https://deltaco-robot.github.io. |
| Open Datasets | Yes | We develop a Pybullet (Coumans & Bai, 2007-2022) simulation environment with a WidowX 250 robot arm, 32 possible objects of diverse colors and shapes for manipulation, and 2 different containers. Using a scripted policy (details in Appendix C), we collect roughly 130 successful demonstrations for each training task, and a single successful demonstration for each test task. All demonstrations are 30 timesteps long. Depending on our experimental scenario (see Section 5.2), we train on 65% to 80% of the 300 tasks, so our training buffer contains roughly 26,000-31,000 trajectories. Appendix A provides the full list of our 300 tasks, instructions, and objects, as well as train and test task splits. |
| Dataset Splits | Yes | We define a set of $n$ tasks $\{T_i\}_{i=1}^{n}$ and split them into training tasks $U$ and test tasks $V$, where $(U, V)$ is a bipartition of $\{T_i\}_{i=1}^{n}$. During evaluation, we assume access to a buffer $D_{val}$ of trajectories for only the tasks in $V$ and their associated natural language descriptions. Appendix A provides the full list of our 300 tasks, instructions, and objects, as well as train and test task splits. Scenario A (novel objects, colors, and shapes) trains on all gray tasks and tests on yellow, blue, and green tasks. Scenario B (novel colors and shapes) trains on all gray and yellow tasks and tests on blue and green tasks. |
| Hardware Specification | No | The paper mentions a 'Pybullet simulation environment' and the simulated 'WidowX 250 robot arm', but does not specify any actual hardware (such as GPUs, CPUs, or memory) used for running the experiments or training the models. |
| Software Dependencies | No | The paper mentions software such as 'DistilBERT', 'Pybullet (Coumans & Bai, 2007-2022)', and 'ResNet-18', but does not provide version numbers for these or for other software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | Table 5: Policy π hyperparameters; Table 6: $f_{demo}$ CNN hyperparameters; Table 7: Imitation learning hyperparameters. These tables provide specific values for learning rate, batch size, number of tasks per batch, task encoder weight, contrastive learning temperature, input/output sizes, kernel sizes, strides, activation functions, and image augmentation details. |
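
As a rough sanity check on the figures quoted in the Open Datasets and Dataset Splits rows, the sketch below recomputes the approximate training-buffer size from the stated task count, train fractions, and demonstrations per task, and checks that a train/test split is a bipartition. The function names and the toy task IDs are hypothetical and are not taken from the paper's codebase; the value of 130 demonstrations per task is the paper's "roughly 130", so the results are estimates only.

```python
# Illustrative sketch only: constants come from the quoted paper text,
# helper names are hypothetical and not part of the authors' code.
NUM_TASKS = 300
DEMOS_PER_TRAIN_TASK = 130  # the paper says "roughly 130", so this is approximate


def training_buffer_size(train_fraction: float) -> int:
    """Approximate number of trajectories in the training buffer."""
    num_train_tasks = round(NUM_TASKS * train_fraction)
    return num_train_tasks * DEMOS_PER_TRAIN_TASK


def is_bipartition(all_tasks, train_tasks, test_tasks) -> bool:
    """True if train and test are disjoint and together cover all tasks."""
    train, test = set(train_tasks), set(test_tasks)
    return train.isdisjoint(test) and (train | test) == set(all_tasks)


low = training_buffer_size(0.65)   # 195 tasks * 130 demos = 25350
high = training_buffer_size(0.80)  # 240 tasks * 130 demos = 31200
print(low, high)
```

The lower estimate of 25,350 falls slightly under the quoted "roughly 26,000", which is consistent with the per-task demonstration count being approximate rather than exactly 130.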