Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving Generative Behavior Cloning via Self-Guidance and Adaptive Chunking
Authors: Junhyuk So, Chiwoong Lee, Shinyoung Lee, Jungseul Ok, Eunhyeok Park
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our approach substantially improves GBC performance across a wide range of simulated and real-world robotic manipulation tasks. Extensive evaluations across simulated and real-world robotic environments demonstrate that our approach outperforms Vanilla Diffusion Policy by 23.25% and the state-of-the-art BID by 12.27% |
| Researcher Affiliation | Academia | Department of Computer Science & Engineering and Graduate School of Artificial Intelligence, POSTECH, South Korea |
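| Pseudocode | Yes | Let $A_{\text{queue}}$ denote the action chunk queue, $\hat{a}_{t:t+H} \sim \pi(a \mid s_t)$ the newly predicted action chunk, and $\tau$ the similarity threshold. The update rule is defined as $A_{\text{queue}} \leftarrow A_{\text{queue}}.\mathrm{enqueue}(\hat{a}_{t+H})$ if $\cos(A_{\text{queue}}[0], \hat{a}[0]) \geq \tau$, and $A_{\text{queue}} \leftarrow \hat{a}_{t:t+H}$ otherwise (Eq. 14), where $\cos(\cdot, \cdot)$ denotes cosine similarity. At each timestep, the first action in the queue is dequeued and executed: $a_t = A_{\text{queue}}.\mathrm{dequeue}()$. |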
| Open Source Code | Yes | Our code is available at https://github.com/junhyukso/SGAC. |
| Open Datasets | Yes | These include simple tasks like Push T [9], standard benchmarks from Robomimic [24], and the particularly challenging long-horizon Kitchen [25] environment. |
| Dataset Splits | No | Although the paper states the number of episodes used for evaluation and reports collecting 300 demonstration episodes for the real-world experiments, it does not specify how these demonstrations were split into training, validation, and test sets (as percentages or episode counts). |
| Hardware Specification | Yes | All experiments are conducted on one A6000 GPU server with the DDIM-10 solver at the standard 30 Hz visuomotor control frequency. … which requires 27 h with one NVIDIA RTX 6000 Ada Generation GPU and an AMD Ryzen Threadripper PRO 7985WX CPU. |
| Software Dependencies | No | The paper mentions the DDIM-30 solver and frameworks such as LeRobot (Hugging Face) and Diffusers. However, it does not give version numbers for these components or for other dependencies such as Python and PyTorch, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Hyperparameter settings: the hyperparameters used in the simulation experiments of the main paper are summarized in Table 3. Additional hyperparameter details are listed in Table 5. |
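
The adaptive chunking update rule quoted in the Pseudocode row (Eq. 14) can be illustrated with a small queue-based sketch. This is not the authors' implementation; their code is at https://github.com/junhyukso/SGAC. The sketch rests on several assumptions not stated in the excerpt. Each predicted chunk is an array whose first axis is time, and its last row is taken to be $\hat{a}_{t+H}$. The comparison uses $\geq \tau$. An empty queue is filled with the full new chunk. The names `AdaptiveChunkQueue`, `update`, `step`, and `cosine_similarity` are hypothetical.

```python
from collections import deque

import numpy as np


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two action vectors (flattened); eps avoids division by zero."""
    u = np.ravel(np.asarray(u, dtype=float))
    v = np.ravel(np.asarray(v, dtype=float))
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


class AdaptiveChunkQueue:
    """Hypothetical sketch of the adaptive chunking rule (Eq. 14); not the authors' code."""

    def __init__(self, tau):
        self.tau = tau          # similarity threshold tau
        self.queue = deque()    # A_queue: pending actions, oldest first

    def update(self, chunk):
        """Apply the update rule with a newly predicted chunk (first axis = time)."""
        chunk = np.asarray(chunk, dtype=float)
        # Assumption: when the queue is empty, it is simply filled with the new chunk.
        if self.queue and cosine_similarity(self.queue[0], chunk[0]) >= self.tau:
            # New prediction agrees with the committed plan: append only its last action.
            self.queue.append(chunk[-1])
        else:
            # New prediction diverges (or queue is empty): replace the queue with it.
            self.queue = deque(chunk)

    def step(self, chunk):
        """Update with a new chunk, then dequeue and return the action to execute (a_t)."""
        self.update(chunk)
        return self.queue.popleft()
```

In this sketch the queue length stays constant after the first step. Each call either appends one action or refills the queue with a full chunk, and then removes one action. The policy therefore keeps executing its committed plan while new predictions agree with it, and switches to the fresh chunk as soon as they diverge beyond the threshold.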