Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
PlanLLM: Video Procedure Planning with Refinable Large Language Models
Authors: Dejie Yang, Zijing Zhao, Yang Liu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our PlanLLM achieves superior performance on three benchmarks, demonstrating the effectiveness of our designs. Code: https://github.com/idejie/PlanLLM |
| Researcher Affiliation | Academia | 1 Wangxuan Institute of Computer Technology, Peking University; 2 State Key Laboratory of General Artificial Intelligence, Peking University; EMAIL, EMAIL |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the main body of the paper. |
| Open Source Code | Yes | Code: https://github.com/idejie/PlanLLM |
| Open Datasets | Yes | We employ three commonly used video datasets: CrossTask (Zhukov et al. 2019), NIV (Alayrac et al. 2016), and COIN (Tang et al. 2019). |
| Dataset Splits | No | The paper names three commonly used video datasets (CrossTask, NIV, and COIN), but the main text does not give training/validation/test split percentages or sample counts, and it does not cite how the datasets were partitioned for the experiments. |
| Hardware Specification | Yes | training the model with a batch size of 32 on NVIDIA A800 GPUs. |
| Software Dependencies | No | The paper mentions using S3D network, CLIP, BLIP2, Vicuna-7B, and LoRA, but does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | During the frozen LLM training stage, we set the learning rate to 1 × 10⁻⁴ for the Q-Former and 1 × 10⁻³ for other modules, training the model with a batch size of 32 on NVIDIA A800 GPUs. |