Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos

Authors: Yufan Zhou, Zhaobo Qi, Lingshuai Lin, Junqi Jing, Tingting Chai, Beichen Zhang, Shuhui Wang, Weigang Zhang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variables (each entry gives the variable, its classified result, and the supporting LLM response):

Research Type: Experimental
  "Experimental results across three widely used benchmark datasets demonstrate that our MTID achieves promising action planning performance on most metrics. The code is available at https://github.com/WiserZhou/MTID." (Section 4, Experiments)

Researcher Affiliation: Academia
  "1 Harbin Institute of Technology, Weihai; 2 Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS; EMAIL, EMAIL, EMAIL"

Pseudocode: No
  The paper describes its methods using text and figures (Figure 2: overview of the Masked Temporal Interpolation Diffusion; Figure 3: latent-space temporal interpolation module, residual temporal block, and cross-attention module) but does not include any explicitly labeled "Pseudocode" or "Algorithm" block.

Open Source Code: Yes
  "The code is available at https://github.com/WiserZhou/MTID."

Open Datasets: Yes
  "We evaluate our MTID method on three instructional video datasets: CrossTask (Zhukov et al., 2019), COIN (Tang et al., 2019), and NIV (Alayrac et al., 2016)."

Dataset Splits: Yes
  "We randomly split each dataset into training (70% of videos per task) and testing (30%), following previous works (Sun et al., 2022; Wang et al., 2023b; Niu et al., 2024)."

Hardware Specification: Yes
  "Training is performed using ADAM (Kingma, 2014) on 8 NVIDIA RTX 3090 GPUs."

Software Dependencies: No
  The paper names ADAM (Kingma, 2014) as the optimizer but does not specify version numbers for other software dependencies, such as the programming language or deep-learning libraries (e.g., Python, PyTorch/TensorFlow).

Experiment Setup: Yes
  "For the CrossTask dataset, we set the diffusion steps to 250 and train for 20,000 steps. The learning rate is linearly increased to 5 × 10⁻⁴ over the first 3,333 steps, then halved at steps 8,333, 13,333, and 18,333. For the NIV dataset, with 50 diffusion steps, training lasts for 5,000 steps. The learning rate ramps up to 3 × 10⁻⁴ over the first 1,000 steps and is reduced by 50% at steps 2,666 and 4,332. In the larger COIN dataset, we use 300 diffusion steps and train for 30,000 steps. The learning rate increases to 1 × 10⁻⁵ in the first 5,000 steps and is halved at steps 12,500, 20,000, and 27,500, stabilizing at 2.5 × 10⁻⁶ for the remaining steps. Training is performed using ADAM (Kingma, 2014) on 8 NVIDIA RTX 3090 GPUs."
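The per-task 70/30 random split the authors describe could be sketched as below; `split_by_task`, its arguments, and the seed handling are illustrative assumptions, not the authors' actual split code:

```python
import random

def split_by_task(task_to_videos, train_frac=0.7, seed=0):
    """Randomly split each task's video list into train/test portions."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = {}, {}
    for task, videos in task_to_videos.items():
        vids = list(videos)
        rng.shuffle(vids)                     # random assignment per task
        k = round(train_frac * len(vids))     # 70% of videos per task
        train[task], test[task] = vids[:k], vids[k:]
    return train, test
```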
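The CrossTask schedule quoted above (linear warmup to 5 × 10⁻⁴ over the first 3,333 steps, then halving at steps 8,333, 13,333, and 18,333) can be sketched as a small helper; the function name and exact formulation are assumptions for illustration, not code from the MTID repository:

```python
def crosstask_lr(step, base_lr=5e-4, warmup_steps=3333,
                 milestones=(8333, 13333, 18333)):
    """Piecewise learning rate: linear warmup, then halve at each milestone."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps   # linear ramp from 0 to base_lr
    halvings = sum(step >= m for m in milestones)
    return base_lr * (0.5 ** halvings)         # halved once per passed milestone
```

In a PyTorch training loop, a schedule of this shape would typically be attached to the optimizer via a scheduler such as `torch.optim.lr_scheduler.LambdaLR`.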