Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Authors: Yulei Niu, Wenliang Guo, Long Chen, Xudong Lin, Shih-Fu Chang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CrossTask, COIN, and NIV benchmark datasets demonstrate that our proposed SCHEMA model achieves state-of-the-art performance and obtains explainable visualizations. |
| Researcher Affiliation | Academia | ¹Columbia University, ²The Hong Kong University of Science and Technology |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/WenliangGuo/SCHEMA |
| Open Datasets | Yes | We evaluate our SCHEMA method on three benchmark instruction video datasets, CrossTask (Zhukov et al., 2019), and COIN (Tang et al., 2019), and NIV (Alayrac et al., 2016). |
| Dataset Splits | No | Following previous works (Chang et al., 2020; Bi et al., 2021; Sun et al., 2022), we randomly select 70% of the videos in each task as the training set and take the others as the test set. A separate validation split percentage or sample count is not explicitly provided. |
| Hardware Specification | Yes | The training process takes 1 hour (500 epochs) on CrossTask and 5.5 hours (400 epochs) on COIN using a single V100 GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'CLIP', 'S3D network', and 'GPT-3.5', but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train our model with Adam optimizer, an initial learning rate set to 5e-3 decayed by 0.65 every 40 epochs. The batch size is set as 256. Each self-attention and cross-attention module consists of 32 heads and the hidden layer size is set as 128. The step classifier is a two-layer MLP with hidden size of 128. The dropout ratio is 0.2. |
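
The Dataset Splits and Experiment Setup rows can be read as the following minimal sketch. It is not the authors' code. It assumes that "decayed by 0.65 every 40 epochs" means a staircase schedule (multiply by 0.65 once per completed 40-epoch block) and that the 70% training share is rounded down per task; neither detail is stated in the rows above. The names `step_decay_lr` and `split_per_task`, the `seed` parameter, and the input dictionary layout are hypothetical.

```python
import random

# Values quoted in the table rows above.
INITIAL_LR = 5e-3       # initial learning rate
DECAY_FACTOR = 0.65     # multiplicative decay
DECAY_EVERY = 40        # epochs between decays
TRAIN_FRACTION = 0.7    # share of each task's videos used for training


def step_decay_lr(epoch, initial_lr=INITIAL_LR, factor=DECAY_FACTOR, every=DECAY_EVERY):
    """Return the learning rate at a 0-indexed epoch.

    Assumption: a staircase schedule, i.e. the rate is multiplied by
    `factor` once for every completed block of `every` epochs.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_lr * factor ** (epoch // every)


def split_per_task(videos_by_task, train_fraction=TRAIN_FRACTION, seed=0):
    """Randomly split each task's videos into train and test lists.

    `videos_by_task` maps a task name to a list of video ids (hypothetical
    layout). Assumption: the training count is rounded down; the source
    does not state a rounding rule or a random seed.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for task, videos in videos_by_task.items():
        shuffled = list(videos)          # copy so the input is not mutated
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train[task] = shuffled[:cut]
        test[task] = shuffled[cut:]
    return train, test
```

Under these assumptions, `step_decay_lr(0)` returns the initial rate, and the rate first drops at epoch 40. No validation split is produced, which matches the "No" result in the Dataset Splits row.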