SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Authors: Yulei Niu, Wenliang Guo, Long Chen, Xudong Lin, Shih-Fu Chang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CrossTask, COIN, and NIV benchmark datasets demonstrate that our proposed SCHEMA model achieves state-of-the-art performance and obtains explainable visualizations. |
| Researcher Affiliation | Academia | Columbia University; The Hong Kong University of Science and Technology |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/WenliangGuo/SCHEMA |
| Open Datasets | Yes | We evaluate our SCHEMA method on three benchmark instructional video datasets: CrossTask (Zhukov et al., 2019), COIN (Tang et al., 2019), and NIV (Alayrac et al., 2016). |
| Dataset Splits | No | Following previous works (Chang et al., 2020; Bi et al., 2021; Sun et al., 2022), we randomly select 70% of the videos in each task as the training set and take the others as the test set. A separate validation split percentage or sample count is not explicitly provided. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | The training process takes 1 hour (500 epochs) on CrossTask and 5.5 hours (400 epochs) on COIN using a single V100 GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'CLIP', 'S3D network', and 'GPT-3.5', but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train our model with the Adam optimizer and an initial learning rate of 5e-3, decayed by a factor of 0.65 every 40 epochs. The batch size is set to 256. Each self-attention and cross-attention module has 32 heads with a hidden size of 128. The step classifier is a two-layer MLP with a hidden size of 128. The dropout ratio is 0.2. (A configuration sketch follows the table.) |
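
The 70/30 per-task split described under Dataset Splits can be reproduced with standard tooling. The sketch below is an assumption-based illustration: `videos_by_task`, the random seed, and the tuple output format are hypothetical, since the paper does not specify them.

```python
# Minimal sketch of the 70/30 per-task random split described in the paper.
# `videos_by_task` is a hypothetical mapping from task id to that task's
# video ids; the seed and return format are illustrative assumptions.
import random

def split_per_task(videos_by_task, train_ratio=0.7, seed=0):
    rng = random.Random(seed)
    train, test = [], []
    for task_id, videos in videos_by_task.items():
        videos = list(videos)
        rng.shuffle(videos)
        cut = int(len(videos) * train_ratio)
        train.extend((task_id, v) for v in videos[:cut])
        test.extend((task_id, v) for v in videos[cut:])
    return train, test

# Example usage with toy data:
# train, test = split_per_task({"make_coffee": ["v1", "v2", "v3", "v4"]})
```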
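
The hyperparameters listed under Experiment Setup map onto a straightforward PyTorch training configuration. The sketch below only reflects the reported values (Adam at 5e-3, decay by 0.65 every 40 epochs, batch size 256, 32-head attention with hidden size 128, a two-layer MLP classifier, dropout 0.2); the module wiring and `num_steps` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the reported training configuration in PyTorch.
# Only the hyperparameter values come from the paper; the modules below
# are simplified stand-ins for illustration.
import torch
import torch.nn as nn

embed_dim = 128      # hidden size of attention layers (from the paper)
num_heads = 32       # heads per self-/cross-attention module
dropout = 0.2
num_steps = 105      # hypothetical number of step classes

# Two-layer MLP step classifier with hidden size 128, as described.
step_classifier = nn.Sequential(
    nn.Linear(embed_dim, 128),
    nn.ReLU(),
    nn.Dropout(dropout),
    nn.Linear(128, num_steps),
)

# One self-attention module with 32 heads over 128-d features.
self_attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout,
                                  batch_first=True)

params = list(step_classifier.parameters()) + list(self_attn.parameters())

# Adam with initial lr 5e-3, decayed by 0.65 every 40 epochs; batch size 256.
optimizer = torch.optim.Adam(params, lr=5e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.65)
batch_size = 256
```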