SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos

Authors: Yulei Niu, Wenliang Guo, Long Chen, Xudong Lin, Shih-Fu Chang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on CrossTask, COIN, and NIV benchmark datasets demonstrate that our proposed SCHEMA model achieves state-of-the-art performance and obtains explainable visualizations.
Researcher Affiliation | Academia | Columbia University; The Hong Kong University of Science and Technology
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code: https://github.com/WenliangGuo/SCHEMA
Open Datasets | Yes | We evaluate our SCHEMA method on three benchmark instructional video datasets: CrossTask (Zhukov et al., 2019), COIN (Tang et al., 2019), and NIV (Alayrac et al., 2016).
Dataset Splits | No | Following previous works (Chang et al., 2020; Bi et al., 2021; Sun et al., 2022), we randomly select 70% of the videos in each task as the training set and take the others as the test set. A separate validation split percentage or sample count is not explicitly provided. (A sketch of such a per-task split appears after the table.)
Hardware Specification | Yes | The training process takes 1 hour (500 epochs) on CrossTask and 5.5 hours (400 epochs) on COIN using a single V100 GPU.
Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'CLIP', 'S3D network', and 'GPT-3.5', but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We train our model with the Adam optimizer and an initial learning rate of 5e-3, decayed by a factor of 0.65 every 40 epochs. The batch size is set to 256. Each self-attention and cross-attention module consists of 32 heads, and the hidden layer size is set to 128. The step classifier is a two-layer MLP with a hidden size of 128. The dropout ratio is 0.2. (A configuration sketch appears after the table.)
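
The Dataset Splits row describes a per-task 70/30 random train/test split with no validation set. Below is a minimal sketch of how such a split could be implemented; the `split_per_task` name, the `task_to_videos` mapping, and the fixed seed are illustrative assumptions, not taken from the paper or the authors' code.

```python
import random

def split_per_task(task_to_videos, train_ratio=0.7, seed=0):
    """Randomly assign ~70% of each task's videos to train and the rest to test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split (assumption)
    train, test = [], []
    for task, videos in task_to_videos.items():
        videos = list(videos)
        rng.shuffle(videos)
        cut = int(train_ratio * len(videos))  # truncate toward 70%
        train += [(task, v) for v in videos[:cut]]
        test += [(task, v) for v in videos[cut:]]
    return train, test

# Toy usage with hypothetical video IDs:
train, test = split_per_task({"make_pancakes": ["v1", "v2", "v3", "v4", "v5"]})
print(len(train), len(test))  # 3 2
```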
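
Similarly, the Experiment Setup row maps directly onto standard PyTorch components. The following is a minimal sketch wiring the quoted hyperparameters together; the module stand-ins, `NUM_STEPS`, and the exact placement of ReLU/dropout inside the two-layer classifier are assumptions for illustration, not the authors' implementation (see their repository for that).

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper's experiment setup.
HIDDEN_DIM = 128   # hidden layer size of attention modules and MLP classifier
NUM_HEADS = 32     # heads per self-/cross-attention module
DROPOUT = 0.2
BATCH_SIZE = 256
NUM_STEPS = 100    # placeholder step-vocabulary size (dataset-dependent)

# Minimal stand-ins for SCHEMA's attention modules and step classifier.
attention = nn.MultiheadAttention(HIDDEN_DIM, NUM_HEADS, dropout=DROPOUT,
                                  batch_first=True)
step_classifier = nn.Sequential(          # two-layer MLP step classifier
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
    nn.ReLU(),
    nn.Dropout(DROPOUT),
    nn.Linear(HIDDEN_DIM, NUM_STEPS),
)

params = list(attention.parameters()) + list(step_classifier.parameters())
optimizer = torch.optim.Adam(params, lr=5e-3)
# Decay the learning rate by a factor of 0.65 every 40 epochs
# (scheduler.step() would be called once per epoch in the training loop).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.65)
```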