GAIN: On the Generalization of Instructional Action Understanding

Authors: Junlong Li, Guangyi Chen, Yansong Tang, Jinan Bao, Kun Zhang, Jie Zhou, Jiwen Lu

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide performance comparisons between the in-distribution dataset and out-of-distribution GAIN dataset, and assess the effectiveness of our causal approach on both action segmentation and action detection tasks. We conduct experiments on three datasets, where COIN (Tang et al., 2019) and Breakfast (Kuehne et al., 2014) are used for both training and testing, and our GAIN dataset is only used for evaluation. (A sketch of this train/evaluate protocol appears after the table below.)
Researcher Affiliation | Academia | Junlong Li (1), Guangyi Chen (2,3), Yansong Tang (1), Jinan Bao (4), Kun Zhang (2,3), Jie Zhou (1), Jiwen Lu (1); affiliations: (1) Tsinghua University, (2) MBZUAI, (3) Carnegie Mellon University, (4) University of Alberta
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The project page is https://jun-long-li.github.io/GAIN.
Open Datasets | Yes | To construct an evaluation dataset consisting of diverse and high-quality daily tasks, we choose the largest fine-grained annotated dataset, COIN (Tang et al., 2019), and the widely-used instructional video dataset, Breakfast (Kuehne et al., 2014), as the training sets. We expect the introduction of the GAIN dataset will promote future in-depth research on the generalization of instructional video understanding. The project page is https://jun-long-li.github.io/GAIN.
Dataset Splits | No | The paper mentions training and testing datasets and notes that it uses standard splits from existing datasets (e.g., "follow the default setting and present results on split 1" for COIN/Breakfast), but it does not give validation-split details (percentages, counts, or selection method) needed for reproduction.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions various models and features used (e.g., S3D, I3D, LSTM, ED-TCN, MS-TCN++), but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup | Yes | We use 5 convolution layers for both the encoder and decoder, whose convolutional filter sizes are 25. For all experiments, we employ a 1×1 convolution layer to project the features into an embedding space, whose dimension is 64.

Table 5: Parameter analysis on the learning rate on COIN/GAIN-C (frame accuracy, %).

Learning Rate | Method          | COIN | GAIN-C
5e-4          | MS-TCN++        | 62.1 | 49.0
5e-4          | Causal MS-TCN++ | 64.0 | 52.3
1e-3          | MS-TCN++        | 64.7 | 54.3
1e-3          | Causal MS-TCN++ | 65.5 | 56.2
2e-3          | MS-TCN++        | 65.5 | 51.8
2e-3          | Causal MS-TCN++ | 65.6 | 56.0
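
For context on the Experiment Setup row, here is a minimal PyTorch sketch of the quoted configuration: a 1×1 convolution projecting input features into a 64-dimensional embedding, followed by 5 temporal convolution layers with filter size 25 in both the encoder and decoder. Channel widths, activations, and the frame-wise classification head are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class EncoderDecoderTCN(nn.Module):
    def __init__(self, in_dim, num_classes, emb_dim=64, kernel_size=25, num_layers=5):
        super().__init__()
        # 1x1 convolution projecting input features into the 64-dim embedding space.
        self.project = nn.Conv1d(in_dim, emb_dim, kernel_size=1)
        pad = kernel_size // 2  # keeps the temporal length unchanged
        self.encoder = nn.ModuleList(
            nn.Conv1d(emb_dim, emb_dim, kernel_size, padding=pad) for _ in range(num_layers)
        )
        self.decoder = nn.ModuleList(
            nn.Conv1d(emb_dim, emb_dim, kernel_size, padding=pad) for _ in range(num_layers)
        )
        # Frame-wise classification head (assumed; not specified in the quoted setup).
        self.classify = nn.Conv1d(emb_dim, num_classes, kernel_size=1)

    def forward(self, x):  # x: (batch, in_dim, num_frames)
        h = self.project(x)
        for conv in self.encoder:
            h = torch.relu(conv(h))
        for conv in self.decoder:
            h = torch.relu(conv(h))
        return self.classify(h)  # (batch, num_classes, num_frames)

# Example: per-frame logits for a 512-frame clip with 2048-dim features.
model = EncoderDecoderTCN(in_dim=2048, num_classes=180)
logits = model(torch.randn(1, 2048, 512))  # -> shape (1, 180, 512)
```

The same-padding choice keeps one prediction per input frame, which is what frame-accuracy metrics like those in Table 5 assume.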
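
And for the evaluation protocol quoted in the Research Type row (train on COIN or Breakfast, then score the same checkpoint on both the in-distribution test split and the held-out GAIN set), a hedged sketch follows. The frame_accuracy helper and the loader names are hypothetical placeholders, not the authors' code.

```python
import torch

@torch.no_grad()
def frame_accuracy(model, loader, device="cpu"):
    """Fraction of frames whose predicted label matches the ground truth."""
    model.eval()
    correct, total = 0, 0
    for feats, labels in loader:  # feats: (B, D, T) float, labels: (B, T) long
        preds = model(feats.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# Train once on the in-distribution data, then evaluate the same checkpoint on
# both test sets; only GAIN is held out from training.
# in_dist_acc = frame_accuracy(model, coin_test_loader)  # hypothetical loader
# gain_acc    = frame_accuracy(model, gain_test_loader)  # hypothetical loader
```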