Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos

Authors: Kyungjae Lee, Nan Duan, Lei Ji, Jason Li, Seung-won Hwang (pp. 8147-8154)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper is experimental: 'Experimental result demonstrates that our model achieves state-of-the-art performance.'
Researcher Affiliation | Collaboration | Kyungjae Lee (1), Nan Duan (2), Lei Ji (2,3), Jason Li (4), Seung-won Hwang (1). Affiliations: (1) Department of Computer Science, Yonsei University, Seoul, South Korea; (2) Microsoft Research Asia, Beijing, China; (3) University of Chinese Academy of Science, Beijing, China; (4) STCA Multimedia Group, Microsoft, Beijing, China
Pseudocode | No | The paper provides architectural descriptions and mathematical formulations for its models but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to an open-source detector used by the authors ('Using opensource detector 1, we extract object categories c from images in clip T_k,' with footnote '1 https://github.com/peteanderson80/bottom-up-attention'), but does not state that the code for the authors' own proposed methodology ('Segmenter-Ranker') is publicly available. (A hedged sketch of this frame-level object extraction appears after the table.)
Open Datasets | No | The paper states, 'For training and evaluating this task, we collect labelled resources of 37K QA pairs and 21K video (total 1,662 hours)' and 'For such purpose, we contribute a labeled dataset of 37K QA pairs on instructional videos for benchmarking,' but does not provide a specific link, DOI, or repository for public access to this dataset.
Dataset Splits | Yes | 'We divide the dataset into 29K/4K/4K as training/dev/test set respectively, where the videos do not overlap in each set.' (A sketch of such a video-level split appears after the table.)
Hardware Specification | No | The paper discusses computational expense related to certain features (e.g., ResNet-50/101 features having large dimensions) but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run its own experiments.
Software Dependencies | No | The paper mentions using a 'base version of BERT (Devlin et al. 2018) with 12 layers' and the 'Adam optimizer', but it does not specify version numbers for software libraries (e.g., TensorFlow, PyTorch) or the exact BERT checkpoint used, which would be needed for a reproducible dependency description.
Experiment Setup | Yes | We use a base version of BERT (Devlin et al. 2018) with 12 layers as our encoder, following its default setting. We train our model on BERT for 3 epochs, and use the Adam optimizer with a learning rate of 0.00005. In Segmenter, we extract N = 9 span candidates from the output probabilities. In Ranker, the training data has a 1:9 positive-to-negative ratio, and this module ranks the top 9 candidates at inference time. For the CNN layer, the number of layers l_f is 30, and the top n_t = 7 elements in max-pooling are extracted; both are optimized on the dev set. For detecting image objects, we sample frames at 1 fps. (A configuration sketch reflecting these hyperparameters appears after the table.)
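
For the object-category extraction quoted under Open Source Code, the referenced bottom-up-attention detector is released by its authors, but the surrounding pipeline is not. Below is a minimal sketch of the 1-fps frame sampling and per-frame detection loop, assuming OpenCV for video decoding and treating the detector as an opaque callable; both are assumptions, since the paper does not describe its pipeline code.

```python
import cv2


def sample_frames(video_path, fps=1.0):
    """Yield frames from a video at roughly `fps` frames per second,
    mirroring the paper's 1-fps sampling for object detection."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    step = max(int(round(native_fps / fps)), 1)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield frame
        idx += 1
    cap.release()


def extract_object_categories(video_path, detector):
    """Collect object categories seen across a clip. `detector` is a
    stand-in for the cited bottom-up-attention model (not reproduced
    here); it is assumed to map a frame to an iterable of labels."""
    categories = set()
    for frame in sample_frames(video_path):
        categories.update(detector(frame))
    return categories
```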
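
The Dataset Splits row describes a video-level partition in which no video contributes examples to more than one split. A minimal sketch of such a split, assuming each QA example is a dict with a `video_id` key (a hypothetical field name) and that the 29K/4K/4K sizes correspond roughly to an 80/10/10 ratio over videos:

```python
import random
from collections import defaultdict


def split_by_video(examples, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Partition QA examples into train/dev/test so that all examples
    from the same video land in the same split (no video overlap)."""
    by_video = defaultdict(list)
    for ex in examples:
        by_video[ex["video_id"]].append(ex)

    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)

    n = len(video_ids)
    n_train = int(ratios[0] * n)
    n_dev = int(ratios[1] * n)
    splits = {
        "train": video_ids[:n_train],
        "dev": video_ids[n_train:n_train + n_dev],
        "test": video_ids[n_train + n_dev:],
    }
    return {name: [ex for vid in vids for ex in by_video[vid]]
            for name, vids in splits.items()}
```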
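
The Experiment Setup row maps onto a standard BERT fine-tuning configuration. A minimal sketch using PyTorch and the Hugging Face transformers library, wiring up the reported hyperparameters; the framework and the bert-base-uncased checkpoint are assumptions, since the paper names neither.

```python
import torch
from transformers import BertModel, BertTokenizer

# Hyperparameters as reported in the paper's setup section.
LEARNING_RATE = 5e-5      # Adam optimizer with a learning rate of 0.00005
NUM_EPOCHS = 3            # model trained for 3 epochs
NUM_SPAN_CANDIDATES = 9   # Segmenter extracts N = 9 span candidates
NEG_PER_POS = 9           # Ranker trained with a 1:9 positive-to-negative ratio

# 12-layer base BERT encoder; the exact checkpoint is an assumption.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

optimizer = torch.optim.Adam(encoder.parameters(), lr=LEARNING_RATE)

# Training-loop skeleton only: `train_loader` and the Segmenter/Ranker
# heads are placeholders, since the authors' code is not released.
# for epoch in range(NUM_EPOCHS):
#     for batch in train_loader:
#         ...
```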