Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos
Authors: Kyungjae Lee, Nan Duan, Lei Ji, Jason Li, Seung-won Hwang
AAAI 2020, pp. 8147-8154
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental result demonstrates that our model achieves state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Kyungjae Lee,1 Nan Duan,2 Lei Ji,2,3 Jason Li,4 Seung-won Hwang1 1Department of Computer Science, Yonsei University, Seoul, South Korea 2Microsoft Research Asia, Beijing, China 3University of Chinese Academy of Science, Beijing, China 4STCA Multimedia Group, Microsoft, Beijing, China |
| Pseudocode | No | The paper provides architectural descriptions and mathematical formulations for its models but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to an open-source detector used by the authors ('Using opensource detector,¹ we extract object categories c from images in clip T_k,' with footnote '¹https://github.com/peteanderson80/bottom-up-attention'), but does not explicitly state that the code for their own proposed methodology ('Segmenter-Ranker') is publicly available (see the frame-sampling sketch after the table for the detector's input step). |
| Open Datasets | No | The paper states, 'For training and evaluating this task, we collect labelled resources of 37K QA pairs and 21K video (total 1,662 hours)' and 'For such purpose, we contribute a labeled dataset of 37K QA pairs on instructional videos for benchmarking,' but does not provide a specific link, DOI, or repository for public access to this dataset. |
| Dataset Splits | Yes | We divide the dataset into 29K/4K/4K as training/dev/test set respectively, where the videos do not overlap in each set (see the video-level split sketch after the table). |
| Hardware Specification | No | The paper discusses computational expense related to certain features (e.g., ResNet-50/101 features having large dimensions) but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its own experiments. |
| Software Dependencies | No | The paper mentions using a 'base version of BERT (Devlin et al. 2018) with 12 layers' and the 'Adam optimizer', but it does not specify concrete version numbers for the software libraries (e.g., TensorFlow, PyTorch) or specific BERT model versions used, which are necessary for reproducible dependency descriptions. |
| Experiment Setup | Yes | We use a base version of BERT (Devlin et al. 2018) with 12 layers as our encoder, following its default setting. We train our model on BERT until 3 epochs, and use the Adam optimizer with a learning rate of 0.00005. In Segmenter, we extract N = 9 span candidates from the output probabilities. In Ranker, the training data has a 1:9 positive-to-negative ratio, then this module ranks the top 9 candidates at inference time. For the CNN layer, the number l_f of layers is 30, and the top n_t = 7 elements in max-pooling are extracted, which are optimized on the dev set. For detecting image objects, we sample frames at 1 fps in videos. (These values are collected into the configuration sketch after the table.) |
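
A video-level split like the one quoted in the Dataset Splits row is easy to get wrong if QA pairs are shuffled directly, since QA pairs from one video can then leak across sets. Below is a minimal sketch that assigns whole videos to partitions first; the `video_id` field, the seed, and the 10%/10% dev/test fractions are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: split QA pairs so that no video appears in more than one
# of train/dev/test, matching the paper's non-overlapping 29K/4K/4K split.
import random

def split_by_video(qa_pairs, seed=13, dev_frac=0.1, test_frac=0.1):
    """Group QA pairs by video, then assign whole videos to splits."""
    video_ids = sorted({qa["video_id"] for qa in qa_pairs})
    rng = random.Random(seed)
    rng.shuffle(video_ids)

    n_test = int(len(video_ids) * test_frac)
    n_dev = int(len(video_ids) * dev_frac)
    test_ids = set(video_ids[:n_test])
    dev_ids = set(video_ids[n_test:n_test + n_dev])

    train, dev, test = [], [], []
    for qa in qa_pairs:
        if qa["video_id"] in test_ids:
            test.append(qa)
        elif qa["video_id"] in dev_ids:
            dev.append(qa)
        else:
            train.append(qa)
    return train, dev, test
```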
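For convenience, the hyperparameters quoted in the Experiment Setup row are collected into one configuration sketch below. The checkpoint name and the dictionary keys are assumptions; only the numeric values come from the paper's description.

```python
# Hedged sketch: the paper's reported training settings in one place.
CONFIG = {
    "encoder": "bert-base-uncased",  # "base version of BERT with 12 layers"; exact checkpoint unspecified
    "epochs": 3,
    "optimizer": "adam",
    "learning_rate": 5e-5,           # 0.00005
    "num_span_candidates": 9,        # Segmenter extracts N = 9 spans
    "pos_neg_ratio": (1, 9),         # Ranker training data, positive:negative
    "rank_top_k": 9,                 # candidates ranked at inference time
    "cnn_num_layers": 30,            # l_f = 30, optimized on the dev set
    "top_k_maxpool": 7,              # n_t = 7 elements kept in max-pooling
    "frame_rate_fps": 1,             # frame sampling rate for object detection
}
```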
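The setup also states that frames are sampled at 1 fps before the bottom-up-attention detector extracts object categories. A minimal sketch of that sampling step is below, assuming OpenCV; the paper does not name its frame-extraction tooling, and `sample_frames` is a hypothetical helper.

```python
# Hedged sketch: yield roughly one frame per second from a video file,
# to be passed to an object detector downstream.
import cv2

def sample_frames(video_path, fps_out=1):
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    step = max(int(round(native_fps / fps_out)), 1)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield frame  # hand this frame to the object detector
        idx += 1
    cap.release()
```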