Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DramaQA: Character-Centered Video Story Understanding with Hierarchical QA

Authors: Seongho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang
Pages: 1166-1174

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Here, we discuss an ablation study to analyze the model's characteristics profoundly. Table 2 shows the quantitative results of the ablation study for our model, and we described our experimental settings and implementation details in the Appendix C. QA Similarity is a simple baseline model designed to choose the highest score on the cosine similarity between the average of question's word embeddings and the average of candidate answer's word embeddings. The overall test accuracy of Our(Full) was 71.14% but the performance of each difficulty level varies.
Researcher Affiliation	Academia	Seongho Choi (1), Kyoung-Woon On (1), Yu-Jung Heo (1), Ahjeong Seo (1), Youwon Jang (1), Minsu Lee (1), Byoung-Tak Zhang (1, 2); (1) Seoul National University, (2) AI Institute (AIIS); EMAIL
Pseudocode	No	The paper describes the model architecture and mathematical formulations (e.g., equations 1-4) but does not include a distinct block labeled "Pseudocode" or "Algorithm" with structured steps.
Open Source Code	Yes	We release our dataset and model publicly for research purposes (footnote 2), and we expect our work to provide a new perspective on video story understanding research. Footnote 2: https://dramaqa.snu.ac.kr
Open Datasets	Yes	Our dataset is built upon the TV drama Another Miss Oh (footnote 1) and it contains 17,983 QA pairs from 23,928 various length video clips... We provide 217,308 annotated images with rich character-centered annotations... We release our dataset and model publicly for research purposes (footnote 2), and we expect our work to provide a new perspective on video story understanding research. Footnote 2: https://dramaqa.snu.ac.kr
Dataset Splits	No	The paper mentions "Table 2 shows the quantitative results of the ablation study for our model, and we described our experimental settings and implementation details in the Appendix C." While a test split is clearly indicated, the main text does not specify details about training and validation splits (e.g., percentages or counts), referring instead to an appendix not provided.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments. It refers to "experimental settings and implementation details in the Appendix C" but Appendix C is not available in the provided text.
Software Dependencies	No	The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, frameworks) used in the experiments. It refers to "experimental settings and implementation details in the Appendix C" but Appendix C is not available in the provided text.
Experiment Setup	No	The paper states, "we described our experimental settings and implementation details in the Appendix C." However, these details, such as hyperparameter values, model initialization, or specific training configurations, are not present in the main text of the paper.
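The QA Similarity baseline quoted in the Research Type response picks the candidate answer whose averaged word embedding has the highest cosine similarity to the question's averaged word embedding. The sketch below illustrates that scoring rule under stated assumptions: the embedding table (a plain dict from token to vector), lowercase whitespace tokenization, and the skipping of out-of-vocabulary tokens are all hypothetical choices, not details taken from the paper, whose implementation specifics are deferred to its Appendix C.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_embedding(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known.

    Assumption: unknown tokens are simply skipped (hypothetical choice).
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def qa_similarity_predict(question: str, candidates: Sequence[str],
                          embeddings: Dict[str, Vector]) -> int:
    """Return the index of the highest-scoring candidate answer.

    Assumption: lowercase whitespace tokenization (hypothetical choice).
    """
    dim = len(next(iter(embeddings.values())))
    q_vec = average_embedding(question.lower().split(), embeddings, dim)
    scores = [
        cosine_similarity(q_vec, average_embedding(c.lower().split(), embeddings, dim))
        for c in candidates
    ]
    # max() returns the first index among ties.
    return max(range(len(candidates)), key=lambda i: scores[i])


# Toy 2-dimensional embeddings, invented purely for illustration.
toy = {
    "where": [0.0, 1.0],
    "haeyoung": [1.0, 0.0],
    "office": [0.6, 0.8],
    "sleeping": [1.0, -1.0],
}
print(qa_similarity_predict("Where is Haeyoung", ["at the office", "she is sleeping"], toy))
```

With these toy vectors the question averages to [0.5, 0.5], which is nearly parallel to "office" and orthogonal to "sleeping", so the sketch selects the first candidate (index 0). Because it ignores the video entirely, a baseline of this kind mainly measures how much of the answer can be recovered from text overlap alone.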