Towards Automatic Learning of Procedures From Web Instructional Videos

Authors: Luowei Zhou, Chenliang Xu, Jason Corso

AAAI 2018

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the LLM's supporting response:

Research Type: Experimental
"We show in our experiments that the proposed model outperforms competitive baselines in procedure segmentation. For evaluation, we compare variants of our model with competitive baselines on standard metrics and the proposed methods demonstrate top performance against baselines."

Researcher Affiliation: Academia
"Luowei Zhou, Robotics Institute, University of Michigan, luozhou@umich.edu; Chenliang Xu, Department of CS, University of Rochester, Chenliang.Xu@rochester.edu; Jason J. Corso, Department of EECS, University of Michigan, jjcorso@eecs.umich.edu"

Pseudocode: No
"The paper describes the model architecture and procedures in detail using text and diagrams, but it does not include a structured pseudocode block or algorithm."

Open Source Code: No
"The paper does not provide an unambiguous statement or a direct link to the source code for the methodology described in this paper. It only links to the dataset and a third-party ResNet implementation."

Open Datasets: Yes
"Our new dataset, called YouCook2, contains 2000 videos from 89 recipes with a total length of 176 hours. Dataset website: http://youcook2.eecs.umich.edu"

Dataset Splits: Yes
"We randomly split the dataset to 67%:23%:10% for training, validation and testing according to each recipe." (See the split sketch below.)

Hardware Specification: No
"The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments."

Software Dependencies: No
"The paper states 'Our implementation is in Torch' but does not provide specific version numbers for Torch or any other software libraries or dependencies used in the experiments."

Experiment Setup: Yes
"The sizes of the temporal conv. kernels (also the anchor lengths) run from 3 to 123 with an interval of 8, which covers 95% of the segment durations in the training set. This gives 16 explicit anchors centered at each frame, i.e., the stride of the temporal conv. is 1. We randomly select U = 100 samples from all the positive and negative samples respectively, and feed in negative samples if positive ones number fewer than U. Our implementation is in Torch. All the LSTMs have one layer and 512 hidden units. For hyperparameters, the learning rate is 4×10⁻⁵. We use the Adam optimizer (Kingma and Ba 2014) for updating weights, with α = 0.8 and β = 0.999. Note that we disable CNN fine-tuning, which heavily slows down the training process." (See the setup sketch below.)
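
The split procedure is described only in prose, and the splitting code is not released. Below is a minimal Python sketch of one way a 67%:23%:10% split performed within each recipe could be reproduced; the (video_id, recipe_id) input format, the rounding scheme, and the fixed seed are assumptions rather than details from the paper.

```python
import random
from collections import defaultdict

def split_by_recipe(videos, seed=0):
    """Shuffle and split (video_id, recipe_id) pairs into train/val/test
    at 67%:23%:10% within each recipe, so every recipe is represented
    in all three partitions."""
    rng = random.Random(seed)
    by_recipe = defaultdict(list)
    for vid, recipe in videos:
        by_recipe[recipe].append(vid)

    train, val, test = [], [], []
    for vids in by_recipe.values():
        rng.shuffle(vids)
        n_train = round(0.67 * len(vids))
        n_val = round(0.23 * len(vids))
        train += vids[:n_train]
        val += vids[n_train:n_train + n_val]
        test += vids[n_train + n_val:]
    return train, val, test

# Example with YouCook2-like proportions: 2000 videos over 89 recipes.
videos = [(f"vid_{i:04d}", i % 89) for i in range(2000)]
train, val, test = split_by_recipe(videos)
print(len(train), len(val), len(test))  # ~67% / ~23% / ~10% of 2000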
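
The Experiment Setup quote maps directly onto code, but the authors' (Lua) Torch implementation is not public. The PyTorch sketch below illustrates the reported settings under stated assumptions: 16 temporal conv. kernel sizes from 3 to 123 at stride 1, sampling of U = 100 positives and negatives with negative top-up, one-layer 512-unit LSTMs, and Adam with learning rate 4×10⁻⁵. Module names, the feature dimension, and the reading of α = 0.8 / β = 0.999 as Adam's decay rates are assumptions.

```python
import random
import torch
import torch.nn as nn

# 16 anchor lengths: temporal conv. kernel sizes 3, 11, ..., 123 (step 8).
ANCHOR_SIZES = list(range(3, 124, 8))
assert len(ANCHOR_SIZES) == 16

class TemporalAnchors(nn.Module):
    """One 1-D conv per anchor length at stride 1 with 'same' padding,
    so every frame position scores 16 candidate segments centered on it.
    (Hypothetical module name; feat_dim=512 is an assumption.)"""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(feat_dim, 1, kernel_size=k, stride=1, padding=k // 2)
            for k in ANCHOR_SIZES  # all sizes are odd, so length is preserved
        )

    def forward(self, x):  # x: (batch, feat_dim, num_frames)
        # -> (batch, 16, num_frames): one score per anchor length per frame
        return torch.cat([conv(x) for conv in self.convs], dim=1)

def sample_anchors(pos, neg, U=100, rng=random):
    """Draw U positive and U negative anchor samples; if fewer than U
    positives exist, top up the batch with extra negatives."""
    pos_s = rng.sample(pos, min(U, len(pos)))
    neg_s = rng.sample(neg, min(2 * U - len(pos_s), len(neg)))
    return pos_s, neg_s

model = TemporalAnchors()
# One-layer, 512-unit LSTMs elsewhere in the model (input size assumed):
lstm = nn.LSTM(input_size=512, hidden_size=512, num_layers=1)
# Adam with learning rate 4e-5; reading the paper's "α = 0.8, β = 0.999"
# as the decay rates (β1, β2) is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=4e-5, betas=(0.8, 0.999))
```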