Towards Automatic Learning of Procedures From Web Instructional Videos
Authors: Luowei Zhou, Chenliang Xu, Jason Corso
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in our experiments that the proposed model outperforms competitive baselines in procedure segmentation. For evaluation, we compare variants of our model with these baselines on standard metrics, and the proposed methods demonstrate top performance. |
| Researcher Affiliation | Academia | Luowei Zhou, Robotics Institute, University of Michigan (luozhou@umich.edu); Chenliang Xu, Department of CS, University of Rochester (Chenliang.Xu@rochester.edu); Jason J. Corso, Department of EECS, University of Michigan (jjcorso@eecs.umich.edu) |
| Pseudocode | No | The paper describes the model architecture and procedures in detail using text and diagrams, but it does not include a structured pseudocode block or algorithm. |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link to the source code for the methodology described in this paper. It only links to the dataset and a third-party ResNet implementation. |
| Open Datasets | Yes | Our new dataset, called YouCook2, contains 2000 videos from 89 recipes with a total length of 176 hours. Dataset website: http://youcook2.eecs.umich.edu |
| Dataset Splits | Yes | We randomly split the dataset into 67%:23%:10% for training, validation and testing according to each recipe. (A sketch of this per-recipe split appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper states 'Our implementation is in Torch' but does not provide specific version numbers for Torch or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | The sizes of the temporal conv. kernels (also the anchor lengths) range from 3 to 123 with an interval of 8, which covers 95% of the segment durations in the training set. This yields 16 explicit anchors centered at each frame, i.e., the stride for the temporal conv. is 1. We randomly select U = 100 samples from all the positive and negative samples respectively, and feed in negative samples if positive ones are fewer than U. Our implementation is in Torch. All the LSTMs have one layer and 512 hidden units. For hyperparameters, the learning rate is 4×10⁻⁵. We use the Adam optimizer (Kingma and Ba 2014) for updating weights with α = 0.8 and β = 0.999. Note that we disable CNN fine-tuning, which would heavily slow down the training process. (A minimal training-configuration sketch appears below the table.) |
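Since the paper describes the split only in prose, here is a minimal sketch of how a per-recipe 67%:23%:10% split could be implemented. This is an illustration, not the authors' code; `split_by_recipe` and its arguments are hypothetical names.

```python
# Hypothetical sketch of the per-recipe 67%/23%/10% split described in the
# paper; function and variable names are assumptions, not the authors' code.
import random
from collections import defaultdict

def split_by_recipe(videos, seed=0):
    """videos: list of (video_id, recipe_id) pairs.
    Returns train/val/test video-id lists, splitting within each recipe."""
    random.seed(seed)
    by_recipe = defaultdict(list)
    for vid, recipe in videos:
        by_recipe[recipe].append(vid)

    train, val, test = [], [], []
    for vids in by_recipe.values():
        random.shuffle(vids)
        n = len(vids)
        n_train = round(0.67 * n)
        n_val = round(0.23 * n)
        train += vids[:n_train]
        val += vids[n_train:n_train + n_val]
        test += vids[n_train + n_val:]
    return train, val, test
```

Splitting within each recipe (rather than over the whole video pool) keeps every recipe represented in all three partitions.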
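Likewise, below is a minimal sketch of the reported experiment setup: anchor kernel sizes from 3 to 123 at an interval of 8 (16 anchors, temporal stride 1), single-layer LSTMs with 512 hidden units, and Adam at a 4×10⁻⁵ learning rate with coefficients 0.8 and 0.999. The paper's implementation is in (Lua) Torch; this PyTorch rendering and all module and variable names are assumptions, not the authors' code.

```python
# Minimal PyTorch sketch of the reported setup; the original implementation
# is in (Lua) Torch, so every name below is an assumption for illustration.
import torch
import torch.nn as nn

# 16 anchor lengths: temporal conv. kernel sizes 3, 11, ..., 123 (interval 8)
anchor_lengths = list(range(3, 124, 8))
assert len(anchor_lengths) == 16

feat_dim = 512  # per-frame feature dimension (assumed)

# One temporal conv. per anchor length, scoring a proposal at every frame
# (stride 1); odd kernel sizes keep the anchors centered on each frame.
anchor_convs = nn.ModuleList([
    nn.Conv1d(feat_dim, 1, kernel_size=k, stride=1, padding=k // 2)
    for k in anchor_lengths
])

# Single-layer LSTM with 512 hidden units, as reported
lstm = nn.LSTM(input_size=feat_dim, hidden_size=512, num_layers=1)

params = list(anchor_convs.parameters()) + list(lstm.parameters())
# Learning rate 4e-5; the paper reports Adam coefficients 0.8 and 0.999
optimizer = torch.optim.Adam(params, lr=4e-5, betas=(0.8, 0.999))
```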