AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Authors: Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method, referred to as AssembleNet, outperforms prior approaches on public video datasets, in some cases by a great margin. We obtain 58.6% mAP on Charades and 34.27% accuracy on Moments-in-Time.
Researcher Affiliation | Collaboration | Michael S. Ryoo (1,2), AJ Piergiovanni (1,2,3), Mingxing Tan (2) & Anelia Angelova (1,2); 1: Robotics at Google, 2: Google Research, 3: Indiana University Bloomington
Pseudocode | No | The paper describes the algorithms and processes in text and figures, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not contain an explicit statement about releasing its source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Charades Dataset. We first test on the popular Charades dataset (Sigurdsson et al., 2016)... Moments in Time (MiT) Dataset. The Moments in Time (MiT) dataset (Monfort et al., 2018)... pre-training on Kinetics (Carreira & Zisserman, 2017).
Dataset Splits | Yes | Evaluation of each architecture (i.e., measuring the fitness) is done by training the model for 10K iterations and then measuring its top-1 + top-5 accuracy on the validation subset.
Hardware Specification | Yes | For the Moments in Time (MiT) dataset training, 8 videos are provided per TPU core (with 16GB memory): the total batch size (for each gradient update) is 512 with 32 frames per video.
Software Dependencies | No | The paper mentions 'TensorFlow' and specific algorithms but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For the Moments in Time (MiT) dataset training, 8 videos are provided per TPU core (with 16GB memory): the total batch size (for each gradient update) is 512 with 32 frames per video. The batch size used for Charades is 128 with 128 frames per video. The base framerate we used is 12.5 fps for MiT and 6 fps for Charades. The spatial input resolution is 224x224 during training. We used the standard Momentum Optimizer in TensorFlow. We used a learning rate of 3.2 (for MiT) and 25.6 (for Charades), 12k warmup iterations, and cosine decay. No dropout is used, weight decay is set to 1e-4 and label smoothing set to 0.2.
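For readers attempting to reproduce the reported training setup, the following is a minimal sketch of the described schedule (momentum optimizer, 12k-iteration linear warmup, cosine decay, label smoothing of 0.2) in TensorFlow. The momentum value, the total step count, and all identifier names are assumptions rather than details from the paper, and the 1e-4 weight decay would additionally be applied as L2 regularization on the model weights.

```python
import tensorflow as tf

class WarmupCosineDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warmup followed by cosine decay, per the paper's description."""

    def __init__(self, base_lr, warmup_steps, total_steps):
        super().__init__()
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        # CosineDecay handles the post-warmup portion of the schedule.
        self.cosine = tf.keras.optimizers.schedules.CosineDecay(
            base_lr, total_steps - warmup_steps)

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup = tf.cast(self.warmup_steps, tf.float32)
        return tf.cond(
            step < warmup,
            lambda: self.base_lr * (step / warmup),  # linear warmup
            lambda: self.cosine(step - warmup))      # cosine decay

# Paper-reported values: base lr 3.2 (MiT) or 25.6 (Charades), 12k warmup
# iterations. total_steps is an assumed placeholder, not from the paper.
schedule = WarmupCosineDecay(base_lr=3.2, warmup_steps=12_000,
                             total_steps=100_000)

# "Standard Momentum Optimizer in TensorFlow"; momentum=0.9 is assumed.
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)

# Label smoothing of 0.2 as reported. Weight decay of 1e-4 would be applied
# as L2 regularization on the model weights (not shown here).
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.2)
```

The unusually large base learning rates (3.2 and 25.6) are consistent with the large effective batch sizes reported (512 for MiT, 128 for Charades across 64 TPU cores, i.e., 8 videos per core times 64 cores equals 512).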