AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Authors: Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method, referred to as AssembleNet, outperforms prior approaches on public video datasets, in some cases by a great margin. We obtain 58.6% mAP on Charades and 34.27% accuracy on Moments-in-Time.
Researcher Affiliation | Collaboration | Michael S. Ryoo (1,2), AJ Piergiovanni (1,2,3), Mingxing Tan (2) & Anelia Angelova (1,2); 1: Robotics at Google, 2: Google Research, 3: Indiana University Bloomington
Pseudocode | No | The paper describes the algorithms and processes in text and figures, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not contain an explicit statement about releasing its source code or a link to a code repository for the described methodology.
Open Datasets | Yes | Charades Dataset. We first test on the popular Charades dataset (Sigurdsson et al., 2016)... Moments in Time (MiT) Dataset. The Moments in Time (MiT) dataset (Monfort et al., 2018)... pre-training on Kinetics (Carreira & Zisserman, 2017).
Dataset Splits | Yes | Evaluation of each architecture (i.e., measuring the fitness) is done by training the model for 10K iterations and then measuring its top-1 + top-5 accuracy on the validation subset.
Hardware Specification | Yes | For the Moments in Time (MiT) dataset training, 8 videos are provided per TPU core (with 16GB memory): the total batch size (for each gradient update) is 512 with 32 frames per video.
Software Dependencies | No | The paper mentions 'TensorFlow' and specific algorithms but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For the Moments in Time (MiT) dataset training, 8 videos are provided per TPU core (with 16GB memory): the total batch size (for each gradient update) is 512 with 32 frames per video. The batch size used for Charades is 128 with 128 frames per video. The base framerate we used is 12.5 fps for MiT and 6 fps for Charades. The spatial input resolution is 224x224 during training. We used the standard Momentum Optimizer in TensorFlow. We used a learning rate of 3.2 (for MiT) and 25.6 (for Charades), 12k warmup iterations, and cosine decay. No dropout is used, weight decay is set to 1e-4 and label smoothing set to 0.2.
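For readers attempting to reproduce the reported training setup, the following is a minimal sketch of the described schedule (momentum optimizer, 12k-iteration linear warmup, cosine decay, label smoothing of 0.2) in TensorFlow. The momentum value, the total step count, and all identifier names are assumptions rather than details from the paper, and the 1e-4 weight decay would additionally be applied as L2 regularization on the model weights.

```python
import tensorflow as tf

class WarmupCosineDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warmup followed by cosine decay, per the paper's description."""

    def __init__(self, base_lr, warmup_steps, total_steps):
        super().__init__()
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        # CosineDecay handles the post-warmup portion of the schedule.
        self.cosine = tf.keras.optimizers.schedules.CosineDecay(
            base_lr, total_steps - warmup_steps)

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup = tf.cast(self.warmup_steps, tf.float32)
        return tf.cond(
            step < warmup,
            lambda: self.base_lr * (step / warmup),  # linear warmup
            lambda: self.cosine(step - warmup))      # cosine decay

# Paper-reported values: base lr 3.2 (MiT) or 25.6 (Charades), 12k warmup
# iterations. total_steps is an assumed placeholder, not from the paper.
schedule = WarmupCosineDecay(base_lr=3.2, warmup_steps=12_000,
                             total_steps=100_000)

# "Standard Momentum Optimizer in TensorFlow"; momentum=0.9 is assumed.
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)

# Label smoothing of 0.2 as reported. Weight decay of 1e-4 would be applied
# as L2 regularization on the model weights (not shown here).
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.2)
```

The unusually large base learning rates (3.2 and 25.6) are consistent with the large effective batch sizes reported (512 for MiT, 128 for Charades across 64 TPU cores, i.e., 8 videos per core times 64 cores equals 512).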