AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
Authors: Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method, referred to as AssembleNet, outperforms prior approaches on public video datasets, in some cases by a great margin. We obtain 58.6% mAP on Charades and 34.27% accuracy on Moments-in-Time. |
| Researcher Affiliation | Collaboration | Michael S. Ryoo (1,2), AJ Piergiovanni (1,2,3), Mingxing Tan (2) & Anelia Angelova (1,2); affiliations: (1) Robotics at Google, (2) Google Research, (3) Indiana University Bloomington |
| Pseudocode | No | The paper describes the algorithms and processes in text and figures, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Charades Dataset. We first test on the popular Charades dataset (Sigurdsson et al., 2016)... Moments in Time (MiT) Dataset. The Moments in Time (MiT) dataset (Monfort et al., 2018)... pre-training on Kinetics (Carreira & Zisserman, 2017). |
| Dataset Splits | Yes | Evaluation of each architecture (i.e., measuring the fitness) is done by training the model for 10K iterations and then measuring its top-1 + top-5 accuracy on the validation subset. |
| Hardware Specification | Yes | For the Moments in Time (MiT) dataset training, 8 videos are provided per TPU core (with 16GB memory): the total batch size (for each gradient update) is 512 with 32 frames per video. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and specific algorithms but does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For the Moments in Time (MiT) dataset training, 8 videos are provided per TPU core (with 16GB memory): the total batch size (for each gradient update) is 512 with 32 frames per video. The batch size used for Charades is 128 with 128 frames per video. The base framerate we used is 12.5 fps for MiT and 6 fps for Charades. The spatial input resolution is 224x224 during training. We used the standard Momentum Optimizer in TensorFlow. We used a learning rate of 3.2 (for MiT) and 25.6 (for Charades), 12k warmup iterations, and cosine decay. No dropout is used, weight decay is set to 1e-4 and label smoothing set to 0.2. |
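
The "Dataset Splits" row above quotes the paper's architecture-search fitness: each candidate is trained for 10K iterations and scored by its top-1 + top-5 accuracy on the validation subset. Below is a minimal NumPy sketch of that fitness measure; the function names and the logits/labels interface are illustrative, not taken from the paper.

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of examples whose true label is among the k largest logits."""
    top_k = np.argsort(logits, axis=1)[:, -k:]  # indices of the k highest scores
    return float(np.mean([y in row for y, row in zip(labels, top_k)]))

def fitness(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fitness as described in the paper: top-1 + top-5 validation accuracy."""
    return top_k_accuracy(logits, labels, 1) + top_k_accuracy(logits, labels, 5)

# Toy usage with random scores over MiT's 339 action classes.
logits = np.random.randn(16, 339)
labels = np.random.randint(0, 339, size=16)
print(fitness(logits, labels))  # in [0, 2]; higher is fitter
```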
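
The "Experiment Setup" row pins down the optimization recipe: the standard Momentum optimizer, base learning rates of 3.2 (MiT) and 25.6 (Charades), 12k warmup iterations, cosine decay, weight decay of 1e-4, and label smoothing of 0.2. The sketch below reconstructs the learning-rate schedule in plain Python; the linear warmup shape and the total step count are assumptions, since the paper states neither.

```python
import math

def learning_rate(step: int,
                  base_lr: float = 3.2,        # 3.2 for MiT, 25.6 for Charades
                  warmup_steps: int = 12_000,  # "12k warmup iterations"
                  total_steps: int = 100_000   # assumption: not stated in the paper
                  ) -> float:
    """Warmup followed by cosine decay, per the Experiment Setup row."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr (warmup shape is an assumption).
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Weight decay (1e-4) and label smoothing (0.2) would live in the optimizer and loss respectively, e.g. via `tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.2)` in TensorFlow.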