Eidetic 3D LSTM: A Model for Video Prediction and Beyond

Authors: Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first evaluate the E3D-LSTM network on widely used future video prediction datasets and achieve state-of-the-art performance. Then we show that the E3D-LSTM network also performs well on early activity recognition, inferring what is happening or what will happen after observing only a limited number of video frames. We present ablation studies to verify the effectiveness of all modules in the proposed E3D-LSTM model.
Researcher Affiliation | Collaboration | Yunbo Wang (1), Lu Jiang (2), Ming-Hsuan Yang (2, 3), Li-Jia Li (4), Mingsheng Long (1), Li Fei-Fei (4); 1: Tsinghua University, 2: Google AI, 3: University of California, Merced, 4: Stanford University
Pseudocode | No | The paper presents equations and diagrams illustrating the model architecture and memory transitions, but it does not include any structured pseudocode or algorithm blocks.
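Although the paper conveys its memory transition only through equations, the core eidetic recall step can be summarized in a few lines. Below is a minimal NumPy sketch, assuming the attention form softmax(R_t · Cᵀ) · C over the τ most recent cell states described in the paper, with spatiotemporal tensors flattened to (positions, channels); the function name and shapes are our illustration, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def eidetic_recall(recall_gate, past_cells):
    """Attend over the stacked cell states of the previous tau steps:
    RECALL(R_t, C) = softmax(R_t @ C.T) @ C, with all tensors flattened
    to (positions, channels)."""
    attn = softmax(recall_gate @ past_cells.T, axis=-1)  # (n, tau * n)
    return attn @ past_cells                             # (n, d)

# toy shapes: n flattened spatiotemporal positions, d channels, tau remembered steps
n, d, tau = 8, 64, 4
R_t = np.random.randn(n, d)        # query built from the current input and hidden state
C_past = np.random.randn(tau * n, d)
print(eidetic_recall(R_t, C_past).shape)  # (8, 64)
```

The full E3D-LSTM cell additionally wraps this recall in standard input/forget-style gating and layer normalization, which the sketch omits.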
Open Source Code | Yes | The source code and trained models will be made available to the public.
Open Datasets | Yes | The moving MNIST dataset is constructed by randomly sampling two digits from the original MNIST dataset... The KTH action dataset (Schuldt et al., 2004) contains 25 individuals... The TaxiBJ dataset is collected from a chaotic real-world environment using the GPS monitors of taxicabs in Beijing... The something-something dataset (Goyal et al., 2017) is a recent benchmark for activity/action recognition (https://20bn.com/datasets/something-something).
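Since the Moving MNIST construction is quoted only in prose, here is an illustrative Python reconstruction of the sequence generator. The canvas size, velocity range, and pixel-wise max for overlapping digits are our assumptions, and loading the 28x28 MNIST digit crops is left out.

```python
import numpy as np

def make_moving_mnist_sequence(digits, seq_len=20, canvas=64, rng=None):
    """Bounce each 28x28 digit around a canvas-sized frame, reflecting its
    velocity at the borders; returns an array of shape (seq_len, canvas, canvas)."""
    rng = rng or np.random.default_rng()
    h = digits[0].shape[0]
    pos = rng.uniform(0, canvas - h, size=(len(digits), 2))  # top-left corners
    vel = rng.uniform(-3, 3, size=(len(digits), 2))          # pixels per frame
    frames = np.zeros((seq_len, canvas, canvas), dtype=np.float32)
    for t in range(seq_len):
        for k, digit in enumerate(digits):
            for ax in range(2):  # reflect off the canvas borders
                if not 0 <= pos[k, ax] + vel[k, ax] <= canvas - h:
                    vel[k, ax] = -vel[k, ax]
            pos[k] += vel[k]
            r, c = pos[k].astype(int)
            # overlapping digits are merged with a pixel-wise max
            frames[t, r:r + h, c:c + h] = np.maximum(frames[t, r:r + h, c:c + h], digit)
    return frames

# usage sketch: `train_digits` would be real 28x28 MNIST crops loaded elsewhere
rng = np.random.default_rng(0)
train_digits = [rng.random((28, 28)), rng.random((28, 28))]  # stand-ins for real digits
print(make_moving_mnist_sequence(train_digits, rng=rng).shape)  # (20, 64, 64)
```

Note that the quoted splits are fixed, so a generator like this would typically be run with stored seeds or pregenerated files rather than fresh randomness at evaluation time.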
Dataset Splits | Yes | The whole dataset has a fixed number of entries: 10,000 sequences for training, 3,000 for validation, and 5,000 for testing (Moving MNIST). The something-something dataset... contains 56,769 short videos for the training set and 7,503 videos for the validation set, covering 41 action categories.
Hardware Specification | No | The paper states that experiments were conducted using TensorFlow and trained with the ADAM optimizer, but it does not provide any specific hardware details such as GPU or CPU models, memory, or cluster specifications.
Software Dependencies | No | All experiments are conducted using TensorFlow (Abadi et al., 2016) and trained with the ADAM optimizer (Kingma & Ba, 2015). No specific version numbers for TensorFlow or other software dependencies are provided.
Experiment Setup | Yes | All experiments are conducted using TensorFlow (Abadi et al., 2016) and trained with the ADAM optimizer (Kingma & Ba, 2015) to minimize the ℓ1 + ℓ2 loss over every pixel in the frame... We stack 4 E3D-LSTMs... The number of hidden-state channels of each E3D-LSTM is 64. The temporal stride is set to 1... We use the architecture illustrated in Figure 1(c) as our model, which consists of 2 layers of 3D-CNN encoders, 4 layers of E3D-LSTMs, and 2 layers of 3D-CNN decoders... We set λ(i) in Equation 5 to 10 in the beginning (i = 0) and decrease it at a rate of 2e-5 per iteration, lower bounded by η = 0.1.
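To make the quoted optimization details concrete, here is a small sketch of the per-pixel ℓ1 + ℓ2 objective and the λ(i) schedule; the function names are ours, and λ(i) is the coefficient the paper defines in its Equation 5.

```python
import numpy as np

def l1_l2_loss(pred, target):
    """Per-pixel L1 + L2 objective quoted in the setup."""
    diff = pred - target
    return np.abs(diff).mean() + (diff ** 2).mean()

def lam(i, lam0=10.0, decay=2e-5, eta=0.1):
    """lambda(i): starts at 10 for i = 0, decays by 2e-5 per iteration, floored at eta."""
    return max(lam0 - decay * i, eta)

pred = np.random.rand(4, 64, 64)
target = np.random.rand(4, 64, 64)
print(l1_l2_loss(pred, target))  # scalar loss value
print(lam(0), lam(495_000))      # 10.0 0.1 (the floor is reached at i = 495,000)
```

A linear decay with a floor keeps the λ-weighted term dominant early in training without letting it vanish entirely later on.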