Long-Term Image Boundary Prediction

Authors: Apratim Bhattacharyya, Mateusz Malinowski, Bernt Schiele, Mario Fritz

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our CMSC model on natural video sequences involving agent-based motion and billiard sequences with only physics-based motion. We compare with various baselines and perform ablation studies to confirm design choices.
Researcher Affiliation | Academia | Apratim Bhattacharyya, Mateusz Malinowski, Bernt Schiele, Mario Fritz; Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany; {abhattac, mmalinow, schiele, mfritz}@mpi-inf.mpg.de
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. The model architecture and components are described in text and through diagrams.
Open Source Code | No | No explicit statement about the release or availability of source code for the described methodology was found.
Open Datasets | Yes | We use the VSB100 dataset which contains 101 videos with a maximum of 121 frames each. The training set consists of 40 videos and the test set consists of 60 videos. ... Similarly we randomly select 1000, 500 (training) and 1000 (test) videos from UCF101.
Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention a separate validation set or its split details.
Hardware Specification | Yes | On the Nvidia Titan X GPU, our CMSC model takes approximately 16 hours to train on the VSB100 and real billiards datasets and 10 hours on the synthetic billiards (1 ball) dataset.
Software Dependencies | No | The paper mentions the use of "pygame" for synthetic data generation and the "ADAM optimizer", but does not provide specific version numbers for these or any other software dependencies, libraries, or frameworks used.
Experiment Setup | Yes | We use L2 loss (mean square error) during training, which we optimize using the ADAM optimizer. ... We convert each video into 32×32 pixel patches. The CMSC model observes a central patch and eight neighbouring patches, resulting in a context of size 96×96 pixels. ... We use four levels, with scales increasing by a factor of two. ... Each level of the model consists of five sets of two convolutional layers. There are 32, 64, 128, 64 and 32 filters respectively in each set, of a constant size 3×3. ... We introduce a moderate 2×2 pooling layer after the first two sets of convolutional layers... We use ReLU non-linearities between every layer except the last. We use the tanh nonlinearity at the end to ensure output in the range [0,1]. ... To deal with deceleration, we experiment with increasing the number of input frames. We train our CMSC model with six input frames and pre-train on our synthetic one ball training set.
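The quoted setup states that the CMSC model observes a central 32×32 patch together with its eight neighbouring patches, giving a 96×96 context. A minimal numpy sketch of that patch-context extraction is below; the function name `extract_context` and the (row, col) patch-unit indexing convention are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def extract_context(frame, row, col, patch=32):
    """Return the context window around the central patch at
    patch-unit coordinates (row, col): the 32x32 central patch
    plus its eight neighbours, i.e. a 96x96 pixel region.
    `frame` is an H x W array whose sides are multiples of `patch`,
    and (row, col) must index an interior patch so all eight
    neighbours exist."""
    r0 = (row - 1) * patch          # top edge of the neighbourhood
    c0 = (col - 1) * patch          # left edge of the neighbourhood
    return frame[r0:r0 + 3 * patch, c0:c0 + 3 * patch]

# Usage: a 128x128 frame holds a 4x4 grid of 32x32 patches; the
# context around interior patch (1, 1) covers 96x96 pixels.
frame = np.arange(128 * 128, dtype=np.float32).reshape(128, 128)
context = extract_context(frame, 1, 1)
assert context.shape == (96, 96)
```

Boundary patches would need padding or clipping before this slicing applies; the paper excerpt does not specify how edges are handled.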