Long-Term Image Boundary Prediction
Authors: Apratim Bhattacharyya, Mateusz Malinowski, Bernt Schiele, Mario Fritz
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our CMSC model on natural video sequences involving agent-based motion and billiard sequences with only physics-based motion. We compare with various baselines and perform ablation studies to confirm design choices. |
| Researcher Affiliation | Academia | Apratim Bhattacharyya, Mateusz Malinowski, Bernt Schiele, Mario Fritz, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany {abhattac, mmalinow, schiele, mfritz}@mpi-inf.mpg.de |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. The model architecture and components are described in text and through diagrams. |
| Open Source Code | No | No explicit statement about the release or availability of source code for the described methodology was found. |
| Open Datasets | Yes | We use the VSB100 dataset, which contains 101 videos with a maximum of 121 frames each. The training set consists of 40 videos and the test set consists of 60 videos. ... Similarly, we randomly select 1000, 500 (training) and 1000 (test) videos from UCF101. |
| Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention a separate validation set or its split details. |
| Hardware Specification | Yes | On the Nvidia Titan X GPU, our CMSC model takes approximately 16 hours to train on the VSB100 and real billiards datasets and 10 hours on synthetic billiards (1 ball) dataset. |
| Software Dependencies | No | The paper mentions the use of "pygame" for synthetic data generation and the "ADAM optimizer", but does not provide specific version numbers for these or any other software dependencies, libraries, or frameworks used. |
| Experiment Setup | Yes | We use L2 loss (mean square error) during training, which we optimize using the ADAM optimizer. ... We convert each video into 32×32 pixel patches. The CMSC model observes a central patch and eight neighbouring patches, resulting in a context of size 96×96 pixels. ... We use four levels, with scales increasing by a factor of two. ... Each level of the model consists of five sets of two convolutional layers. There are 32, 64, 128, 64 and 32 filters respectively in each set, of a constant size 3×3. ... We introduce a moderate 2×2 pooling layer after the first two sets of convolutional layers... We use ReLU non-linearities between every layer except the last. We use the tanh nonlinearity at the end to ensure output in the range [0,1]. ... To deal with deceleration, we experiment with increasing the number of input frames. We train our CMSC model with six input frames and pre-train on our synthetic one-ball training set. (A hedged code sketch of this setup follows the table.) |
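The extracted setup above specifies enough detail to sketch one level of the convolutional architecture and its training configuration. The following PyTorch sketch is an assumption-laden illustration, not the authors' released implementation: the class name `CMSCLevel`, the input channel layout (six stacked boundary frames), the final 1×1 projection head, and the rescaled tanh are all hypothetical choices consistent with the quoted description; the four-level multi-scale composition, the synthetic-data pre-training, and any upsampling back to the 32×32 patch size are omitted.

```python
import torch
import torch.nn as nn

class CMSCLevel(nn.Module):
    """One level of the described patch-prediction network (hedged sketch).

    Paper-specified pieces: five sets of two 3x3 convolutional layers with
    32, 64, 128, 64, 32 filters per set, 2x2 pooling after the first two
    sets, ReLU between every layer except the last, and tanh at the output.
    Everything else here is an assumption for illustration.
    """

    def __init__(self, in_frames: int = 6):
        super().__init__()
        widths = [32, 64, 128, 64, 32]
        layers = []
        in_ch = in_frames  # six stacked input frames (assumed channel layout)
        for i, w in enumerate(widths):
            for _ in range(2):  # two convolutional layers per set
                layers.append(nn.Conv2d(in_ch, w, kernel_size=3, padding=1))
                layers.append(nn.ReLU(inplace=True))
                in_ch = w
            if i < 2:  # moderate 2x2 pooling after the first two sets
                layers.append(nn.MaxPool2d(2))
        layers.pop()  # drop the trailing ReLU ("between every layer except the last")
        self.body = nn.Sequential(*layers)
        # Assumed 1x1 projection to a single predicted boundary map.
        self.head = nn.Conv2d(in_ch, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The paper states the tanh output lies in [0,1]; a shifted/rescaled
        # tanh realizes that range, though the exact rescaling is an assumption.
        return 0.5 * (torch.tanh(self.head(self.body(x))) + 1.0)


# Usage sketch with the training configuration quoted above.
model = CMSCLevel(in_frames=6)
optimizer = torch.optim.Adam(model.parameters())  # paper: ADAM optimizer
criterion = nn.MSELoss()                          # paper: L2 loss

x = torch.rand(8, 6, 96, 96)        # 96x96 context: central patch + 8 neighbours
target = torch.rand(8, 1, 24, 24)   # spatial size after two 2x2 poolings
loss = criterion(model(x), target)
loss.backward()
optimizer.step()
```

Note the two 2×2 poolings reduce the 96×96 context to a 24×24 map; how the released model maps this back to the 32×32 central patch is not specified in the quoted text, so the sketch leaves the output at the pooled resolution.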