A polar prediction model for learning to represent visual transformations
Authors: Pierre-Étienne Fiquet, Eero Simoncelli
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When trained on natural video datasets, our framework achieves better prediction performance than traditional motion compensation and rivals conventional deep networks, while maintaining interpretability and speed. Prediction results on the DAVIS dataset [16] are summarized in Table 1. |
| Researcher Affiliation | Academia | 1 Center for Neural Science, New York University 2 Center for Computational Neuroscience, Flatiron Institute {pef246, eero.simoncelli}@nyu.edu |
| Pseudocode | No | The paper contains no sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor are any procedures presented as code-formatted steps. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To train, test and compare these models, we use the DAVIS dataset [16], which was originally designed as a benchmark for video object segmentation. We also consider a smaller video dataset consisting of footage of animals in the wild [48] which contains a variety of motions... |
| Dataset Splits | No | The set is subdivided into 60 training videos (4741 frames) and 30 test videos (2591 frames). While the paper mentions that the 'test loss plateaus', it does not describe a separate validation split (no percentages, sample counts, or predefined validation partition). |
| Hardware Specification | Yes | Training and inference time are computed on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer [49]' but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or the programming language used for implementation. |
| Experiment Setup | Yes | We train each model for 200 epochs on DAVIS using the Adam optimizer [49] with default parameters and a learning rate of 3 × 10⁻⁴. We train on brief temporal segments containing 11 frames (which allows for prediction of 9 frames), and process these in batches of size 4. |
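
For concreteness, the following is a minimal sketch of the training configuration quoted in the Experiment Setup row. The paper does not name its software framework (see the Software Dependencies row), so PyTorch is an assumption; the one-layer `NextFramePredictor`, the squared-error loss, and the synthetic data tensor are hypothetical placeholders standing in for the paper's polar prediction architecture and a real DAVIS loader, which are not reproduced here.

```python
import torch
from torch import nn

class NextFramePredictor(nn.Module):
    # Placeholder model: predicts frame t+1 from frames t-1 and t with a
    # single conv layer. The paper's polar prediction architecture is NOT
    # reproduced here; only the training settings follow the paper.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=3, padding=1)

    def forward(self, segment):  # segment: (batch, frames, height, width)
        preds = []
        for t in range(1, segment.shape[1] - 1):
            pair = segment[:, t - 1 : t + 1]   # the two most recent frames
            preds.append(self.conv(pair))      # predict frame t + 1
        return torch.cat(preds, dim=1)         # (batch, frames - 2, H, W)

model = NextFramePredictor()
# Adam with default parameters and learning rate 3e-4, as stated in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Synthetic stand-in for DAVIS: one batch of 4 segments of 11 grayscale
# frames (the paper's segment length and batch size); resolution is arbitrary.
segments = torch.randn(4, 11, 64, 64)

for epoch in range(200):  # 200 epochs, as stated in the paper
    preds = model(segments)                        # 9 predicted frames
    targets = segments[:, 2:]                      # frames 2 through 10
    loss = nn.functional.mse_loss(preds, targets)  # squared error assumed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

With 11-frame segments, the first two frames serve as context and the remaining 9 are predicted, matching the paper's statement; in an actual run, segments would be sampled from the 60 DAVIS training videos rather than from random noise.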