A polar prediction model for learning to represent visual transformations

Authors: Pierre-Étienne Fiquet, Eero Simoncelli

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "When trained on natural video datasets, our framework achieves better prediction performance than traditional motion compensation and rivals conventional deep networks, while maintaining interpretability and speed. Prediction results on the DAVIS dataset [16] are summarized in Table 1."
Researcher Affiliation | Academia | "1 Center for Neural Science, New York University; 2 Center for Computational Neuroscience, Flatiron Institute. {pef246, eero.simoncelli}@nyu.edu"
Pseudocode | No | The paper contains no sections or figures labeled "Pseudocode" or "Algorithm", and no code-like formatted procedural steps.
Open Source Code | No | The paper does not state that source code for the described methodology is available, nor does it link to a code repository.
Open Datasets | Yes | "To train, test and compare these models, we use the DAVIS dataset [16], which was originally designed as a benchmark for video object segmentation. We also consider a smaller video dataset consisting in footage of animals in the wild [48] which contains a variety of motions..."
Dataset Splits | No | "The set is subdivided into 60 training videos (4741 frames) and 30 test videos (2591 frames)." While the paper mentions that the "test loss plateaus", it provides no details of a separate validation split (percentages, sample counts, or a predefined partition).
Hardware Specification | Yes | "Training and inference time are computed on a NVIDIA A100 GPU."
Software Dependencies | No | The paper mentions the "Adam optimizer [49]" but gives no version numbers for software libraries or frameworks (e.g., PyTorch or TensorFlow), nor does it name the implementation language.
Experiment Setup | Yes | "We train each model for 200 epochs on DAVIS using the Adam optimizer [49] with default parameters and a learning rate of 3 × 10⁻⁴. We train on brief temporal segments containing 11 frames (which allows for prediction of 9 frames), and process these in batches of size 4."
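The quoted experiment setup can be sketched as a minimal training step. This is a hedged illustration only: PyTorch, the tiny convolutional "model", and the 64×64 random frames are assumptions for self-containment, not the paper's polar prediction architecture. Only the hyperparameters (Adam with default parameters, learning rate 3 × 10⁻⁴, 11-frame segments, batch size 4, 9 predicted frames per segment) come from the paper.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Placeholder predictor (hypothetical; not the paper's architecture):
# maps two consecutive frames, stacked as channels, to the next frame.
model = nn.Conv2d(2, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # default betas/eps

# One batch: 4 clips of 11 grayscale frames (random 64x64 stand-ins).
batch = torch.randn(4, 11, 1, 64, 64)

# 11 frames allow 9 predictions: frames (t-1, t) predict frame t+1.
inputs = torch.cat([batch[:, :-2], batch[:, 1:-1]], dim=2)  # (4, 9, 2, 64, 64)
targets = batch[:, 2:]                                      # (4, 9, 1, 64, 64)

# One illustrative optimization step with an MSE prediction loss.
pred = model(inputs.reshape(-1, 2, 64, 64))                 # (36, 1, 64, 64)
loss = F.mse_loss(pred, targets.reshape(-1, 1, 64, 64))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Looping this step over the 60 DAVIS training videos for 200 epochs would match the schedule the paper reports.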