Variational Predictive Routing with Nested Subjective Timescales

Authors: Alexey Zakharov, Qinghai Guo, Zafeirios Fountas

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using several video datasets, we show that VPR is able to detect event boundaries, disentangle spatiotemporal features across its hierarchy, adapt to the dynamics of the data, and produce accurate time-agnostic rollouts of the future.
Researcher Affiliation | Industry | Alexey Zakharov, Huawei Technologies, London, UK; Qinghai Guo, Huawei Technologies, Shenzhen, China; Zafeirios Fountas, Huawei Technologies, London, UK
Pseudocode | Yes | Algorithm 1: Event detection and inference in VPR for video
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described (e.g., no repository link or explicit statement of code release).
Open Datasets | Yes | 3D Shapes Dynamic (3DSD) is a dynamic extension to the 3D Shapes dataset (Burgess & Kim, 2018)... Miniworld Maze. To evaluate the behaviour of VPR in a more perceptually challenging setting, we use a 3D environment Gym-Miniworld (Chevalier-Boisvert, 2018). ... Bouncing Balls dataset, analogous to the one used in Kim et al. (2019).
Dataset Splits | No | The paper mentions using datasets for evaluation and provides some training parameters (e.g., batch size, learning rate), but it does not specify explicit train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, cloud instances) used for running its experiments.
Software Dependencies | No | The paper mentions certain algorithms or models (e.g., Adam optimizer, GRU models, Leaky ReLU) but does not list any specific software libraries or their version numbers that would be needed to replicate the experiment setup.
Experiment Setup | Yes | For training, we use Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0005 and a cosine decay to 0.00005 over a period of 15,000 iterations. We employ linear annealing of the KL coefficient from 0 to 1 over the first 3000 iterations. ... Batch size of 32 is used for all datasets. ... the latent states are of size |s_t^n| = 20, while the temporal, top-down, and bottom-up deterministic variables are set to be |x_τ^n| = |c_τ^n| = |d_τ^n| = 200. ... For the Bouncing Balls dataset (see section C.3), we increase the capacity of the model, such that |x_τ^n| = 1024 and |s_t^n| = 60.
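
The training schedule quoted under Experiment Setup can be sketched in a few lines of Python. This is only an illustration built from the reported numbers, not the authors' code, which is not released. The function and constant names are hypothetical. The standard half-cosine form of the decay, and holding the learning rate at its final value after 15,000 iterations, are assumptions that the quoted text does not confirm.

```python
import math

# Values quoted from the paper's experiment setup.
LR_START = 5e-4          # initial Adam learning rate
LR_END = 5e-5            # learning rate at the end of the cosine decay
DECAY_STEPS = 15_000     # length of the cosine decay, in iterations
KL_ANNEAL_STEPS = 3_000  # length of the linear KL annealing, in iterations
BATCH_SIZE = 32          # batch size reported for all datasets


def learning_rate(step: int) -> float:
    """Cosine decay from LR_START to LR_END over DECAY_STEPS.

    Assumption: a half-cosine curve, held at LR_END after the decay ends.
    """
    progress = min(max(step, 0), DECAY_STEPS) / DECAY_STEPS
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return LR_END + (LR_START - LR_END) * cosine


def kl_weight(step: int) -> float:
    """Linear annealing of the KL coefficient from 0 to 1, then constant at 1."""
    return min(max(step, 0) / KL_ANNEAL_STEPS, 1.0)


if __name__ == "__main__":
    for step in (0, 1_500, 3_000, 7_500, 15_000):
        print(step, learning_rate(step), kl_weight(step))
```

In a training loop, `kl_weight(step)` would multiply the KL term of the loss and `learning_rate(step)` would be passed to the optimizer at each iteration.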