Modeling Deep Temporal Dependencies with Recurrent Grammar Cells

Authors: Vincent Michalski, Roland Memisevic, Kishore Konda

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested and compared the models on sequences and videos with varying degrees of complexity, from synthetic constant to synthetic accelerated transformations to more complex real-world transformations. A description of the synthetic shift and rotation data sets is provided in the supplementary material.
Researcher Affiliation | Academia | Vincent Michalski, Goethe University Frankfurt, Germany (vmichals@rz.uni-frankfurt.de); Roland Memisevic, University of Montreal, Canada (roland.memisevic@umontreal.ca); Kishore Konda, Goethe University Frankfurt, Germany (konda.kishorereddy@gmail.com)
Pseudocode | No | The paper describes the model and processes using equations and prose, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'Further results and data can be found on the project website at http://www.ccc.cs.uni-frankfurt.de/people/vincent-michalski/grammar-cells', but this is a project website/personal homepage rather than a direct link to a source code repository for the methodology. The paper does not explicitly state that the code is released, nor does it provide a direct link to a code repository in its text.
Open Datasets | Yes | For all data sets, except for chirps and bouncing balls, PCA whitening was used for dimensionality reduction, retaining around 95% of the variance. The chirps-data was normalized by subtracting the mean and dividing by the standard deviation of the training set. For the multi-layer models we used greedy layerwise pretraining before predictive training. We found pretraining to be crucial for the predictive training to work well. Each layer was pretrained using a simple GAE, the first layer on input frames, the next layer on the inferred mappings. Stochastic gradient descent (SGD) with learning rate 0.001 and momentum 0.9 was used for all pretraining.
Dataset Splits | Yes | The setting with the best performance on the validation set was 256 filters and 256 mapping units for both training objectives on both data sets.
Hardware Specification | No | The paper describes the training process and parameters but does not specify any hardware details such as CPU or GPU models used for the experiments.
Software Dependencies | No | The paper mentions stochastic gradient descent (SGD) as an optimization method, but it does not specify any software dependencies, libraries, or programming language versions used for implementation.
Experiment Setup | Yes | The models were each trained for 1 000 epochs using SGD with learning rate 0.001 and momentum 0.9.
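
The preprocessing quoted in the Open Datasets row above (PCA whitening retaining around 95% of the variance, and mean/standard-deviation normalization for the chirps data) could be reproduced roughly as follows. This is a minimal NumPy sketch, not the authors' code; the function names, the epsilon term, and the component-selection rule are assumptions.

```python
import numpy as np

def pca_whiten(X, retained_variance=0.95, eps=1e-8):
    """PCA-whiten the rows of X, keeping the smallest number of principal
    components whose cumulative variance reaches `retained_variance`
    (the paper reports retaining around 95% of the variance)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, retained_variance)) + 1
    W = eigvecs[:, :k] / np.sqrt(eigvals[:k] + eps)   # whitening matrix
    return Xc @ W, mean, W

def standardize(X, train_mean, train_std):
    """Mean/standard-deviation normalization, as reported for the chirps
    data (statistics computed on the training set)."""
    return (X - train_mean) / train_std
```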
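The same row mentions greedy layerwise pretraining, with the first GAE (gated autoencoder) layer trained on input frames and each higher layer trained on the mappings inferred by the layer below. The skeleton below sketches that loop under stated assumptions; `train_gae` and `infer_mapping` are hypothetical helpers standing in for the GAE training and inference steps, which the paper specifies through equations rather than code.

```python
def greedy_layerwise_pretrain(layers, sequences, train_gae):
    """Greedy layerwise pretraining sketch: each layer is trained as a
    simple GAE on pairs of consecutive inputs; the first layer sees raw
    frames, each higher layer sees mappings inferred by the layer below."""
    data = sequences  # list of sequences of (whitened) frame vectors
    for layer in layers:
        pairs = [(seq[t], seq[t + 1])
                 for seq in data for t in range(len(seq) - 1)]
        train_gae(layer, pairs)  # SGD, lr 0.001, momentum 0.9 (as reported)
        # The mappings inferred by this layer become the next layer's inputs.
        data = [[layer.infer_mapping(seq[t], seq[t + 1])
                 for t in range(len(seq) - 1)] for seq in data]
    return layers
```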
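Finally, the Experiment Setup row reports SGD with learning rate 0.001 and momentum 0.9 for 1 000 epochs. The update below is the classical momentum rule with those hyperparameters; it is an illustrative sketch (assuming the standard, non-Nesterov formulation), and the commented loop uses hypothetical names.

```python
import numpy as np

def sgd_momentum_step(params, grads, velocities, lr=0.001, momentum=0.9):
    """One classical-momentum SGD update with the reported hyperparameters
    (learning rate 0.001, momentum 0.9). Arrays are updated in place."""
    for p, g, v in zip(params, grads, velocities):
        v *= momentum      # decay the running velocity
        v -= lr * g        # accumulate the scaled negative gradient
        p += v             # apply the update to the parameter

# Sketch of the reported training schedule: 1 000 epochs of minibatch SGD.
# `model.gradients` and `minibatches` are hypothetical helpers.
# for epoch in range(1000):
#     for batch in minibatches(train_data):
#         sgd_momentum_step(model.params, model.gradients(batch), velocities)
```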