Modeling Deep Temporal Dependencies with Recurrent Grammar Cells""
Authors: Vincent Michalski, Roland Memisevic, Kishore Konda
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested and compared the models on sequences and videos with varying degrees of complexity, from synthetic constant to synthetic accelerated transformations to more complex real-world transformations. A description of the synthetic shift and rotation data sets is provided in the supplementary material. |
| Researcher Affiliation | Academia | Vincent Michalski Goethe University Frankfurt, Germany vmichals@rz.uni-frankfurt.de Roland Memisevic University of Montreal, Canada roland.memisevic@umontreal.ca Kishore Konda Goethe University Frankfurt, Germany konda.kishorereddy@gmail.com |
| Pseudocode | No | The paper describes the model and processes using equations and prose, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'Further results and data can be found on the project website at http://www.ccc.cs.uni-frankfurt.de/people/vincent-michalski/grammar-cells', but this is a project website/personal homepage and not a direct link to the source code repository for the methodology. The paper does not explicitly state that the code is released or provide a direct link to a code repository within its text. |
| Open Datasets | Yes | For all data sets, except for chirps and bouncing balls, PCA whitening was used for dimensionality reduction, retaining around 95% of the variance. The chirps-data was normalized by subtracting the mean and dividing by the standard deviation of the training set. For the multi-layer models we used greedy layerwise pretraining before predictive training. We found pretraining to be crucial for the predictive training to work well. Each layer was pretrained using a simple GAE, the first layer on input frames, the next layer on the inferred mappings. Stochastic gradient descent (SGD) with learning rate 0.001 and momentum 0.9 was used for all pretraining. |
| Dataset Splits | Yes | The setting with the best performance on the validation set was 256 filters and 256 mapping units for both training objectives on both data sets. |
| Hardware Specification | No | The paper describes the training process and parameters but does not specify any hardware details such as CPU or GPU models used for the experiments. |
| Software Dependencies | No | The paper mentions stochastic gradient descent (SGD) as an optimization method, but it does not specify any software dependencies, libraries, or programming language versions used for implementation. |
| Experiment Setup | Yes | The models were each trained for 1 000 epochs using SGD with learning rate 0.001 and momentum 0.9. |
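
The preprocessing quoted in the Open Datasets row (PCA whitening retaining around 95% of the variance, and mean/standard-deviation normalization for the chirps data) can be sketched as below. This is a minimal illustration using NumPy and scikit-learn under stated assumptions, not the authors' code; the function names and the random stand-in data are hypothetical.

```python
# Illustrative sketch of the quoted preprocessing, not the authors' pipeline.
import numpy as np
from sklearn.decomposition import PCA


def pca_whiten(frames, retained_variance=0.95):
    """PCA-whiten flattened frames, keeping roughly 95% of the variance."""
    pca = PCA(n_components=retained_variance, whiten=True)
    return pca.fit_transform(frames), pca


def standardize(train, other):
    """Normalize by the training-set mean and std (as described for the chirps data)."""
    mean, std = train.mean(axis=0), train.std(axis=0) + 1e-8
    return (train - mean) / std, (other - mean) / std


# Hypothetical stand-in data: 1000 flattened frames of 16x16 pixels.
frames = np.random.randn(1000, 256).astype(np.float32)
whitened, pca_model = pca_whiten(frames)
```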
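The training setup quoted in the Experiment Setup row (SGD with learning rate 0.001 and momentum 0.9, run for 1 000 epochs) corresponds to a standard momentum-SGD loop. The sketch below uses PyTorch purely for illustration; the original work predates PyTorch, and `PredictiveModel` and the random tensors are placeholders rather than the grammar-cell model.

```python
# Generic momentum-SGD training loop matching the quoted hyperparameters;
# the model and data are placeholders, not the authors' implementation.
import torch


class PredictiveModel(torch.nn.Module):
    """Stand-in for a (pretrained) gated-autoencoder prediction stack."""

    def __init__(self, num_inputs=256, num_mappings=256):
        super().__init__()
        self.encoder = torch.nn.Linear(num_inputs, num_mappings)
        self.decoder = torch.nn.Linear(num_mappings, num_inputs)

    def forward(self, x):
        return self.decoder(torch.relu(self.encoder(x)))


# Hypothetical stand-in data: current frames and the next frames to predict.
frames = torch.randn(64, 256)
next_frames = torch.randn(64, 256)

model = PredictiveModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # as quoted
loss_fn = torch.nn.MSELoss()

for epoch in range(1000):  # "trained for 1 000 epochs"
    optimizer.zero_grad()
    loss = loss_fn(model(frames), next_frames)  # predict the next frame
    loss.backward()
    optimizer.step()
```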