Learning in temporally structured environments
Authors: Matt Jones, Tyler R. Scott, Mengye Ren, Gamaleldin Fathy Elsayed, Katherine Hermann, David Mayo, Michael Curtis Mozer
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Third, we evaluate the ability of these methods to handle nonstationarity by testing them in online prediction tasks characterized by 1/f noise in the latent parameters. We find that the Bayesian model significantly outperforms online stochastic gradient descent and two batch heuristics that rely preferentially or exclusively on more recent data. Moreover, the variational approximation performs nearly as well as the full Bayesian model, and with memory requirements that are linear in the size of the network. |
| Researcher Affiliation | Collaboration | Matt Jones (1,2), Tyler R. Scott (1), Mengye Ren (1,3), Gamaleldin F. Elsayed (1), Katherine Hermann (1), David Mayo (1,4), Michael C. Mozer (1). Affiliations: (1) Brain Team, Google Research; (2) University of Colorado; (3) NYU; (4) MIT |
| Pseudocode | No | The paper provides mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | We have implemented the variational EKF optimizer in JAX in a format compatible with Optax. The paper mentions an implementation but does not state that the code is publicly available or provide a link. (A minimal sketch of the Optax-compatible optimizer interface appears after the table.) |
| Open Datasets | Yes | Finally, we tested our methods on classifying a stream of handwritten MNIST digits (Le Cun et al., 2010). |
| Dataset Splits | No | The paper describes online learning scenarios and mentions using a "random subset of the MNIST training set" for experiments, and a "batch learning method that uses a fixed memory horizon H". However, it does not specify conventional train/validation/test splits (e.g., percentages or counts) needed for reproduction. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or cloud computing instances used for the experiments. |
| Software Dependencies | No | We have implemented the variational EKF optimizer in JAX in a format compatible with Optax. The paper mentions JAX and Optax but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Latent parameters for the synthetic tasks in Sections 5.1 and 5.2 were sampled using the generative model in Appendix B. That is, the data-generating process matched the generative assumptions of the Bayesian model in both of these cases. We used 20 timescales, geometrically spaced from τ1 = 1 to τ20 = 1000, as illustrated in Figure 6A. Each component OU process was run for 10τi burn-in steps to ensure stationarity. The regression task was run for 10k trials, and the linear classification task for 1000 trials. For the MNIST classification task in Section 5.3, we used a convolutional neural network (CNN) with two convolution layers followed by two dense layers, with 824458 parameters. Hyperparameters for both methods (noise variance for EKF, learning and momentum rates for SGD) were optimized separately for the two environments. (Illustrative sketches of this generative setup and the CNN appear after the table.) |
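The experiment-setup row describes latent parameters built from Ornstein-Uhlenbeck (OU) components at 20 geometrically spaced timescales (τ1 = 1 to τ20 = 1000), each with a 10τi burn-in, producing the 1/f-structured drift referenced in the abstract quote above. The sketch below generates such a latent trace in JAX. The equal-weight sum over components, the unit stationary variance, the exact-discretization OU update, and the single shared burn-in are assumptions standing in for the paper's Appendix B, which is not reproduced in this report.

```python
# Minimal sketch (not the paper's Appendix B): a 1/f-like latent trace formed by
# summing OU processes at 20 geometrically spaced timescales, as quoted above.
# Equal weights, unit stationary variance, and the exact OU discretization are
# assumptions; a single shared burn-in of 10 * tau_max covers the 10 * tau_i rule.
import jax
import jax.numpy as jnp

def sample_ou_mixture(key, n_steps, n_scales=20, tau_min=1.0, tau_max=1000.0):
    taus = jnp.geomspace(tau_min, tau_max, n_scales)    # tau_1 = 1 ... tau_20 = 1000
    decay = jnp.exp(-1.0 / taus)                        # per-step OU decay factor
    noise_scale = jnp.sqrt(1.0 - decay ** 2)            # keeps unit stationary variance
    burn_in = int(10 * tau_max)                         # >= 10 * tau_i for every component

    def step(x, eps):
        x = decay * x + noise_scale * eps               # one OU update per timescale
        return x, jnp.sum(x)                            # equal-weight sum -> latent value

    eps = jax.random.normal(key, (burn_in + n_steps, n_scales))
    _, trace = jax.lax.scan(step, jnp.zeros(n_scales), eps)
    return trace[burn_in:]                              # drop burn-in, keep n_steps values

latent = sample_ou_mixture(jax.random.PRNGKey(0), n_steps=10_000)  # e.g., a 10k-trial regression run
```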
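The same row specifies the MNIST model only as "two convolution layers followed by two dense layers, with 824458 parameters." Below is a minimal stax sketch of that outline; the channel widths, kernel sizes, and lack of pooling are assumptions, so its parameter count will not match the quoted figure.

```python
# Hypothetical CNN matching only the quoted outline (two conv layers, two dense
# layers, 10 MNIST classes). Widths and kernel sizes are assumptions, so the
# printed parameter count will differ from the paper's 824458.
import jax
from jax.example_libraries import stax

init_fn, apply_fn = stax.serial(
    stax.Conv(32, (3, 3)), stax.Relu,   # conv layer 1 (width assumed)
    stax.Conv(64, (3, 3)), stax.Relu,   # conv layer 2 (width assumed)
    stax.Flatten,
    stax.Dense(128), stax.Relu,         # dense layer 1 (width assumed)
    stax.Dense(10),                     # dense layer 2: 10 digit classes
)

out_shape, params = init_fn(jax.random.PRNGKey(0), (1, 28, 28, 1))
n_params = sum(p.size for p in jax.tree_util.tree_leaves(params))
print(out_shape, n_params)
```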
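Two rows quote that the variational EKF optimizer was "implemented in JAX in a format compatible with Optax." The sketch below illustrates only what that compatibility implies structurally: an `optax.GradientTransformation` whose state carries one variance per parameter, consistent with the "memory requirements that are linear in the size of the network" claim. The toy variance-weighted update is an assumption standing in for the paper's EKF equations, which are not reproduced here.

```python
# Hypothetical sketch of an Optax-compatible stateful optimizer. The per-parameter
# variance state mirrors the "linear in the size of the network" memory claim; the
# toy variance-weighted update below is NOT the paper's variational EKF rule.
from typing import Any, NamedTuple

import jax
import jax.numpy as jnp
import optax

class ToyState(NamedTuple):
    var: Any  # one variance estimate per parameter (same pytree structure as params)

def toy_variational_filter(obs_noise: float = 1.0, drift: float = 1e-4):
    def init_fn(params):
        return ToyState(var=jax.tree_util.tree_map(jnp.ones_like, params))

    def update_fn(grads, state, params=None):
        del params  # unused in this toy rule
        # Inflate variances so the filter can track drifting (nonstationary) weights.
        var = jax.tree_util.tree_map(lambda v: v + drift, state.var)
        # Kalman-gain-like step: larger variance -> larger step along the gradient.
        updates = jax.tree_util.tree_map(lambda g, v: -v * g / (v + obs_noise), grads, var)
        # Shrink variances after incorporating the new observation.
        var = jax.tree_util.tree_map(lambda v: v * obs_noise / (v + obs_noise), var)
        return updates, ToyState(var=var)

    return optax.GradientTransformation(init_fn, update_fn)

# Usage mirrors any optax optimizer:
#   opt = toy_variational_filter()
#   opt_state = opt.init(params)
#   updates, opt_state = opt.update(grads, opt_state)
#   params = optax.apply_updates(params, updates)
```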