Learning in temporally structured environments
Authors: Matt Jones, Tyler R. Scott, Mengye Ren, Gamaleldin Fathy Elsayed, Katherine Hermann, David Mayo, Michael Curtis Mozer
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Third, we evaluate the ability of these methods to handle nonstationarity by testing them in online prediction tasks characterized by 1/f noise in the latent parameters. We find that the Bayesian model significantly outperforms online stochastic gradient descent and two batch heuristics that rely preferentially or exclusively on more recent data. Moreover, the variational approximation performs nearly as well as the full Bayesian model, and with memory requirements that are linear in the size of the network. |
| Researcher Affiliation | Collaboration | Matt Jones (1,2), Tyler R. Scott (1), Mengye Ren (1,3), Gamaleldin F. Elsayed (1), Katherine Hermann (1), David Mayo (1,4), Michael C. Mozer (1). Affiliations: (1) Brain Team, Google Research; (2) University of Colorado; (3) NYU; (4) MIT |
| Pseudocode | No | The paper provides mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | We have implemented the variational EKF optimizer in JAX in a format compatible with Optax. The paper mentions an implementation but does not state that the code is publicly available or provide a link. (A minimal sketch of the Optax-compatible optimizer interface appears after the table.) |
| Open Datasets | Yes | Finally, we tested our methods on classifying a stream of handwritten MNIST digits (Le Cun et al., 2010). |
| Dataset Splits | No | The paper describes online learning scenarios and mentions using a "random subset of the MNIST training set" for experiments, and a "batch learning method that uses a fixed memory horizon H". However, it does not specify conventional train/validation/test splits (e.g., percentages or counts) needed for reproduction. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or cloud computing instances used for the experiments. |
| Software Dependencies | No | We have implemented the variational EKF optimizer in JAX in a format compatible with Optax. The paper mentions JAX and Optax but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Latent parameters for the synthetic tasks in Sections 5.1 and 5.2 were sampled using the generative model in Appendix B. That is, the data-generating process matched the generative assumptions of the Bayesian model in both of these cases. We used 20 timescales, geometrically spaced from τ1 = 1 to τ20 = 1000, as illustrated in Figure 6A. Each component OU process was run for 10τi burn-in steps to ensure stationarity. The regression task was run for 10k trials, and the linear classification task for 1000 trials. For the MNIST classification task in Section 5.3, we used a convolutional neural network (CNN) with two convolution layers followed by two dense layers, with 824458 parameters. Hyperparameters for both methods (noise variance for EKF, learning and momentum rates for SGD) were optimized separately for the two environments. (Illustrative sketches of this generative setup and the CNN appear after the table.) |
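The experiment-setup row describes latent parameters built from Ornstein-Uhlenbeck (OU) components at 20 geometrically spaced timescales (τ1 = 1 to τ20 = 1000), each with a 10τi burn-in, producing the 1/f-structured drift referenced in the abstract quote above. The sketch below generates such a latent trace in JAX. The equal-weight sum over components, the unit stationary variance, the exact-discretization OU update, and the single shared burn-in are assumptions standing in for the paper's Appendix B, which is not reproduced in this report.

```python
# Minimal sketch (not the paper's Appendix B): a 1/f-like latent trace formed by
# summing OU processes at 20 geometrically spaced timescales, as quoted above.
# Equal weights, unit stationary variance, and the exact OU discretization are
# assumptions; a single shared burn-in of 10 * tau_max covers the 10 * tau_i rule.
import jax
import jax.numpy as jnp

def sample_ou_mixture(key, n_steps, n_scales=20, tau_min=1.0, tau_max=1000.0):
    taus = jnp.geomspace(tau_min, tau_max, n_scales)    # tau_1 = 1 ... tau_20 = 1000
    decay = jnp.exp(-1.0 / taus)                        # per-step OU decay factor
    noise_scale = jnp.sqrt(1.0 - decay ** 2)            # keeps unit stationary variance
    burn_in = int(10 * tau_max)                         # >= 10 * tau_i for every component

    def step(x, eps):
        x = decay * x + noise_scale * eps               # one OU update per timescale
        return x, jnp.sum(x)                            # equal-weight sum -> latent value

    eps = jax.random.normal(key, (burn_in + n_steps, n_scales))
    _, trace = jax.lax.scan(step, jnp.zeros(n_scales), eps)
    return trace[burn_in:]                              # drop burn-in, keep n_steps values

latent = sample_ou_mixture(jax.random.PRNGKey(0), n_steps=10_000)  # e.g., a 10k-trial regression run
```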
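The same row specifies the MNIST model only as "two convolution layers followed by two dense layers, with 824458 parameters." Below is a minimal stax sketch of that outline; the channel widths, kernel sizes, and lack of pooling are assumptions, so its parameter count will not match the quoted figure.

```python
# Hypothetical CNN matching only the quoted outline (two conv layers, two dense
# layers, 10 MNIST classes). Widths and kernel sizes are assumptions, so the
# printed parameter count will differ from the paper's 824458.
import jax
from jax.example_libraries import stax

init_fn, apply_fn = stax.serial(
    stax.Conv(32, (3, 3)), stax.Relu,   # conv layer 1 (width assumed)
    stax.Conv(64, (3, 3)), stax.Relu,   # conv layer 2 (width assumed)
    stax.Flatten,
    stax.Dense(128), stax.Relu,         # dense layer 1 (width assumed)
    stax.Dense(10),                     # dense layer 2: 10 digit classes
)

out_shape, params = init_fn(jax.random.PRNGKey(0), (1, 28, 28, 1))
n_params = sum(p.size for p in jax.tree_util.tree_leaves(params))
print(out_shape, n_params)
```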
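Two rows quote that the variational EKF optimizer was "implemented in JAX in a format compatible with Optax." The sketch below illustrates only what that compatibility implies structurally: an `optax.GradientTransformation` whose state carries one variance per parameter, consistent with the "memory requirements that are linear in the size of the network" claim. The toy variance-weighted update is an assumption standing in for the paper's EKF equations, which are not reproduced here.

```python
# Hypothetical sketch of an Optax-compatible stateful optimizer. The per-parameter
# variance state mirrors the "linear in the size of the network" memory claim; the
# toy variance-weighted update below is NOT the paper's variational EKF rule.
from typing import Any, NamedTuple

import jax
import jax.numpy as jnp
import optax

class ToyState(NamedTuple):
    var: Any  # one variance estimate per parameter (same pytree structure as params)

def toy_variational_filter(obs_noise: float = 1.0, drift: float = 1e-4):
    def init_fn(params):
        return ToyState(var=jax.tree_util.tree_map(jnp.ones_like, params))

    def update_fn(grads, state, params=None):
        del params  # unused in this toy rule
        # Inflate variances so the filter can track drifting (nonstationary) weights.
        var = jax.tree_util.tree_map(lambda v: v + drift, state.var)
        # Kalman-gain-like step: larger variance -> larger step along the gradient.
        updates = jax.tree_util.tree_map(lambda g, v: -v * g / (v + obs_noise), grads, var)
        # Shrink variances after incorporating the new observation.
        var = jax.tree_util.tree_map(lambda v: v * obs_noise / (v + obs_noise), var)
        return updates, ToyState(var=var)

    return optax.GradientTransformation(init_fn, update_fn)

# Usage mirrors any optax optimizer:
#   opt = toy_variational_filter()
#   opt_state = opt.init(params)
#   updates, opt_state = opt.update(grads, opt_state)
#   params = optax.apply_updates(params, updates)
```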