Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies
Authors: Oscar Li, James Harrison, Jascha Sohl-Dickstein, Virginia Smith, Luke Metz
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of unroll steps across a variety of applications, including learning dynamical systems, meta-training learned optimizers, and reinforcement learning. |
| Researcher Affiliation | Collaboration | Oscar Li¹, James Harrison, Jascha Sohl-Dickstein, Virginia Smith, Luke Metz². Machine Learning Department, School of Computer Science, Carnegie Mellon University; Google DeepMind. ¹Correspondence to: oscarli@cmu.edu. ²Now at OpenAI. |
| Pseudocode | Yes | Algorithm 1 Persistent Evolution Strategies [15]: class PESWorker(OnlineESWorker): (a hedged sketch of such an online ES worker appears after this table) |
| Open Source Code | Yes | Code available at https://github.com/OscarcarLi/Noise-Reuse-Evolution-Strategies. |
| Open Datasets | Yes | We consider meta-training the learned optimizer model given in [3] (d = 1762) to optimize a 3-layer MLP on the Fashion MNIST dataset for T = 1000 steps. |
| Dataset Splits | No | The paper mentions using a 'sampled validation batch' but does not specify how this validation set is constructed or how large it is relative to the overall dataset splits, details that are necessary for reproducibility. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were provided for running experiments, only vague mentions like 'same hardware'. |
| Software Dependencies | No | The paper mentions software like TensorFlow [34], PyTorch [35], and JAX [49] but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | Hence, we take extra care in tuning each method's constant learning rate and additionally allow PES to have a decay schedule. We plot the convergence of different ES gradient estimators in wall-clock time using the same hardware in Figure 5(b). (We additionally compare against automatic differentiation methods in Figure 9 in the Appendix; they all perform worse than the ES methods shown here.) |
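
To make the quoted pseudocode row concrete, below is a minimal, hedged sketch of an antithetic online-ES worker in the spirit of the paper's Algorithm 1 and its noise-reuse variant (NRES). The class and function names (`NRESWorker`, `unroll`) and the window/episode bookkeeping are illustrative assumptions, not the authors' released implementation; consult their repository linked above for the real code.

```python
# Illustrative sketch only; not the authors' code. `unroll` is a hypothetical
# user-supplied function: unroll(state, params, num_steps) -> (new_state, mean_loss).
import numpy as np

class NRESWorker:
    """One antithetic ES worker that reuses a single noise draw eps across
    every truncation window of an episode (the noise-reuse idea), resampling
    only when the length-T episode restarts."""

    def __init__(self, theta_dim, sigma, T, W):
        self.theta_dim = theta_dim  # dimension of the parameters theta
        self.sigma = sigma          # perturbation scale
        self.T = T                  # full unroll (episode) length
        self.W = W                  # truncation window length
        self.t = 0                  # current step within the episode
        self.eps = None             # shared noise, reused across windows
        self.state_pos = None       # inner state under theta + sigma * eps
        self.state_neg = None       # inner state under theta - sigma * eps

    def reset(self, init_state):
        """Start a new episode: sample noise once and reuse it thereafter."""
        self.t = 0
        self.eps = np.random.randn(self.theta_dim)
        self.state_pos = np.array(init_state, copy=True)
        self.state_neg = np.array(init_state, copy=True)

    def step(self, theta, unroll):
        """Unroll one truncation window under antithetic perturbations and
        return this worker's single-sample gradient estimate for theta."""
        if self.eps is None or self.t >= self.T:
            raise RuntimeError("call reset() at episode boundaries")
        self.state_pos, loss_pos = unroll(
            self.state_pos, theta + self.sigma * self.eps, self.W)
        self.state_neg, loss_neg = unroll(
            self.state_neg, theta - self.sigma * self.eps, self.W)
        self.t += self.W
        # Standard antithetic ES estimate; reusing eps across windows is what
        # distinguishes this from PES, which accumulates fresh noise per window.
        return (loss_pos - loss_neg) / (2.0 * self.sigma) * self.eps
```

In practice the estimate would be averaged over N such workers before being passed to an optimizer; the paper averages over many parallel workers in exactly this fashion, though the interface shown here is our assumption.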