Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies

Authors: Oscar Li, James Harrison, Jascha Sohl-Dickstein, Virginia Smith, Luke Metz

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of unroll steps across a variety of applications, including learning dynamical systems, meta-training learned optimizers, and reinforcement learning.
Researcher Affiliation | Collaboration | Oscar Li (correspondence to: oscarli@cmu.edu), James Harrison, Jascha Sohl-Dickstein, Virginia Smith, Luke Metz (now at OpenAI). Affiliations: Machine Learning Department, School of Computer Science, Carnegie Mellon University; Google DeepMind.
Pseudocode | Yes | Algorithm 1: Persistent Evolution Strategies [15], presented as class PESWorker(OnlineESWorker). (A hedged sketch of the noise-reuse estimator follows the table.)
Open Source Code | Yes | Code available at https://github.com/OscarcarLi/Noise-Reuse-Evolution-Strategies.
Open Datasets | Yes | We consider meta-training the learned optimizer model given in [3] (d = 1762) to optimize a 3-layer MLP on the Fashion MNIST dataset for T = 1000 steps.
Dataset Splits | No | The paper mentions using a 'sampled validation batch' but does not provide specific details on how this validation set is created or how large it is relative to the overall dataset splits, which is necessary for reproducibility.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for running the experiments; the paper only vaguely refers to using the 'same hardware'.
Software Dependencies | No | The paper mentions software such as TensorFlow [34], PyTorch [35], and JAX [49] but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | Yes | Hence, we take extra care in tuning each method's constant learning rate and additionally allow PES to have a decay schedule. We plot the convergence of different ES gradient estimators in wall-clock time using the same hardware in Figure 5(b). (We additionally compare against automatic differentiation methods in Figure 9 in the Appendix; they all perform worse than the ES methods shown here.)
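
For context on the estimator referenced in the Pseudocode row, below is a minimal, hedged sketch of an antithetic online ES worker that reuses a single noise sample across every truncation window of an unroll, which is the core noise-reuse idea behind NRES. This is not the authors' implementation: the class name `NRESWorker`, the `unroll` callback, and its signature are illustrative assumptions.

```python
# Minimal sketch of a noise-reuse online ES worker (illustrative; not the
# authors' code). Assumes a hypothetical callback
#   unroll(state, params, num_steps) -> (new_state, mean_loss)
# that runs one truncation window of the inner problem.
import numpy as np


class NRESWorker:
    """One antithetic worker that reuses a single noise sample for a whole unroll."""

    def __init__(self, theta_dim, sigma, init_state, seed=0):
        self.theta_dim = theta_dim
        self.sigma = sigma
        self.init_state = np.asarray(init_state, dtype=float)
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Sample the perturbation once per episode and reuse it for every
        # truncation window. (PES instead resamples each window and accumulates
        # the noise, which the paper identifies as a source of extra variance.)
        self.eps = self.sigma * self.rng.standard_normal(self.theta_dim)
        self.state_pos = self.init_state.copy()
        self.state_neg = self.init_state.copy()

    def step(self, theta, unroll, num_steps):
        # Unroll the positively and negatively perturbed branches for one window.
        self.state_pos, loss_pos = unroll(self.state_pos, theta + self.eps, num_steps)
        self.state_neg, loss_neg = unroll(self.state_neg, theta - self.eps, num_steps)
        # Single-pair antithetic ES gradient estimate for this window.
        return (loss_pos - loss_neg) / (2.0 * self.sigma ** 2) * self.eps
```

In use, the per-window estimates from many independent workers would be averaged to form the gradient applied to the parameters, and `reset()` would be called whenever a worker completes a full T-step unroll so that fresh noise is drawn for the next episode.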