Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset

Authors: Alexandre Galashov, Michalis Titsias, András György, Clare Lyle, Razvan Pascanu, Yee Whye Teh, Maneesh Sahani

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically that our approach performs well in non-stationary supervised and off-policy reinforcement learning settings.
Researcher Affiliation | Collaboration | Alexandre Galashov (Gatsby Unit, UCL; Google DeepMind; agalashov@google.com); Michalis K. Titsias (Google DeepMind; mtitsias@google.com); András György (Google DeepMind; agyorgy@google.com); Clare Lyle (Google DeepMind; clarelyle@google.com); Razvan Pascanu (Google DeepMind; razp@google.com); Yee Whye Teh (Google DeepMind; University of Oxford; ywteh@google.com); Maneesh Sahani (Gatsby Unit, UCL; maneesh@gatsby.ucl.ac.uk)
Pseudocode | Yes | Algorithm 1: Soft-Reset algorithm. (A minimal illustrative sketch of a soft parameter reset appears after the table.)
Open Source Code | No | Unfortunately, due to IP constraints, we cannot release the code for the paper.
Open Datasets | Yes | A subset of 10,000 images from either CIFAR-10 [32] or MNIST, and the Hopper-v5 and Humanoid-v4 GYM [6] environments.
Dataset Splits | No | The paper defines metrics like "average per-task online accuracy" (Section 5, H.1), which evaluates performance during training. It describes training regimes (e.g., "400 epochs on a task with a batch size of 128") but does not specify a separate validation dataset split (e.g., "10% of data used for validation").
Hardware Specification | Yes | For each experiment, we used 3 hours on an A100 GPU with 40 GB of memory.
Software Dependencies | No | We ran a SAC [19] agent with default parameters from Brax [15] on the Hopper-v5 and Humanoid-v4 GYM [6] environments. No specific version numbers for Brax, GYM, SAC, Python, or other libraries are given.
Experiment Setup | Yes | For all the experiments, we run a sweep over the hyperparameters. We select the best hyperparameters based on the smallest cumulative error (sum of all 1 − a_t^i throughout the training). We then report the mean and the standard deviation across 3 seeds in all the plots. Hyperparameter ranges: the learning rate α used to update parameters is, for all methods, selected from {1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1.0}. The λ_init parameter in L2 Init is selected from {10.0, 1.0, 0.0, 1e-1, ...}. For S&P, the shrink parameter λ is selected from {1.0, 0.99999, ...}, and the perturbation parameter σ from {1e-1, ...}. For Soft Resets, the learning rate for γ_t is selected from {0.5, 0.1, ...}, the constant s from {1.0, 0.95, ...}, and the temperature λ in (45) from {1.0, 0.1, 0.01}... (A minimal sketch of the cumulative-error selection criterion appears after the table.)
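
The paper's Algorithm 1 (Soft-Reset) is not reproduced here; the sketch below only illustrates the generic idea of a soft parameter reset, shrinking current parameters towards their initialization with a strength gamma and optional perturbation noise. The function name `soft_reset`, the fixed `gamma` argument, and the noise term are illustrative assumptions: in the paper the reset strength γ_t is inferred automatically online rather than set by hand.

```python
import numpy as np

def soft_reset(params, init_params, gamma, noise_std=0.0, rng=None):
    """Illustrative soft reset: interpolate parameters towards their init.

    gamma = 1 leaves the parameters unchanged; gamma = 0 is a hard reset to
    init_params. Small Gaussian noise mimics an optional perturbation step.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = noise_std * rng.standard_normal(np.shape(params))
    return gamma * np.asarray(params) + (1.0 - gamma) * np.asarray(init_params) + noise

# Usage: a strong non-stationarity signal would correspond to gamma closer to 0.
theta = np.array([0.8, -1.2, 0.3])
theta_init = np.zeros(3)
theta = soft_reset(theta, theta_init, gamma=0.9, noise_std=0.01)
```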
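
The hyperparameter-selection rule quoted in the Experiment Setup row can be expressed compactly. The sketch below assumes "cumulative error" means the sum of per-step errors 1 − a_t over the whole training run; the helper names `cumulative_error`, `select_best_config`, and the `sweep_results` structure are hypothetical, not from the paper.

```python
import numpy as np

def cumulative_error(per_step_accuracy):
    # Sum of per-step errors (1 - a_t) accumulated over the whole run.
    return float(np.sum(1.0 - np.asarray(per_step_accuracy)))

def select_best_config(sweep_results):
    # sweep_results: dict mapping a hyperparameter setting (e.g. a tuple)
    # to the sequence of online accuracies recorded during training.
    return min(sweep_results, key=lambda cfg: cumulative_error(sweep_results[cfg]))

# Usage with two hypothetical learning-rate settings.
results = {
    ("lr", 1e-3): [0.2, 0.5, 0.8, 0.9],
    ("lr", 1e-1): [0.3, 0.4, 0.4, 0.5],
}
best = select_best_config(results)  # -> ("lr", 1e-3)
```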