Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
Authors: Mohamed Elsayed, Homayoon Farrahi, Felix Dangel, A. Rupam Mahmood
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate HesScale's approximation quality and compare it with other methods. We start by studying the approximation quality of Hessian diagonals compared to the true values. In our experiments, we implemented HesScale using the BackPACK framework (Dangel et al., 2020b)... Our task here is supervised classification, and data examples are sampled randomly from MNIST. We used a network of three hidden layers with tanh activations, each containing 32 units. We train each method for 200 epochs with a batch size of 128. In the second experiment, we use the CIFAR-100 ALLCNN task... We investigate the performance of AdaHesScale against other optimizers when used with two reinforcement learning algorithms, A2C (Mnih et al., 2016) and PPO (Schulman et al., 2017), on the MuJoCo environments (Todorov et al., 2012). |
| Researcher Affiliation | Academia | 1Department of Computing Science, University of Alberta, Edmonton, Canada 2Alberta Machine Intelligence Institute 3Vector Institute, Toronto, Canada 4CIFAR AI Chair. |
| Pseudocode | Yes | Algorithm 1: HesScale; Algorithm 2: AdaHesScale; Algorithm 3: AdaHesScale with step-size scaling (an illustrative Adam-style update sketch is given after this table). |
| Open Source Code | Yes | Code is available at: https://github.com/mohmdelsayed/HesScale |
| Open Datasets | Yes | Our task here is supervised classification, and data examples are sampled randomly from MNIST. ... We trained for 10000 iterations on EMNIST (Cohen et al., 2017) with a batch size of 32. ... In the first experiment, we used the CIFAR-100 3C-3D task from DeepOBS. ... In the second experiment, we use the CIFAR-100 ALLCNN task from DeepOBS... We investigate the performance of AdaHesScale against other optimizers when used with two reinforcement learning algorithms, A2C (Mnih et al., 2016) and PPO (Schulman et al., 2017), on the MuJoCo environments (Todorov et al., 2012). |
| Dataset Splits | Yes | We performed a hyperparameter search for each method to find the best step size. Using each method's best step size on the validation set, we show the performance of the method against the time in seconds needed to complete the required number of epochs, which better depicts the computational efficiency of the methods. |
| Hardware Specification | No | The paper mentions receiving 'computational resources' from 'Digital Research Alliance of Canada' and 'Vector Institute' in the Acknowledgement section but does not provide specific hardware details such as GPU/CPU models or memory specifications used for the experiments. |
| Software Dependencies | No | In our experiments, we implemented HesScale using the BackPACK framework (Dangel et al., 2020b). No version numbers are provided for BackPACK or for other software dependencies such as Python, PyTorch, or TensorFlow (a minimal BackPACK usage sketch follows this table). |
| Experiment Setup | Yes | We trained the network with SGD using a batch size of 1. ... We train each method for 200 epochs with a batch size of 128. ... We train each method for 350 epochs with a batch size of 256. ... In both experiments, we used β1 = 0.9 and β2 = 0.999 for all adaptive methods. ... We performed a hyperparameter search for each method to find the best step size. ... We used a trust-region radius of 10⁻⁸ and applied the step-size scaling mechanism on both the actor and the critic networks. |
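
The quoted setup (a three-hidden-layer tanh MLP with 32 units per layer, MNIST, batch size 128) together with the stated use of the BackPACK framework suggests the following minimal sketch of how exact Hessian diagonals can be obtained as a comparison baseline. This is not the authors' code: the data directory, the single illustrative batch, and the printing loop are assumptions added here for illustration; only the architecture, dataset, batch size, and BackPACK's documented `DiagHessian` extension come from the rows above or from BackPACK itself.

```python
# Minimal sketch (not the paper's code): a 3x32 tanh MLP on MNIST, extended with
# BackPACK so that per-parameter Hessian diagonals can be read off after backward().
import torch
from torch import nn
from torchvision import datasets, transforms
from backpack import backpack, extend
from backpack.extensions import DiagHessian

# Three hidden layers of 32 tanh units, as described in the quoted MNIST experiment.
model = extend(nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 10),
))
loss_fn = extend(nn.CrossEntropyLoss())

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# One illustrative batch: compute the loss and ask BackPACK for Hessian diagonals.
x, y = next(iter(loader))
loss = loss_fn(model(x), y)
with backpack(DiagHessian()):
    loss.backward()

for name, p in model.named_parameters():
    # p.grad holds the gradient; p.diag_h holds BackPACK's exact Hessian diagonal.
    print(name, p.grad.shape, p.diag_h.shape)
```

In the paper's approximation-quality experiments, such exact diagonals serve as the reference ("true values") against which HesScale and the other approximation methods are compared.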
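Algorithms 2 and 3 are given as pseudocode in the paper and are not reproduced here. As a hedged illustration of the general idea behind an Adam-style optimizer driven by a Hessian-diagonal estimate, the sketch below applies an Adam-like update whose second moment tracks the squared diagonal estimate instead of the squared gradient. The function name `adam_style_step`, the placeholder tensors, and the exact update form are illustrative assumptions, not the paper's Algorithm 2.

```python
# Illustrative sketch only: an Adam-style step whose second moment is driven by a
# per-parameter Hessian-diagonal estimate (h_hat) rather than the squared gradient.
import torch

def adam_style_step(param, grad, h_hat, state, lr=1e-3,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """One update for a single tensor.

    param: parameter tensor (updated in place)
    grad:  gradient of the loss w.r.t. param
    h_hat: Hessian-diagonal estimate for param (same shape as param)
    state: dict holding running moments m, v and the step counter t
    """
    state["t"] += 1
    state["m"].mul_(beta1).add_(grad, alpha=1 - beta1)               # first moment
    state["v"].mul_(beta2).addcmul_(h_hat, h_hat, value=1 - beta2)   # second moment from h_hat^2
    m_hat = state["m"] / (1 - beta1 ** state["t"])                   # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    param.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)

# Usage with the β1/β2 values quoted in the setup row; grad and h_hat are placeholders.
w = torch.randn(32, 784)
state = {"m": torch.zeros_like(w), "v": torch.zeros_like(w), "t": 0}
grad = torch.randn_like(w)
h_hat = torch.rand_like(w)
adam_style_step(w, grad, h_hat, state, beta1=0.9, beta2=0.999)
```

With β1 = 0.9 and β2 = 0.999 as in the quoted setup, this sketch reduces to standard Adam if `h_hat` is replaced by the gradient itself; the step-size scaling of Algorithm 3 is a separate mechanism and is not shown here.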