Structural Credit Assignment in Neural Networks using Reinforcement Learning
Authors: Dhawal Gupta, Gabor Mihucz, Matthew Schlegel, James Kostas, Philip S. Thomas, Martha White
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate CoANs on problems where backprop is known to perform well, to provide a strong baseline and facilitate understanding the behavior, and potential issues, when learning in CoANs. ... To investigate this question we use two well-studied datasets: MNIST [27] for classifying handwritten digits, and the Boston Housing Dataset from UCI. ... Results are averaged over 10 independent runs and compared using the area under curve (AUC). (This selection protocol is sketched below the table.) |
| Researcher Affiliation | Academia | Dhawal Gupta, Gabor Mihucz, Matthew K. Schlegel, Department of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta ({dhawal,mihucz,mkschleg}@ualberta.ca); James E. Kostas, Philip S. Thomas, College of Information and Computer Sciences, University of Massachusetts ({jekostas,pthomas}@cs.umass.edu); Martha White, Department of Computing Science, CIFAR AI Chair, Amii, University of Alberta (whitem@ualberta.ca) |
| Pseudocode | Yes | More details on baselines are in Appendix B and pseudocode in Appendix H. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use two well-studied datasets: MNIST [27] for classifying handwritten digits, and the Boston Housing Dataset from UCI. |
| Dataset Splits | Yes | Hyperparameters are chosen from performance on a validation set held out from the training set: 10K for MNIST and 51 samples for Boston Housing. ... We report the performance of the best performing parameters on a held-out test set with the same size as the validation set in Figure 2 (a). ... and test it every 900th step on a test set of 1,800 iid samples, with the best hyperparameters picked on the validation set of equal size. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions general training aspects without hardware specifics. |
| Software Dependencies | No | The paper mentions optimizers like RMSProp and Adam, but it does not specify any programming languages, libraries, or other software components with version numbers required for reproducibility. |
| Experiment Setup | Yes | We test both strategies using a single and double-layer neural network, with 64 hidden nodes and ReLU activations. Each node in the CoAN is a single coagent using a Gaussian distribution with parameterized mean and a fixed standard deviation, set system-wide through a systematic sweep over σ ∈ {0.1, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0}. ... Both use RMSProp [57], with fixed β = 0.99 and stepsizes swept over α ∈ {2⁻⁷, 2⁻⁹, 2⁻¹¹, …, 2⁻¹⁵}. We use mini-batch gradient descent with batch size 32 for MNIST with 50 epochs, and full gradient descent for the Boston Housing Dataset with 10k epochs. (A coagent node matching this description is sketched below the table.) |
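For concreteness, here is a minimal Python sketch of the coagent node described in the Experiment Setup row: each node samples its activation from a Gaussian with a parameterized mean and a fixed, system-wide standard deviation, and can be updated with a REINFORCE-style log-probability gradient. The names (`GaussianCoagent`, `forward`, `grad_log_prob`) are hypothetical illustrations, not identifiers from the authors' code, which the paper does not release.

```python
import numpy as np

class GaussianCoagent:
    """One node of a coagent network (CoAN): samples its activation from
    N(mu(x), sigma^2), where mu(x) = w^T x + b and sigma is fixed."""

    def __init__(self, n_inputs, sigma=1.0, rng=None):
        self.rng = rng or np.random.default_rng()
        self.w = self.rng.normal(scale=0.1, size=n_inputs)
        self.b = 0.0
        self.sigma = sigma  # fixed standard deviation, swept system-wide

    def forward(self, x):
        mu = self.w @ x + self.b
        a = self.rng.normal(mu, self.sigma)  # stochastic activation
        return a, mu

    def grad_log_prob(self, x, a, mu):
        # Gradient of log N(a; mu, sigma^2) w.r.t. (w, b):
        # d/dmu log p = (a - mu) / sigma^2, with dmu/dw = x and dmu/db = 1.
        g = (a - mu) / self.sigma ** 2
        return g * x, g

node = GaussianCoagent(n_inputs=64, sigma=1.0)
activation, mean = node.forward(np.zeros(64))
```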
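Likewise, the model-selection protocol quoted in the Research Type and Experiment Setup rows (sweep σ and the stepsize α, average 10 independent runs, compare configurations by area under the learning curve on a held-out validation set) can be sketched as below. `train_and_eval` is a hypothetical stand-in for one training run and returns a dummy curve so the sketch executes end to end; the grid values come from the quotes above, with the negative exponents on α inferred from context.

```python
from itertools import product
import numpy as np

sigmas = [0.1, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
alphas = [2.0 ** -k for k in (7, 9, 11, 13, 15)]
n_runs = 10

def train_and_eval(sigma, alpha, seed):
    # Hypothetical stand-in: a real implementation would train the CoAN
    # and return its validation-performance learning curve.
    rng = np.random.default_rng(seed)
    return rng.random(50)

def auc(curve):
    # Area under the learning curve, the comparison metric in the paper.
    return float(np.trapz(curve))

scores = {}
for sigma, alpha in product(sigmas, alphas):
    curves = [train_and_eval(sigma, alpha, seed) for seed in range(n_runs)]
    scores[(sigma, alpha)] = np.mean([auc(c) for c in curves])

best_sigma, best_alpha = max(scores, key=scores.get)
```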