Hamiltonian Monte Carlo on ReLU Neural Networks is Inefficient
Authors: Vu C. Dinh, Lam S. Ho, Cuong V. Nguyen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments with both synthetic and real datasets, we validate our theoretical analyses and highlight the inefficiency of ReLU neural networks compared to analytical networks. |
| Researcher Affiliation | Academia | Vu C. Dinh, Department of Mathematical Sciences, University of Delaware (vucdinh@udel.edu); Lam Si Tung Ho, Department of Mathematics and Statistics, Dalhousie University (lam.ho@dal.ca); Cuong V. Nguyen, Department of Mathematical Sciences, Durham University (viet.c.nguyen@durham.ac.uk) |
| Pseudocode | No | The paper describes the leapfrog updates in Equations (1)-(3) but does not present them in a clearly labeled 'Pseudocode' or 'Algorithm' block. (A hedged leapfrog sketch is given after this table.) |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We would like to keep the code confidential. |
| Open Datasets | Yes | In addition to the synthetic dataset above, we also conduct experiments to validate our theoretical findings on a subset of the real-world UTKFace dataset (Zhang et al., 2017). |
| Dataset Splits | No | The subset is then randomly split into a training set (167 images) and a test set (100 images). The paper does not explicitly mention a validation dataset split. |
| Hardware Specification | No | The experiments were run on a single CPU machine. The paper does not provide specific details such as CPU model, memory, or other hardware specifications. |
| Software Dependencies | No | Our simulations are implemented using the Autograd package (Maclaurin et al., 2015). The paper mentions the software 'Autograd' but does not specify its version number. |
| Experiment Setup | Yes | We choose a standard normal prior π(q) = N(0, I) and sample 2,000 parameter vectors from the posterior after a burn-in period of 100 samples. We vary the number of steps L ∈ {200, 400, 600, 800, 1000} together with the step size ϵ ∈ {0.0005, 0.0010, 0.0015, 0.0020, 0.0025}. We fix the travel time T = ϵL = 0.1 and vary ϵ ∈ {0.0005, 0.0010, 0.0015, ..., 0.0040}. For each network, we run HMC with L = 200 and ϵ = 0.001 while keeping other hyper-parameters the same as in previous simulations. In this experiment, we fix T = 0.01 and vary ϵ ∈ {0.00005, 0.00010, 0.00015, ..., 0.00040}. For each run of the HMC, we sample 300 parameter vectors from the posterior after a burn-in period of 50 samples. (A sketch of the corresponding sampling loop follows this table.) |
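
As the Pseudocode row notes, the leapfrog updates appear in the paper only as Equations (1)-(3). The sketch below shows a standard leapfrog integrator of the kind those equations describe; the function name `leapfrog`, its signature, and the comments are illustrative assumptions, not the authors' confidential implementation.

```python
def leapfrog(q, p, grad_U, eps, L):
    """One HMC trajectory: L leapfrog steps of size eps.

    q       -- position (network parameters), a NumPy array
    p       -- momentum, same shape as q
    grad_U  -- callable returning the gradient of the potential energy U(q)
    """
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)        # initial half step for momentum
    for _ in range(L - 1):
        q += eps * p                  # full step for position
        p -= eps * grad_U(q)          # full step for momentum
    q += eps * p                      # final full step for position
    p -= 0.5 * eps * grad_U(q)        # final half step for momentum
    return q, -p                      # negate momentum so the proposal is reversible
```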
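The Experiment Setup row describes a standard HMC sampling loop. The sketch below plugs the quoted hyper-parameters (standard normal prior, 2,000 posterior samples after a 100-sample burn-in, L = 200, ϵ = 0.001) into a generic HMC driver built on the `leapfrog` sketch above and the Autograd package the paper cites; the potential `U(q)` here is a placeholder (prior only), not the paper's ReLU-network posterior.

```python
import autograd.numpy as anp          # the paper cites the Autograd package
import numpy as np                    # plain NumPy for random numbers
from autograd import grad

def U(q):
    # Placeholder potential: standard normal prior N(0, I) only.
    # In the paper, U(q) would also include the ReLU-network data term.
    return 0.5 * anp.sum(q ** 2)

grad_U = grad(U)

def hmc(q0, n_samples=2000, burn_in=100, L=200, eps=0.001, seed=0):
    """Draw n_samples from exp(-U) after discarding burn_in iterations."""
    np.random.seed(seed)
    q = q0.copy()
    samples = []
    for it in range(n_samples + burn_in):
        p = np.random.randn(*q.shape)                      # resample momentum
        q_new, p_new = leapfrog(q, p, grad_U, eps, L)      # sketch above
        # Metropolis correction for the leapfrog discretization error
        dH = (U(q_new) + 0.5 * np.sum(p_new ** 2)) \
           - (U(q)     + 0.5 * np.sum(p ** 2))
        if np.log(np.random.rand()) < -dH:
            q = q_new                                      # accept proposal
        if it >= burn_in:
            samples.append(q.copy())
    return np.array(samples)

# Example: sample a 10-dimensional standard normal with the quoted settings.
draws = hmc(np.zeros(10))
```

With the placeholder potential, the chain simply samples N(0, I); substituting the ReLU-network negative log-posterior for `U` would recover the setting the paper studies, where the non-smoothness of ReLU makes the leapfrog proposals far more likely to be rejected.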