Neural Characteristic Activation Analysis and Geometric Parameterization for ReLU Networks
Authors: Wenlin Chen, Hong Ge
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper includes an empirical evaluation of GmP with neural network architectures of different sizes, covering both illustrative demonstrations and more challenging machine learning classification and regression benchmarks. |
| Researcher Affiliation | Academia | Wenlin Chen (University of Cambridge; MPI for Intelligent Systems), wc337@cam.ac.uk; Hong Ge (University of Cambridge), hg344@cam.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Wenlin-Chen/geometric-parameterization. |
| Open Datasets | Yes | We evaluate GmP on 7 regression problems from the UCI dataset [11]. We evaluate GmP with a medium-sized convolutional neural network, VGG-6 [58], on ImageNet32 [8]. We evaluate GmP with a large residual neural network, ResNet-18 [22], on the full ImageNet (ILSVRC 2012) dataset [10]. |
| Dataset Splits | Yes | We train an MLP with one hidden layer and 100 hidden units for 10 different random 80/20 train/test splits (a hypothetical sketch of this protocol appears after the table). The ImageNet (ILSVRC 2012) dataset [10] consists of 1,281,167 training images and 50,000 validation images. |
| Hardware Specification | Yes | All models are trained on a single GPU: an NVIDIA GeForce RTX 2080 Ti in some experiments and an NVIDIA A100 (80GB) in others. |
| Software Dependencies | No | The paper mentions using optimizers like Adam and SGD and implies a deep learning framework, but it does not provide specific version numbers for any software dependencies (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | We use the Adam optimizer [28] with full-batch training. We use cross-validation to select the learning rate for each compared method from the set {0.001, 0.003, 0.01, 0.03, 0.1, 0.3} (see the learning-rate selection sketch after the table). We find that the optimal initial learning rate is 0.1 for GmP and 0.01 for all the other compared methods. |
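
The Dataset Splits row describes the UCI evaluation protocol: a one-hidden-layer MLP with 100 hidden units, trained and tested over 10 random 80/20 splits. The following is a minimal sketch of that protocol, not the authors' code: dataset loading, the training step count, and the `make_mlp`/`evaluate_split` helper names are assumptions, and a plain `nn.Linear` stands in for the paper's geometrically parameterized layer (the real implementation is in the linked repository).

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

def make_mlp(in_dim: int) -> nn.Module:
    # One hidden layer with 100 ReLU units, as stated in the paper.
    # A standard nn.Linear is used here; GmP would replace it with the
    # paper's geometric parameterization.
    return nn.Sequential(nn.Linear(in_dim, 100), nn.ReLU(), nn.Linear(100, 1))

def evaluate_split(X: np.ndarray, y: np.ndarray, seed: int) -> float:
    # One random 80/20 train/test split per seed.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    to_t = lambda a: torch.tensor(a, dtype=torch.float32)
    X_tr, X_te = to_t(X_tr), to_t(X_te)
    y_tr, y_te = to_t(y_tr).unsqueeze(-1), to_t(y_te).unsqueeze(-1)

    model = make_mlp(X_tr.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=0.01)  # lr chosen by cross-validation in the paper
    loss_fn = nn.MSELoss()
    for _ in range(2000):  # full-batch training; the step count is an assumption
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    with torch.no_grad():
        return torch.sqrt(loss_fn(model(X_te), y_te)).item()  # test RMSE

# Usage (X, y are one UCI regression problem loaded elsewhere):
# rmses = [evaluate_split(X, y, seed) for seed in range(10)]  # 10 random splits
```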
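
The Experiment Setup row reports that learning rates are chosen by cross-validation from a fixed grid. The sketch below shows one plausible way to implement that selection with k-fold cross-validation; the `train_and_score` callback, the number of folds, and the `select_learning_rate` name are assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.model_selection import KFold

# Candidate learning rates, taken from the grid stated in the paper.
LEARNING_RATES = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]

def select_learning_rate(X, y, train_and_score, n_splits=5):
    """Return the candidate learning rate with the lowest mean validation loss.

    train_and_score(X_tr, y_tr, X_va, y_va, lr) is a user-supplied callback
    that trains a model (e.g. with full-batch Adam) and returns its
    validation loss; it is a placeholder here.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    mean_scores = {}
    for lr in LEARNING_RATES:
        fold_scores = [
            train_and_score(X[tr], y[tr], X[va], y[va], lr)
            for tr, va in kf.split(X)
        ]
        mean_scores[lr] = float(np.mean(fold_scores))
    return min(mean_scores, key=mean_scores.get)
```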