Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
Authors: SungYub Kim, Sihwan Park, Kyung-Su Kim, Eunho Yang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): Here we describe experiments demonstrating (i) the effectiveness of Connectivity Sharpness (CS) as a generalization measurement metric and (ii) the usefulness of Connectivity Laplace (CL) as a general-purpose Bayesian NN: With CS and CL, we can resolve the contradiction in the FM hypothesis concerning the generalization of NNs and attain stable calibration performance for various ranges of prior scales. |
| Researcher Affiliation | Collaboration | Sung-Yub Kim (1), Sihwan Park (1), Kyung-Su Kim (3,4,5), Eunho Yang (1,2); affiliations: (1) Korea Advanced Institute of Science and Technology (KAIST), (2) AITRICS, (3) Samsung Medical AI Research Center, (4) Sungkyunkwan University School of Medicine, (5) Massachusetts General Hospital and Harvard Medical School |
| Pseudocode | Yes | In Algorithm 1, we provide pseudo-code for the RTO implementation of CL. Note that both the time and memory complexity of computing the linearized NN for a mini-batch B are comparable to those of a forward propagation, as shown in Novak et al. (2022), using the jax.jvp function in JAX (Bradbury et al., 2018). In Algorithm 2, we provide pseudo-code for the implementation. A hedged jax.jvp sketch of this linearization is given below the table. |
| Open Source Code | Yes | 1https://github.com/sungyubkim/connectivity-tangent-kernel |
| Open Datasets | Yes | We use the CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009), where the 50K training instances are randomly partitioned into SP of cardinality 45K and SQ of cardinality 5K. [...] UCI regression datasets (Hernández-Lobato & Adams, 2015) and their GAP variants (Foong et al., 2019) |
| Dataset Splits | Yes | We use the CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009), where the 50K training instances are randomly partitioned into SP of cardinality 45K and SQ of cardinality 5K. An illustrative split sketch appears below the table. |
| Hardware Specification | Yes | For every experiment, we use 8 NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software such as TensorFlow, PyTorch, and JAX and specific functions such as jax.jvp, but it does not provide version numbers for any of these components, which are needed to reproduce the software environment. |
| Experiment Setup | Yes | We pre-train ResNet-18 (He et al., 2016) with a mini-batch size of 1K on SP with SGD with an initial learning rate of 0.4 and momentum of 0.9. We use cosine annealing for learning-rate scheduling (Loshchilov & Hutter, 2016) with a warm-up over the initial 10% of training steps. We fix δ = 0.1, α = 0.1, and σ = 1.0 to compute equation 8. Table 6 lists the hyper-parameter grid: network depth (1, 2, 3), network width (32, 64, 128), learning rate (0.1, 0.032, 0.001), weight decay (0.0, 1e-4, 5e-4), and mini-batch size (256, 1024, 4096). We use an SGD optimizer with momentum of 0.9, train each model for 200 epochs, and use a cosine learning-rate schedule (Loshchilov & Hutter, 2016) with 30% of the initial epochs as warm-up. An optimizer sketch appears below the table. |
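
The Pseudocode row quotes the paper's use of jax.jvp to evaluate the linearized network at roughly the cost of one forward pass. The snippet below is a minimal sketch of that pattern only, not the authors' Algorithm 1; `apply_fn`, `params_0`, and `delta_params` are assumed names for a Flax-style apply function, the pre-trained parameters, and a parameter perturbation.

```python
# Minimal sketch (not the authors' Algorithm 1): first-order linearization of a
# network around pre-trained parameters using jax.jvp.
# `apply_fn`, `params_0`, and `delta_params` are assumed placeholders.
import jax


def linearized_apply(apply_fn, params_0, delta_params, x):
    """f_lin(x) = f(params_0, x) + J_theta f(params_0, x) @ delta_params.
    The JVP adds roughly one forward pass of compute/memory (Novak et al., 2022)."""
    f0, jvp_out = jax.jvp(
        lambda p: apply_fn(p, x),  # close over the mini-batch x
        (params_0,),               # linearization point (pre-trained weights)
        (delta_params,),           # tangent direction in parameter space
    )
    return f0 + jvp_out
```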
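For the Dataset Splits row, a random 45K/5K partition of the 50K CIFAR training indices could look like the sketch below; the seed and indexing convention are illustrative assumptions, since the excerpt does not specify the exact procedure.

```python
# Illustrative 45K/5K partition of the 50K CIFAR training indices
# (seed and ordering are assumptions; the paper's exact procedure is not quoted).
import numpy as np

rng = np.random.default_rng(0)                 # placeholder seed
perm = rng.permutation(50_000)                 # shuffle all training indices
idx_SP, idx_SQ = perm[:45_000], perm[45_000:]  # S_P (45K) for training, S_Q (5K) held out
```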
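For the Experiment Setup row, the stated pre-training recipe (SGD with momentum 0.9, initial learning rate 0.4, cosine annealing with a 10% warm-up) can be sketched with optax; the library choice and the `total_steps` value are assumptions, as the excerpt does not say how the schedule was implemented.

```python
# Sketch of the stated pre-training recipe with optax (library choice is an
# assumption; total_steps is a placeholder set from epochs * steps_per_epoch).
import optax

total_steps = 10_000
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=0.4,                       # initial learning rate from the setup
    warmup_steps=int(0.1 * total_steps),  # warm-up over the first 10% of steps
    decay_steps=total_steps,              # cosine annealing over training
)
optimizer = optax.sgd(learning_rate=schedule, momentum=0.9)
```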