Generalization Bounds for Gradient Methods via Discrete and Continuous Prior
Authors: Xuanyuan Luo, Bei Luo, Jian Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments for FGD and FSGD on MNIST [LeCun et al., 1998] and CIFAR10 [Krizhevsky et al., 2009] to investigate the optimization and generalization properties of FGD and FSGD, and the numerical closeness between our theoretical bounds and true test errors. |
| Researcher Affiliation | Academia | Xuanyuan Luo, IIIS, Tsinghua University (xuanyuanluo@google.com); Luo Bei, Renmin University of China (rabbit_lb@ruc.edu.cn); Jian Li, IIIS, Tsinghua University (lijian83@mail.tsinghua.edu) |
| Pseudocode | Yes | Algorithm 1: Floored Gradient Descent (FGD). Input: training dataset S = (z_1, ..., z_n), index set J. Result: parameter W_T ∈ R^d. 1: Initialize W_0 ← w_0; 2: for t = 1 to T do; 3: g_1 ← γ_t ∇f(W_{t−1}, S); 4: g_2 ← γ_t ∇f(W_{t−1}, S_J); 5: W_t ← W_{t−1} − g_2 − ε_t · floor((g_1 − g_2)/ε_t) |
| Open Source Code | No | The paper mentions providing a code in supplementary material in the checklist, but the actual PDF provided does not contain a link or specific instruction to access it. The checklist states: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]" However, the main paper PDF does not provide this URL or clear access information within its content. |
| Open Datasets | Yes | In this section, we conduct experiments for FGD and FSGD on MNIST [LeCun et al., 1998] and CIFAR10 [Krizhevsky et al., 2009] to investigate the optimization and generalization properties of FGD and FSGD, and the numerical closeness between our theoretical bounds and true test errors. |
| Dataset Splits | No | The paper does not explicitly provide percentages or counts for training, validation, and test splits. It uses 'standard datasets' where such splits are common knowledge but does not detail them within the paper itself for reproducibility of the specific splits used. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments in the main text. The checklist states: "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] It can be found in our supplemental material." However, this information is not in the provided paper PDF. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for reproducibility. The checklist states "The code and the data are proprietary", and the text mentions related software (e.g., PyTorch, TensorFlow) but gives no version details for the paper's own implementation. |
| Experiment Setup | Yes | For MNIST, we train a CNN (d ≈ 1.4 × 10^6) by FGD with γ_t = 0.005 · 0.9^⌊t/150⌋, ε_t = 0.005, and momentum = 0.9. The size m = |J| is set to n/2 = 30000. ... For CIFAR10, we train a SimpleNet [Hasanpour et al., 2016] without Batch Norm and Dropout; the number of parameters d is nearly 18 × 10^6. We use FSGD to train our model. The learning rate γ_t is set to 0.001 · 0.9^⌊t/200⌋, the precision ε_t is set to 0.004, and the momentum is set to 0.99. The batch size is 2000, and m = |J| is set to n/5 = 10000. |
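The FGD update quoted in the Pseudocode row (subtract the subset gradient g_2 exactly, then apply the remainder g_1 − g_2 quantized to a grid of resolution ε_t) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released implementation: `grad_f`, `fgd_step`, and `gamma` are hypothetical names, and momentum is omitted for brevity.

```python
import numpy as np

def fgd_step(w, grad_f, S, S_J, gamma_t, eps_t):
    """One step of Floored Gradient Descent (Algorithm 1), without momentum.

    g1 is the scaled gradient on the full dataset S, g2 the scaled
    gradient on the index subset S_J; their difference is floored to
    a grid of resolution eps_t before being applied.
    """
    g1 = gamma_t * grad_f(w, S)    # line 3: full-dataset gradient
    g2 = gamma_t * grad_f(w, S_J)  # line 4: subset gradient
    # line 5: apply g2 exactly plus the quantized remainder
    return w - g2 - eps_t * np.floor((g1 - g2) / eps_t)

def gamma(t):
    """Step-decay schedule from the MNIST setup: 0.005 * 0.9^floor(t/150)."""
    return 0.005 * 0.9 ** (t // 150)
```

When (g_1 − g_2)/ε_t happens to be an integer, the floor is exact and the step coincides with a plain gradient step w − g_1; otherwise the two differ by at most ε_t per coordinate, which is what the discretized prior in the paper's bound exploits.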