Pay attention to your loss: understanding misconceptions about Lipschitz neural networks
Authors: Louis Béthune, Thibaut Boissin, Mathieu Serrurier, Franck Mamalet, Corentin Friedrich, Alberto González-Sanz
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate this empirically by training a LipNet1 network until it reaches 99.96% accuracy on CIFAR-100 with random labels (see Appendix I). |
| Researcher Affiliation | Collaboration | Louis Béthune (IRIT, Université Paul-Sabatier, Toulouse, France); Thibaut Boissin (IRT Saint-Exupéry, Toulouse, France); Mathieu Serrurier (IRIT, Université Paul-Sabatier, Toulouse, France); Franck Mamalet (IRT Saint-Exupéry, Toulouse, France); Corentin Friedrich (IRT Saint-Exupéry, Toulouse, France); Alberto González-Sanz (IMT, Université Paul-Sabatier, Toulouse, France) |
| Pseudocode | No | The paper describes methods and processes in narrative text and via mathematical propositions but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | https://github.com/deel-ai/deel-lip, distributed under the MIT License (a hedged usage sketch of the library follows the table). |
| Open Datasets | Yes | We illustrate this empirically by training a LipNet1 network until it reaches 99.96% accuracy on CIFAR-100 with random labels (see Appendix I). The paper also relies on standard open datasets such as CIFAR-10 and MNIST. |
| Dataset Splits | Yes | This can be observed in practice: when the temperature τ (resp. margin m) of the cross-entropy (resp. hinge loss) is correctly adjusted, a small LipNet1 CNN can reach a competitive 88.2% validation accuracy on the CIFAR-10 dataset (results synthesized and discussed in Figure 3) without residual connections, batch normalization or dropout. |
| Hardware Specification | No | Appendix K states only that "Training was done on a single GPU with a batch size of 128", which does not identify the GPU model or manufacturer. |
| Software Dependencies | No | The paper mentions software frameworks such as TensorFlow and PyTorch, as well as the deel-lip library, but it does not specify version numbers for these or any other software components, which are necessary for reproducibility. |
| Experiment Setup | Yes | Appendix K, titled "Experimental setup", provides specific hyperparameters and training details: "Training was done on a single GPU with a batch size of 128. The learning rate was set to 1e-3 and reduced by 0.1 every 30 epochs for a total of 100 epochs. Optimizer used was ADAM. The results are averaged over 3 seeds. The values for the loss parameters τ, m, α were chosen among: {0.01, 0.05, 0.1, 0.5, 1., 2., 5., 10., 20., 50., 100.}" (a hedged sketch of this schedule follows the table). |
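
For orientation, the sketch below shows how a small 1-Lipschitz CNN in the spirit of the paper's LipNet1 models could be assembled with the deel-lip library linked above, together with a temperature-scaled cross-entropy loss whose τ parameter matches the grid swept in Appendix K. The class names (`Sequential` with `k_coef_lip`, `SpectralConv2D`, `GroupSort2`, `ScaledL2NormPooling2D`, `SpectralDense`) follow the library's documented API as best recalled and may differ between versions; the architecture and the `tau_categorical_crossentropy` helper are illustrative assumptions, not the exact code used in the paper.

```python
# Hedged sketch, not the paper's implementation: a small 1-Lipschitz CNN built
# with deel-lip (https://github.com/deel-ai/deel-lip) for CIFAR-10-sized inputs.
import tensorflow as tf
from deel import lip

model = lip.model.Sequential(
    [
        tf.keras.Input(shape=(32, 32, 3)),                   # CIFAR-10 images
        lip.layers.SpectralConv2D(32, (3, 3)),               # spectrally normalized conv
        lip.activations.GroupSort2(),                         # gradient-norm-preserving activation
        lip.layers.ScaledL2NormPooling2D(pool_size=(2, 2)),
        lip.layers.SpectralConv2D(64, (3, 3)),
        lip.activations.GroupSort2(),
        lip.layers.ScaledL2NormPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        lip.layers.SpectralDense(128),
        lip.activations.GroupSort2(),
        lip.layers.SpectralDense(10),                          # raw logits, no softmax
    ],
    k_coef_lip=1.0,                                            # enforce a 1-Lipschitz network
)

def tau_categorical_crossentropy(tau):
    """Assumed parameterization: cross-entropy of softmax(tau * logits),
    rescaled by 1/tau, applied to one-hot labels."""
    cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
    def loss(y_true, y_pred):
        return cce(y_true, tau * y_pred) / tau
    return loss

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tau_categorical_crossentropy(tau=10.0),               # tau taken from Appendix K's grid
    metrics=["accuracy"],
)
```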
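
The training schedule quoted from Appendix K (Adam at 1e-3, reduced by a factor of 0.1 every 30 epochs, batch size 128, 100 epochs, results averaged over 3 seeds) maps naturally onto a standard Keras learning-rate callback. The sketch below continues from the compiled `model` above; `x_train`, `y_train` and the `step_decay` helper are hypothetical placeholders, not artifacts from the paper.

```python
# Hedged sketch of the Appendix K schedule: batch size 128, 100 epochs,
# learning rate 1e-3 multiplied by 0.1 at epochs 30, 60 and 90.
import tensorflow as tf

def step_decay(epoch, lr):
    # Reduce the current learning rate by a factor of 0.1 every 30 epochs.
    return lr * 0.1 if epoch > 0 and epoch % 30 == 0 else lr

history = model.fit(
    x_train, y_train,                                  # placeholders for one-hot CIFAR-10 data
    batch_size=128,                                    # batch size from Appendix K
    epochs=100,                                        # total epochs from Appendix K
    callbacks=[tf.keras.callbacks.LearningRateScheduler(step_decay)],
)
```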