What does automatic differentiation compute for neural networks?
Authors: Sejun Park, Sanghyuk Chun, Wonyeol Lee
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We empirically check our sufficient conditions over popular network architectures and observe that AD almost always computes a Clarke subderivative in practical learning setups." (An illustrative sketch of this kind of check appears after the table.) |
| Researcher Affiliation | Collaboration | ¹Korea University, ²NAVER AI Lab, ³Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Construction of P_l |
| Open Source Code | No | The paper does not provide any statement about making its source code available or include links to a code repository. |
| Open Datasets | Yes | "We trained these networks on the MNIST dataset (LeCun et al., 2010) using stochastic gradient descent (SGD)... trained these networks on the CIFAR-10 dataset (Krizhevsky et al., 2009)" |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR-10 and using a minibatch size, but it does not provide explicit details about train/validation/test splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper describes the training process and software used (PyTorch) but does not specify any hardware details like GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions machine learning frameworks like PyTorch: "computed via (reverse-mode) AD implemented in PyTorch." However, it does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | "All networks were trained for 20 epochs with the initial learning rate 0.05 and the weight decay 0.0001, where the learning rate was decayed by the cosine annealing scheduling (Loshchilov and Hutter, 2017)." (A hedged training sketch based on this description follows the table.) |
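
The training recipe quoted in the Experiment Setup and Open Datasets rows can be sketched in PyTorch roughly as follows. Only the optimizer (SGD), learning rate 0.05, weight decay 0.0001, 20 epochs, the cosine annealing schedule, and the MNIST dataset come from the paper's description; the network architecture, batch size, and data transform below are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the described training setup; model, batch size,
# and transform are assumptions not specified in the summary above.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder fully connected network for MNIST (28x28 grayscale inputs).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).to(device)

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size assumed

# From the paper's description: SGD, lr=0.05, weight decay=0.0001,
# 20 epochs, cosine annealing learning-rate decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
criterion = nn.CrossEntropyLoss()

for epoch in range(20):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()   # reverse-mode AD computes the (sub)gradients
        optimizer.step()
    scheduler.step()      # cosine annealing decay per epoch
```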
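The Research Type row quotes the paper's empirical claim that AD almost always returns a Clarke subderivative in practical setups. The snippet below is a minimal illustration of that kind of check, assuming the single function ReLU evaluated at its non-differentiable point 0; it is not the paper's actual experiment, which covers full network architectures.

```python
# Illustrative check only: query PyTorch's reverse-mode AD at a point of
# non-differentiability and test whether the returned value lies in the
# Clarke subdifferential.
import torch

x = torch.tensor(0.0, requires_grad=True)
y = torch.relu(x)          # ReLU is non-differentiable at x = 0
y.backward()

grad = x.grad.item()
# For ReLU at 0, the Clarke subdifferential is the interval [0, 1].
in_clarke = 0.0 <= grad <= 1.0
print(f"AD returned {grad}; lies in Clarke subdifferential: {in_clarke}")
# PyTorch returns 0.0 here, which is a valid Clarke subderivative.
```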