What does automatic differentiation compute for neural networks?

Authors: Sejun Park, Sanghyuk Chun, Wonyeol Lee

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We empirically check our sufficient conditions over popular network architectures and observe that AD almost always computes a Clarke subderivative in practical learning setups. |
| Researcher Affiliation | Collaboration | Korea University; NAVER AI Lab; Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Construction of P_l |
| Open Source Code | No | The paper does not provide any statement about making its source code available or include links to a code repository. |
| Open Datasets | Yes | We trained these networks on the MNIST dataset (LeCun et al., 2010) using stochastic gradient descent (SGD)... trained these networks on the CIFAR-10 dataset (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR-10 and using a minibatch size, but it does not provide explicit details about train/validation/test splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper describes the training process and software used (PyTorch) but does not specify any hardware details like GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions machine learning frameworks like PyTorch: "computed via (reverse-mode) AD implemented in PyTorch." However, it does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | All networks were trained for 20 epochs with the initial learning rate 0.05 and the weight decay 0.0001, where the learning rate was decayed by the cosine annealing scheduling (Loshchilov and Hutter, 2017). |
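The Research Type row summarizes the paper's empirical claim that reverse-mode AD almost always returns a Clarke subderivative in practical learning setups. As a minimal illustration (not taken from the paper), the sketch below queries PyTorch's autograd at the kink of ReLU, where the function is nondifferentiable and its Clarke subdifferential is the interval [0, 1]; the value PyTorch returns lies inside that interval.

```python
import torch

# ReLU is not differentiable at 0; its Clarke subdifferential there is [0, 1].
x = torch.tensor(0.0, requires_grad=True)
y = torch.relu(x)
y.backward()

# PyTorch's autograd picks 0 at the kink, which is an element of [0, 1].
print(x.grad)  # tensor(0.)
```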
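For reference, here is a minimal PyTorch sketch of the training loop implied by the Open Datasets and Experiment Setup rows. The network architecture, batch size, and momentum are not specified in this section, so the choices below (a small fully connected network on MNIST, batch size 128) are illustrative assumptions rather than the authors' configuration.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical model and batch size: only the optimizer settings are quoted in the table.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10))
train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Settings quoted in the Experiment Setup row: 20 epochs, initial learning rate 0.05,
# weight decay 0.0001, cosine annealing of the learning rate (Loshchilov & Hutter, 2017).
epochs = 20
optimizer = optim.SGD(model.parameters(), lr=0.05, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()   # reverse-mode AD computes the (sub)gradients studied in the paper
        optimizer.step()
    scheduler.step()
```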