What does automatic differentiation compute for neural networks?

Authors: Sejun Park, Sanghyuk Chun, Wonyeol Lee

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We empirically check our sufficient conditions over popular network architectures and observe that AD almost always computes a Clarke subderivative in practical learning setups. |
| Researcher Affiliation | Collaboration | Korea University; NAVER AI Lab; Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1: Construction of P_l |
| Open Source Code | No | The paper does not provide any statement about making its source code available or include links to a code repository. |
| Open Datasets | Yes | We trained these networks on the MNIST dataset (LeCun et al., 2010) using stochastic gradient descent (SGD)... trained these networks on the CIFAR-10 dataset (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR-10 and using a minibatch size, but it does not provide explicit details about train/validation/test splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper describes the training process and software used (PyTorch) but does not specify any hardware details like GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions machine learning frameworks like PyTorch: "computed via (reverse-mode) AD implemented in PyTorch." However, it does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | All networks were trained for 20 epochs with the initial learning rate 0.05 and the weight decay 0.0001, where the learning rate was decayed by the cosine annealing scheduling (Loshchilov and Hutter, 2017). |
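The Research Type row summarizes the paper's empirical claim that reverse-mode AD almost always returns a Clarke subderivative in practical learning setups. As a minimal illustration (not taken from the paper), the sketch below queries PyTorch's autograd at the kink of ReLU, where the function is nondifferentiable and its Clarke subdifferential is the interval [0, 1]; the value PyTorch returns lies inside that interval.

```python
import torch

# ReLU is not differentiable at 0; its Clarke subdifferential there is [0, 1].
x = torch.tensor(0.0, requires_grad=True)
y = torch.relu(x)
y.backward()

# PyTorch's autograd picks 0 at the kink, which is an element of [0, 1].
print(x.grad)  # tensor(0.)
```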
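For reference, here is a minimal PyTorch sketch of the training loop implied by the Open Datasets and Experiment Setup rows. The network architecture, batch size, and momentum are not specified in this section, so the choices below (a small fully connected network on MNIST, batch size 128) are illustrative assumptions rather than the authors' configuration.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical model and batch size: only the optimizer settings are quoted in the table.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10))
train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Settings quoted in the Experiment Setup row: 20 epochs, initial learning rate 0.05,
# weight decay 0.0001, cosine annealing of the learning rate (Loshchilov & Hutter, 2017).
epochs = 20
optimizer = optim.SGD(model.parameters(), lr=0.05, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()   # reverse-mode AD computes the (sub)gradients studied in the paper
        optimizer.step()
    scheduler.step()
```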