DDPNOpt: Differential Dynamic Programming Neural Optimizer
Authors: Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It outperforms other optimal-control inspired training methods in both convergence and complexity, and is competitive against state-of-the-art first and second order methods. We show that DDPNOpt achieves competitive performance against existing training methods on classification datasets and outperforms previous OCP-inspired methods in both training performance and runtime complexity. |
| Researcher Affiliation | Academia | Georgia Institute of Technology, USA |
| Pseudocode | Yes | Algorithm 1 Differential Dynamic Programming |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the public availability of the DDPNOpt source code. |
| Open Datasets | Yes | We first validate the performance of training fully-connected (FCN) and convolution networks (CNN) using DDPNOpt on classification datasets. FCN consists of 5 fully-connected layers with the hidden dimension ranging from 10 to 32, depending on the size of the dataset. CNN consists of 4 convolution layers (with 3×3 kernel, 32 channels), followed by 2 fully-connected layers. We use ReLU activation on all datasets except Tanh for WINE and DIGITS to better distinguish the differences between optimizers. (A minimal sketch of these architectures is given after the table.) |
| Dataset Splits | No | The paper does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts) or refer to standard predefined splits. |
| Hardware Specification | Yes | Regarding the machine information, we conduct our experiments on GTX 1080 TI, RTX TITAN, and four Tesla V100 SXM2 16GB. |
| Software Dependencies | No | The paper mentions using 'PyTorch' but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | The batch size is set to 8-32 for datasets trained with FCN, and 128 for datasets trained with CNN. As DDPNOpt combines strengths from both standard training methods and OCP framework, we select baselines from both sides. This includes first-order methods, i.e. SGD (with tuned momentum), RMSprop, Adam, and the second-order method EKFAC (George et al., 2018)... For each baseline we select its hyperparameters from an appropriate search space, which we detail in Table 5. (A minimal sketch of this setup is given after the table.) |
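
For concreteness, the FCN and CNN architectures described in the Open Datasets row can be written down directly. The following is a minimal sketch assuming PyTorch; the hidden width of the FC head, the padding choice, and the default dimensions are illustrative assumptions, not the authors' released code.

```python
import torch.nn as nn

def make_fcn(in_dim, hidden_dim=32, num_classes=10, act=nn.ReLU):
    """FCN: 5 fully-connected layers; hidden_dim ranges 10-32 depending on dataset."""
    layers, dim = [], in_dim
    for _ in range(4):
        layers += [nn.Linear(dim, hidden_dim), act()]
        dim = hidden_dim
    layers.append(nn.Linear(dim, num_classes))
    return nn.Sequential(*layers)

def make_cnn(in_channels=1, num_classes=10, img_size=28):
    """CNN: 4 conv layers (3x3 kernel, 32 channels) followed by 2 FC layers."""
    convs, ch = [], in_channels
    for _ in range(4):
        convs += [nn.Conv2d(ch, 32, kernel_size=3, padding=1), nn.ReLU()]
        ch = 32
    return nn.Sequential(
        *convs,
        nn.Flatten(),
        nn.Linear(32 * img_size * img_size, 64),  # assumed FC head width
        nn.ReLU(),
        nn.Linear(64, num_classes),
    )
```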
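The Experiment Setup row describes the batch sizes and the first-order baselines that were tuned against DDPNOpt. The sketch below shows a generic PyTorch training loop and a hypothetical hyperparameter grid for those baselines; the actual search space is given in the paper's Table 5, and EKFAC and DDPNOpt themselves are not reimplemented here.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One epoch of minibatch training (batch size 8-32 for FCN, 128 for CNN)."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Hypothetical grid for the first-order baselines (SGD with momentum, RMSprop, Adam);
# the paper's actual search space is listed in its Table 5.
baseline_grids = {
    "sgd":     [dict(lr=lr, momentum=m) for lr in (1e-1, 1e-2) for m in (0.5, 0.9)],
    "rmsprop": [dict(lr=lr) for lr in (1e-3, 1e-4)],
    "adam":    [dict(lr=lr) for lr in (1e-3, 1e-4)],
}

def make_optimizer(name, params, **kwargs):
    cls = {"sgd": torch.optim.SGD,
           "rmsprop": torch.optim.RMSprop,
           "adam": torch.optim.Adam}[name]
    return cls(params, **kwargs)
```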