Transferring Optimality Across Data Distributions via Homotopy Methods

Authors: Matilde Gargiani, Andrea Zanelli, Quoc Tran-Dinh, Moritz Diehl, Frank Hutter

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on a toy regression dataset and for transferring optimized parameters from MNIST to Fashion-MNIST and CIFAR-10 show substantial improvement of the numerical performance over random initialization and pre-training.
Researcher Affiliation | Collaboration | Matilde Gargiani (1), Andrea Zanelli (2), Quoc Tran-Dinh (3), Moritz Diehl (2,4), Frank Hutter (1,5). Affiliations: (1) Department of Computer Science, University of Freiburg, {gargiani, fh}@cs.uni-freiburg.de; (2) Department of Microsystems Engineering (IMTEK), University of Freiburg, {andrea.zanelli, moritz.diehl}@imtek.uni-freiburg.de; (3) Department of Statistics and Operations Research, University of North Carolina, quoctd@email.unc.edu; (4) Department of Mathematics, University of Freiburg; (5) Bosch Center for Artificial Intelligence.
Pseudocode | Yes | Conceptually, Algorithm 1 describes the basic steps of a general homotopy algorithm. Algorithm 1 (A Conceptual Homotopy Algorithm): 1: θ_0 ← arg min_θ H(θ, 0); 2: γ > 0, γ ∈ ℤ; 3: λ_0 = 0, Δλ = 1/γ; 4: k > 0, k ∈ ℤ; 5: for i = 1, ..., γ do; 6: λ_i ← λ_{i-1} + Δλ; 7: θ_i ← ITERATIVESOLVER(θ_{i-1}, k, H(θ, λ_i)); 8: return θ_γ. (A runnable sketch of this loop is given after the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | Empirical evaluations on a toy regression dataset and for transferring optimized parameters from MNIST to Fashion-MNIST and CIFAR-10 show substantial improvement of the numerical performance over random initialization and pre-training.
Dataset Splits | No | Section 6.1 states: 'Each considered dataset has 10000 samples split across training and testing...' While a train/test split is mentioned, no specific information about a validation split, exact split percentages, or cross-validation methodology is provided.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud instance specifications.
Software Dependencies | No | The paper mentions using 'Adam as optimizer' and a 'VGG-type network'. However, it does not specify any version numbers for these or other software components (e.g., the deep learning framework, Python version), which are necessary for reproducibility.
Experiment Setup | Yes | For the experiments in Figures 1a-1b, Figures 7a-7b in the appendix, and Figure 2a, we set α = 0.001, γ = 10, k = 200 and then performed an additional 500 epochs on the final target problem, while for the experiments in Figure 2b, we set γ = 10, k = 300 and performed an additional 600 epochs on the final target problem. In this last scenario we set α = 0.001 and then decrease it with a cosine annealing schedule to observe convergence to an optimum. (A hypothetical sketch of these settings in code is given below, after the Algorithm 1 sketch.)
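
For readers who prefer runnable code to the pseudocode quoted in the Pseudocode row, below is a minimal Python sketch of the conceptual homotopy loop of Algorithm 1. The names homotopy_solve, iterative_solver, and H are placeholders introduced here for illustration and are not identifiers from the paper; any local solver (e.g. a few hundred epochs of Adam, as in the experiments) can play the role of ITERATIVESOLVER.

```python
def homotopy_solve(H, iterative_solver, theta0, gamma=10, k=200):
    """Minimal sketch of Algorithm 1 (conceptual homotopy method).

    H(theta, lam)               -- homotopy objective joining the source
                                   problem (lam = 0) to the target (lam = 1)
    iterative_solver(t0, k, f)  -- runs k iterations of any local solver
                                   (e.g. SGD/Adam) on objective f, warm-started at t0
    theta0                      -- (approximate) minimizer of H(., 0)
    """
    theta = theta0
    lam, d_lam = 0.0, 1.0 / gamma       # lambda_0 = 0, step size Delta-lambda = 1/gamma
    for _ in range(gamma):
        lam += d_lam                    # advance one step along the homotopy path
        theta = iterative_solver(theta, k, lambda t, l=lam: H(t, l))
    return theta                        # warm-started solution of the target problem
```

The key design point is warm-starting: each intermediate problem H(·, λ_i) is solved starting from the solution of the previous one, so that optimality is transferred gradually from the source problem (λ = 0) to the target problem (λ = 1).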
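
As a companion to the Experiment Setup row, the following hypothetical PyTorch sketch shows how the reported settings might be wired together: Adam with α = 0.001, γ = 10 homotopy steps of k = 200 epochs each, 500 additional epochs on the final target problem, and, purely for illustration, the cosine annealing schedule that the paper applies only in the Figure 2b setting. The tiny linear model, dummy batches, and the convex-combination loss are stand-ins, not the paper's VGG-type network or its actual homotopy; only the hyperparameter values come from the quote above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameter values taken from the quoted setup; everything else is a stand-in.
alpha, gamma, k, extra_epochs = 1e-3, 10, 200, 500

model = nn.Linear(10, 1)                                    # stand-in for the VGG-type network
optimizer = torch.optim.Adam(model.parameters(), lr=alpha)  # "Adam as optimizer", alpha = 0.001

def run_epochs(num_epochs, lam, scheduler=None):
    """Placeholder training loop on the lambda-interpolated objective."""
    for _ in range(num_epochs):
        x = torch.randn(32, 10)                             # dummy batch
        y_source, y_target = torch.randn(32, 1), torch.randn(32, 1)
        pred = model(x)
        # Convex combination of source and target losses as a stand-in for H(theta, lambda).
        loss = (1.0 - lam) * F.mse_loss(pred, y_source) + lam * F.mse_loss(pred, y_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if scheduler is not None:
            scheduler.step()

# k epochs on each of the gamma intermediate problems along the homotopy path.
for i in range(1, gamma + 1):
    run_epochs(k, lam=i / gamma)

# Additional epochs on the final target problem; in the Figure 2b setting the
# learning rate is decayed with a cosine annealing schedule.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=extra_epochs)
run_epochs(extra_epochs, lam=1.0, scheduler=scheduler)
```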