An Alternating Optimization Method for Bilevel Problems under the Polyak-Łojasiewicz Condition

Authors: Quan Xiao, Songtao Lu, Tianyi Chen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our stationary measure is a necessary condition of the global optimality. As shown in Figure 1, GALET approaches the global optimal set of Example 1 and our stationary measure also converges to 0, while the value-function based KKT score does not. ... We compare GALET with BOME [37], IAPTT-GM [43] and V-PBGD [57] in the hyper-cleaning task on the MNIST dataset. As shown in Figure 5, GALET converges faster than other methods and the convergence rate of GALET is O(1/K), which matches Theorem 2. Table 2 shows that the test accuracy of GALET is comparable to other methods.
Researcher Affiliation | Collaboration | Quan Xiao, Rensselaer Polytechnic Institute, Troy, NY, USA, xiaoq5@rpi.edu; Songtao Lu, IBM Research, Yorktown Heights, NY, USA, songtao@ibm.com; Tianyi Chen, Rensselaer Polytechnic Institute, Troy, NY, USA, chentianyi19@gmail.com
Pseudocode | Yes | Algorithm 1 GALET for nonconvex-PL BLO
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | We compare GALET with BOME [37], IAPTT-GM [43] and V-PBGD [57] in the hyper-cleaning task on the MNIST dataset. ... We compare our method with the existing methods on the data hyper-cleaning task using the MNIST and the Fashion MNIST dataset [21].
Dataset Splits | Yes | We are given 5000 training data with corruption rate 0.5, 5000 clean validation data and 10000 clean testing data.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments.
Software Dependencies | No | The paper mentions computing the Hessian-vector product via an efficient method [53] and using 'auto differentiate', but does not specify any software names with version numbers.
Experiment Setup | Yes | Parameter choices. The dimension of the hidden layer of MLP model is set as 50. We select the stepsize from α ∈ {1, 10, 50, 100, 200, 500}, γ ∈ {0.1, 0.3, 0.5, 0.8} and β ∈ {0.001, 0.005, 0.01, 0.05, 0.1}, while the number of loops is chosen from T ∈ {5, 10, 20, 30, 50} and N ∈ {5, 10, 30, 50, 80}. The default choice of parameter is α = 0.3, K = 30, β = 1, N = 1, γ = 0.1, T = 1.
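
Since no code is released, the Dataset Splits and Experiment Setup entries above are the main handles for reproduction. The following is a minimal Python sketch of how those two entries could be encoded; it is not the authors' implementation. The function names `iter_configs` and `corrupt_labels` are hypothetical, the exhaustive Cartesian product is an assumption (the paper only says values are selected from these sets), and "corruption" is assumed here to mean replacing a label with a different class drawn uniformly at random.

```python
import itertools
import random

# Candidate values quoted in the Experiment Setup entry.
GRID = {
    "alpha": [1, 10, 50, 100, 200, 500],
    "gamma": [0.1, 0.3, 0.5, 0.8],
    "beta": [0.001, 0.005, 0.01, 0.05, 0.1],
    "T": [5, 10, 20, 30, 50],
    "N": [5, 10, 30, 50, 80],
}


def iter_configs(grid):
    """Yield every combination of the grid as a dict.

    Assumption: a full Cartesian product; the paper does not say
    whether the search was exhaustive.
    """
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def corrupt_labels(labels, rate, num_classes=10, seed=0):
    """Return a copy of `labels` with a fraction `rate` relabeled.

    Assumption: each selected label is replaced by a different class
    chosen uniformly at random. Also returns the corrupted indices.
    """
    rng = random.Random(seed)
    n_corrupt = int(round(rate * len(labels)))
    corrupted_idx = rng.sample(range(len(labels)), n_corrupt)
    noisy = list(labels)
    for i in corrupted_idx:
        other_classes = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy, sorted(corrupted_idx)


# Placeholder labels standing in for 5000 MNIST training labels.
train_labels = [i % 10 for i in range(5000)]
noisy_labels, corrupted_idx = corrupt_labels(train_labels, rate=0.5)
configs = list(iter_configs(GRID))
```

Under these assumptions the grid contains 6 × 4 × 5 × 5 × 5 = 3000 combinations, and half of the 5000 training labels are altered; the 5000 validation and 10000 test examples would be left clean.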