On Penalty-based Bilevel Gradient Descent Method
Authors: Han Shen, Tianyi Chen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results showcase the efficiency of the proposed algorithm. The code is available on GitHub (link). ... Finally, we empirically showcase the performance, computation and memory efficiency of the proposed algorithm in comparison with several competitive baselines. ... In this section, we test PBGD in the data hyper-cleaning task (Franceschi et al., 2017; Shaban et al., 2019). |
| Researcher Affiliation | Academia | 1Department of ECSE, Rensselaer Polytechnic Institute, Troy, NY, USA. |
| Pseudocode | Yes | Algorithm 1 PBGD: Penalized bilevel GD ... Algorithm 2 V-PBGD: Function value gap based PBGD ... Algorithm 3 V-PBGD under lower-level constraint |
| Open Source Code | Yes | The code is available on GitHub (link). ... The code is available on GitHub (link). |
| Open Datasets | Yes | Adopting the settings in (Franceschi et al., 2017; Liu et al., 2021b; Shaban et al., 2019), we randomly split the MNIST dataset into a training dataset of size 5000, a validation set of size 5000 and a test set of size 10000; and pollute 50% of the training data with uniformly drawn labels. |
| Dataset Splits | Yes | we randomly split the MNIST dataset into a training dataset of size 5000, a validation set of size 5000 and a test set of size 10000 |
| Hardware Specification | No | The paper mentions "GPU memory" in Table 3 but does not specify any particular GPU model, CPU, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | We then run V-PBGD with γ = 10 for 1000 random initial points (x1, y1) and plot the last iterates in Figure 2a (right). ... we randomly split the MNIST dataset into a training dataset of size 5000, a validation set of size 5000 and a test set of size 10000; and pollute 50% of the training data with uniformly drawn labels. Then we run the algorithms with a linear model and an MLP network. |
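The Pseudocode and Experiment Setup rows describe a function-value-gap penalty scheme (V-PBGD): an inner loop of gradient steps approximates the lower-level minimizer, and the outer step descends on the penalized objective f(x, y) + γ(g(x, y) − min_{y'} g(x, y')). Below is a minimal sketch of such an update on a hypothetical toy problem; the quadratic f and g, step sizes, and iteration counts are illustrative assumptions, and only γ = 10 comes from the quoted setup.

```python
import numpy as np

# Hypothetical toy bilevel problem, for illustration only:
#   upper level  f(x, y) = 0.5*(x-1)^2 + 0.5*(y-1)^2
#   lower level  g(x, y) = 0.5*(y-x)^2, strongly convex in y.
# The bilevel solution is x* = y* = 1.
gamma, alpha, beta = 10.0, 0.01, 0.1   # gamma = 10 as in the quoted setup

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2)          # random initial point (x1, y1)

for _ in range(2000):
    # Inner loop: approximate theta ~= argmin_y g(x, y) with a few GD steps.
    theta = y
    for _ in range(10):
        theta -= beta * (theta - x)    # d g / d theta = theta - x

    # Penalized objective F_gamma(x, y) = f(x, y) + gamma*(g(x, y) - g(x, theta)).
    # At the lower-level minimizer, theta can be treated as constant when
    # differentiating in x (envelope theorem).
    grad_x = (x - 1.0) + gamma * (-(y - x) + (theta - x))
    grad_y = (y - 1.0) + gamma * (y - x)
    x -= alpha * grad_x
    y -= alpha * grad_y

print(f"last iterate: x = {x:.3f}, y = {y:.3f}")
```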
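The Dataset Splits and Experiment Setup rows quote a concrete preprocessing recipe for the data hyper-cleaning task. A sketch of that split under stated assumptions follows: the stand-in arrays replace a real MNIST loader, and the shapes and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for MNIST images/labels (70,000 examples, 10 classes);
# any loader producing (images, labels) arrays would do here.
images = rng.standard_normal((70_000, 784)).astype(np.float32)
labels = rng.integers(0, 10, size=70_000)

# Random split: 5,000 train / 5,000 validation / 10,000 test.
perm = rng.permutation(len(labels))
train_idx = perm[:5_000]
val_idx = perm[5_000:10_000]
test_idx = perm[10_000:20_000]

# Pollute 50% of the training labels with uniformly drawn labels.
train_labels = labels[train_idx].copy()
noisy = rng.choice(len(train_labels), size=len(train_labels) // 2, replace=False)
train_labels[noisy] = rng.integers(0, 10, size=len(noisy))
```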