M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation

Authors: Junjie Yang, Xuxi Chen, Tianlong Chen, Zhangyang Wang, Yingbin Liang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical observations on several classic tasks like LASSO, Quadratic and Rosenbrock demonstrate that M-L2O converges significantly faster than vanilla L2O with only 5 steps of adaptation, echoing our theoretical results. Codes are available in https://github.com/VITA-Group/M-L2O." and "5 EXPERIMENTS In this section, we provide a comprehensive description of the experimental settings and present the results we obtained. Our findings demonstrate a high degree of consistency between the empirical observations and the theoretical outcomes." (A hedged sketch of the few-step adaptation protocol follows the table.)
Researcher Affiliation | Academia | Junjie Yang (1), Xuxi Chen (2), Tianlong Chen (2), Zhangyang Wang (2), Yingbin Liang (1); (1) The Ohio State University, (2) University of Texas at Austin
Pseudocode | Yes | "Algorithm 1 Our Proposed M-L2O."
Open Source Code | Yes | "Codes are available in https://github.com/VITA-Group/M-L2O."
Open Datasets | Yes | "Optimizees. We conduct experiments on three distinct optimizees, namely LASSO, Quadratic, and Rosenbrock (Rosenbrock, 1960). The formulation of the Quadratic problem is min_x ½‖Ax − b‖² and the formulation of the LASSO problem is min_x ½‖Ax − b‖² + λ‖x‖₁, where A ∈ ℝ^(d×d), b ∈ ℝ^d. We set λ = 0.005. The precise formulation of the Rosenbrock problem is available in Section A.6. During the meta-training and testing stage, the optimizees ξ_train and ξ_test are drawn from the pre-specified distributions D_train and D_test, respectively. Similarly, the optimizees ξ_adapt used during adaptation are sampled from the distribution D_adapt." (The three objectives are written out in the first sketch after the table.)
Dataset Splits | No | The paper mentions 'training', 'adaptation', and 'testing' optimizees/tasks but does not explicitly provide details on a separate validation dataset split.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies | No | The paper mentions using a 'single-layer LSTM network' and 'Adam' optimizer but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "For all our experiments, we use a single-layer LSTM network with 20 hidden units as the backbone. We adopt the methodology proposed by Lv et al. (2017) and Chen et al. (2020a) to utilize the parameters' gradients and their corresponding normalized momentum to construct the observation vectors. [...] For all experiments, we set the number of optimizee iterations, denoted by T, to 20 when meta-training the L2O optimizers and adapting to optimizees. [...] The value of the total epochs, denoted by K, is set to 5000, and we adopt the curriculum learning technique (Chen et al., 2020a) to dynamically adjust the number of epochs per task, denoted by S. To update the weights of the optimizers (ϕ), we use Adam (Kingma & Ba, 2014) with a fixed learning rate of 1 × 10^−4." (A meta-training sketch reflecting this setup follows the table.)
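
The three optimizee objectives quoted in the Open Datasets row can be written out as a minimal NumPy sketch, assuming a problem dimension d and Gaussian problem data (A, b); only λ = 0.005 is taken from the paper, and the Rosenbrock form below is the classic one (the paper's exact variant is in its Section A.6).

```python
# Minimal sketch of the three optimizees; only lambda = 0.005 comes from the paper.
import numpy as np

d = 10                                  # assumed dimension for illustration
rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))         # assumed sampling of the problem data
b = rng.standard_normal(d)
lam = 0.005                             # lambda as stated in the paper

def quadratic(x):
    """Quadratic optimizee: (1/2) * ||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * r @ r

def lasso(x):
    """LASSO optimizee: (1/2) * ||Ax - b||^2 + lambda * ||x||_1."""
    return quadratic(x) + lam * np.abs(x).sum()

def rosenbrock(x, a=1.0, c=100.0):
    """Classic Rosenbrock function (assumed standard form; see the paper's Section A.6)."""
    return np.sum(c * (x[1:] - x[:-1] ** 2) ** 2 + (a - x[:-1]) ** 2)

print(quadratic(np.zeros(d)), lasso(np.zeros(d)), rosenbrock(np.zeros(d)))
```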
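
The Experiment Setup quote outlines the meta-training loop. Below is a hedged PyTorch sketch, assuming a coordinate-wise single-layer LSTM with 20 hidden units whose observation pairs the gradient with a normalized momentum, an unroll of T = 20 optimizee steps, and Adam at 1 × 10^−4 on the optimizer weights ϕ. The output head, momentum decay, normalization, and summed meta-loss are illustrative assumptions rather than the exact choices of Lv et al. (2017) or Chen et al. (2020a), and `sample_quadratic` is a hypothetical sampler.

```python
# Hedged sketch of the quoted meta-training setup; observation features,
# update scaling, and the summed meta-loss are illustrative assumptions.
import torch
import torch.nn as nn

class LSTMOptimizer(nn.Module):
    """Coordinate-wise learned optimizer: single-layer LSTM with 20 hidden units."""
    def __init__(self, hidden=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden)  # [gradient, normalized momentum]
        self.head = nn.Linear(hidden, 1)                        # per-coordinate update

    def forward(self, grad, momentum, state):
        obs = torch.stack([grad, momentum], dim=-1).unsqueeze(0)  # (1, d, 2): coordinates as batch
        out, state = self.lstm(obs, state)
        return self.head(out).squeeze(0).squeeze(-1), state

def meta_train_epoch(optimizer_net, sample_optimizee, meta_opt, T=20, beta=0.9):
    """One unrolled meta-update: roll the learned optimizer out for T optimizee
    steps and backpropagate the accumulated loss into the optimizer weights."""
    loss_fn, x = sample_optimizee()                 # fresh optimizee and initial point
    momentum = torch.zeros_like(x)
    state, meta_loss = None, 0.0
    for _ in range(T):
        loss = loss_fn(x)
        meta_loss = meta_loss + loss
        (grad,) = torch.autograd.grad(loss, x, create_graph=True)
        momentum = beta * momentum + (1 - beta) * grad
        norm_mom = momentum / (momentum.norm() + 1e-8)          # assumed normalization
        update, state = optimizer_net(grad, norm_mom, state)
        x = x + update
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()

def sample_quadratic(d=10):
    """Hypothetical optimizee sampler: a random Quadratic problem."""
    A, b = torch.randn(d, d), torch.randn(d)
    x0 = torch.zeros(d, requires_grad=True)
    return (lambda x: 0.5 * (A @ x - b).pow(2).sum()), x0

net = LSTMOptimizer()
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-4)  # learning rate quoted above
meta_train_epoch(net, sample_quadratic, meta_opt)
```

The curriculum schedule over epochs per task (S) and the total epoch count K = 5000 from the quote are omitted here for brevity.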
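
The Research Type row reports that M-L2O converges faster than vanilla L2O after only 5 adaptation steps. One plausible reading of the test-time self-adaptation protocol, sketched below under stated assumptions: copy the meta-trained weights, run a handful of unrolled meta-updates on optimizees drawn from D_adapt, then deploy the adapted optimizer on D_test. `self_adapt` and `sample_adapt` are hypothetical names, and the paper's exact procedure is its Algorithm 1 ("Our Proposed M-L2O").

```python
# Hedged sketch of few-step test-time self-adaptation; not the paper's exact
# Algorithm 1, only an illustration of the 5-step adaptation quoted above.
import copy
import torch

def self_adapt(meta_trained_net, meta_update_step, sample_adapt, steps=5, lr=1e-4):
    """Adapt a copy of the meta-trained L2O optimizer with a few meta-updates
    on optimizees drawn from the adaptation distribution D_adapt.

    meta_update_step(net, sampler, opt) performs one unrolled meta-update,
    e.g. meta_train_epoch from the meta-training sketch above."""
    adapted = copy.deepcopy(meta_trained_net)
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)
    for _ in range(steps):
        meta_update_step(adapted, sample_adapt, opt)
    return adapted

# Example usage with names from the previous sketch:
# adapted_net = self_adapt(net, meta_train_epoch, sample_quadratic, steps=5)
```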