Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Adversarial Monte Carlo Meta-Learning of Optimal Prediction Procedures
Authors: Alex Luedtke, Incheoul Chung, Oleg Sofrygin
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we apply our algorithm in two settings and learn estimators that outperform standard approaches in numerical experiments. In Section 6, we also evaluate the performance of these learned estimators in data experiments. |
| Researcher Affiliation | Collaboration | Alex Luedtke EMAIL Incheoul Chung EMAIL Department of Statistics University of Washington Seattle, WA 98195-4322, USA Oleg Sofrygin EMAIL Division of Research Kaiser Permanente Northern California Oakland, CA 94612-2304, USA |
| Pseudocode | Yes | Algorithm 1 Adversarially learn an estimator. 1: Initialize estimator T_t, generator G_g, step sizes η₁, η₂. 2: for K iterations do 3: for j = 1, 2 do 4: Independently draw U ∼ ν_u and V₀, …, V_p iid ∼ ν_v. 5: Let P = G_g(U). ... Algorithm 2 Use data d to obtain prediction at x₀. 1: Preprocess: Let x₀′ := (x₀ − x̄)/s(x) and define d′ ∈ ℝ^{n×(p+2)} so that d′_{i1} = (x_i − x̄)/s(x) for all i = 1, …, n and d′_{j2} = (y_j − ȳ)/s(y) for all j = 1, …, p. 2: Module 1: d¹ := m₁(d′). d¹ ∈ ℝ^{n×p×o₁} |
| Open Source Code | Yes | All experiments were run in Pytorch 1.0.1 on Tesla V100 GPUs using Amazon Web Services. The code used to conduct the experiments can be found at https://github.com/alexluedtke12/amc-meta-learning-of-optimal-prediction-procedures. |
| Open Datasets | Yes | Our experiments make use of ten datasets. Six of these datasets are available through the University of California, Irvine (UCI) Machine Learning Repository (Dua and Graff, 2017), three were used to illustrate supervised learning machines in popular statistical learning textbooks (Friedman et al., 2001; James et al., 2013), and one was used as an illustrative example in the paper that introduced FLAM (Petersen et al., 2016). |
| Dataset Splits | Yes | The first includes only the AMC Linear and AMC FLAM estimators as base learners. The second only includes the OLS, lasso, and FLAM estimators. The third includes all five of these estimators. Predictions of the base learners were combined using 10-fold cross-validation. ... We evaluated the performance of AMC Linear and AMC FLAM in the 5 datasets that have 10 or more features by randomly selecting 100 observations and 10 features from each dataset and evaluating MSE on the held-out observations. This and all other Monte Carlo evaluations of MSE described in what follows were repeated 200 times and averaged across the replications. |
| Hardware Specification | Yes | All experiments were run in Pytorch 1.0.1 on Tesla V100 GPUs using Amazon Web Services. |
| Software Dependencies | Yes | All gradients in the algorithm can be computed via backpropagation using standard software; in our experiments, we used Pytorch for this purpose (Paszke et al., 2019). ... All experiments were run in Pytorch 1.0.1 on Tesla V100 GPUs using Amazon Web Services. ... we compared AMC's performance to ordinary least squares (OLS) and lasso (Tibshirani, 1996) with tuning parameter selected by 10-fold cross-validation, as implemented in scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | In each example, the collection of estimators T is parameterized as the network architecture introduced in Section 4.2 with o1 = o2 = 50, o3 = 10, h1 = h3 = 10, h2 = h4 = 3, and, for k = 1, 2, 3, 4, wk = 100. For each module, we use the leaky ReLU activation q(z) := max{z, 0} + 0.01 min{z, 0}. ... In all settings, we set (β2, ϵ) = (0.999, 10^(-8)). Whenever we were updating the prior network, we set the momentum parameter β1 to 0, and whenever we were updating the estimator network, we set the momentum parameter to 0.25. The parameter α differed across settings. In the sparse linear regression setting with s = 1, we found that choosing α small helped to improve stability. Specifically, we let α = 0.0002 when updating both the estimator and prior networks. In the sparse linear regression setting with s = 5, we used the more commonly chosen parameter setting of α = 0.001 for both networks. In the FLAM example, we chose α = 0.001 and α = 0.005 for the estimator and prior networks, respectively. The learning rates of the estimator and prior networks were decayed at rates t^(-0.15) and t^(-0.25), respectively. ... In all settings, the prior and estimator were updated over 10^6 iterations using batches of 100 datasets. For each dataset, performance is evaluated at 100 values of x0. |
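The Pseudocode row quotes Algorithm 1, which alternates updates of an estimator network (minimizing risk) and a generator/prior network (maximizing it). The sketch below shows only that alternation pattern on a toy quadratic game with a unique saddle point at (0, 0); the objective, function name, and step sizes are illustrative stand-ins, not the paper's actual risk or networks.

```python
# Toy gradient-descent-ascent loop in the spirit of Algorithm 1's
# alternating updates: parameter t (estimator) descends the objective,
# parameter g (generator/prior) ascends it. Stand-in objective:
#   R(t, g) = t**2 - g**2 + t*g, saddle point at (0, 0).
def alternating_minimax(t, g, eta1=0.05, eta2=0.05, iters=2000):
    for _ in range(iters):
        grad_t = 2 * t + g      # dR/dt
        grad_g = -2 * g + t     # dR/dg
        t -= eta1 * grad_t      # estimator step: minimize R
        g += eta2 * grad_g      # generator step: maximize R
    return t, g

t_final, g_final = alternating_minimax(1.0, 1.0)
# both parameters converge toward the saddle point (0, 0)
```

Because this stand-in game is strongly convex in t and strongly concave in g, plain gradient descent-ascent converges; the paper's actual objective is nonconvex and uses Adam updates instead.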
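The Dataset Splits row describes combining base-learner predictions via 10-fold cross-validation on 100 randomly selected observations. A minimal sketch of the fold construction that procedure relies on (`kfold_indices` is a hypothetical helper, not the authors' code, which uses scikit-learn):

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Partition indices 0..n-1 into k shuffled folds and yield
    (train, validation) index lists, as in 10-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds if f is not folds[i] for j in f]
        yield train, val

# 100 observations, as in the paper's subsampled evaluation:
splits = list(kfold_indices(100, k=10))  # 10 (train, val) pairs of size (90, 10)
```

Each observation appears in exactly one validation fold, so out-of-fold predictions from the base learners can be stacked without leakage.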