Adaptive Sampling for Stochastic Risk-Averse Learning
Authors: Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically demonstrate its effectiveness on large-scale convex and non-convex learning tasks. |
| Researcher Affiliation | Academia | Sebastian Curi, Dept. of Computer Science, ETH Zurich (scuri@inf.ethz.ch); Kfir Y. Levy, Faculty of Electrical Engineering, Technion (kfirylevy@technion.ac.il); Stefanie Jegelka, CSAIL, MIT (stefje@mit.edu); Andreas Krause, Dept. of Computer Science, ETH Zurich (krausea@inf.ethz.ch) |
| Pseudocode | Yes | Algorithm 1: ADA-CVAR |
| Open Source Code | Yes | We provide an open-source implementation of our method, which is available at http://github.com/sebascuri/adacvar. |
| Open Datasets | Yes | We consider three UCI regression data sets, three synthetic regression data sets, and eight different UCI classification data sets (Dua and Graff, 2017). ... on common non-convex optimization benchmarks in deep learning (MNIST, Fashion-MNIST, CIFAR-10). |
| Dataset Splits | No | The paper discusses training and test sets but does not explicitly mention or detail a separate validation set split or its use for hyperparameter tuning. While it refers to standard benchmark datasets, it does not specify the exact train/validation/test percentages or sample counts used in their experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, memory, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions deep learning frameworks implicitly (e.g., citing a PyTorch workshop paper), but it does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | With the same learning rates, these algorithms usually produce numerical overflows and, to stabilize learning, we used considerably smaller learning rates. In turn, this increased the number of iterations required for convergence. ADA-CVAR does not suffer from this as the gradients have the same magnitude as in MEAN. For example, to reach 85% train accuracy ADA-CVAR requires 7 epochs, MEAN 9, SOFT-CVAR 21, and TRUNC-CVAR never surpassed 70% train accuracy. |
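
For context on the objective the paper optimizes, below is a minimal, illustrative sketch in Python of (i) the empirical CVaR-at-level-alpha loss, i.e., the mean of the largest alpha-fraction of per-sample losses, and (ii) a simple multiplicative-weights sampler that up-weights hard examples. This is an assumption-based sketch, not the authors' Algorithm 1 (ADA-CVAR); all names (`empirical_cvar`, `ExpWeightsSampler`, `eta`) are ours for illustration.

```python
import numpy as np


def empirical_cvar(losses: np.ndarray, alpha: float) -> float:
    """Empirical CVaR_alpha: mean of the largest ceil(alpha * n) per-sample losses.

    alpha in (0, 1]; alpha = 1 recovers the ordinary mean loss.
    """
    n = len(losses)
    k = max(1, int(np.ceil(alpha * n)))
    worst_k = np.sort(losses)[-k:]  # the alpha-fraction of hardest examples
    return float(worst_k.mean())


class ExpWeightsSampler:
    """Illustrative multiplicative-weights sampler over data-point indices.

    Maintains one weight per example and samples indices proportionally,
    boosting the weight of examples that incur high loss. This mimics the
    adaptive-sampling idea at a high level only; it is NOT the paper's
    Algorithm 1.
    """

    def __init__(self, n: int, eta: float = 0.1, rng=None):
        self.weights = np.ones(n)
        self.eta = eta
        self.rng = rng or np.random.default_rng(0)

    def sample(self, batch_size: int) -> np.ndarray:
        # Sample indices with probability proportional to their current weights.
        probs = self.weights / self.weights.sum()
        return self.rng.choice(len(self.weights), size=batch_size, p=probs)

    def update(self, indices: np.ndarray, losses: np.ndarray) -> None:
        # Exponentiated update: examples with higher loss get sampled more often.
        self.weights[indices] *= np.exp(self.eta * losses)


if __name__ == "__main__":
    losses = np.random.default_rng(1).exponential(size=1000)
    print("mean loss:", losses.mean())
    print("CVaR_0.1 :", empirical_cvar(losses, alpha=0.1))
```

The sketch only illustrates why risk-averse (CVaR) training focuses on the tail of the loss distribution and how an adaptive sampler can concentrate on hard examples; for the actual sampling scheme and its guarantees, see the paper and the adacvar repository.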