Learning To Stop While Learning To Predict
Authors: Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, Le Song
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks, including learning sparse recovery, few-shot meta learning, and computer vision tasks. |
| Researcher Affiliation | Collaboration | (1) Georgia Institute of Technology, USA; (2) Google Research, USA; (3) King Abdullah University of Science and Technology, Saudi Arabia; (4) Ant Financial, China. |
| Pseudocode | Yes | Algorithm 1 (Overall Algorithm): Randomly initialize θ and φ. Stage I: for itr = 1 to #iterations, sample a batch of data points B ⊆ D and take an optimization step to update θ towards the marginal likelihood defined in Eq. 9. Stage II: for itr = 1 to #iterations, sample a batch B ⊆ D and take an optimization step to update φ towards the reverse KL divergence defined in Eq. 10. Optional step: for itr = 1 to #iterations, sample a batch B ⊆ D and update both θ and φ towards the β-VAE objective in Eq. 6. Return θ, φ. (See the training-loop sketch below the table.) |
| Open Source Code | Yes | Pytorch implementation of the experiments is released at https://github.com/xinshi-chen/l2stop. |
| Open Datasets | Yes | We use the benchmark datasets Omniglot (Lake et al., 2011) and Mini Imagenet (Ravi & Larochelle, 2017). The models are trained on BSD500 (400 images) (Arbelaez et al., 2010), validated on BSD12, and tested on BSD68 (Martin et al., 2001). |
| Dataset Splits | Yes | Each task is an N-way classification that contains meta-{train, valid, test} sets. The models are trained on BSD500 (400 images) (Arbelaez et al., 2010), validated on BSD12, and tested on BSD68 (Martin et al., 2001). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions 'Pytorch implementation' but does not specify any version numbers for PyTorch or other software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | The signal-to-noise ratio (SNR) for each sample is uniformly sampled from 20, 30, and 40. The training loss for LISTA is $\sum_{t=1}^{T} \gamma^{T-t} \lVert x_t - x^* \rVert_2^2$ where $\gamma \le 1$. For ISTA and FISTA, we use the training set to tune the hyperparameters by grid search. The neural architecture and other hyperparameters are largely the same as MAML. The maximum number of adaptation gradient descent steps is 10 for both MAML and MAML-stop. We add Gaussian noise to the images with a random noise level $\sigma \le 55$ during training and validation phases. (See the loss computation sketch below the table.) |
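
The two-stage procedure quoted in the Pseudocode row can be summarized in code. The following is a minimal sketch, not the authors' released implementation: `predictor`, `policy`, and the two placeholder losses standing in for Eq. 9 (marginal likelihood) and Eq. 10 (reverse KL) are illustrative assumptions; only the control flow (update θ first, then φ with θ frozen, optional joint fine-tuning) follows the quoted Algorithm 1.

```python
# Minimal sketch of the two-stage training in Algorithm 1.
# `predictor` (parameters θ) and `policy` (parameters φ) are toy stand-ins,
# and `stage1_loss` / `stage2_loss` are placeholders for Eq. 9 and Eq. 10.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset D for illustration only.
dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

predictor = nn.Linear(8, 1)                            # deep predictive model (θ)
policy = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())  # stopping policy (φ)

opt_theta = optim.Adam(predictor.parameters(), lr=1e-3)
opt_phi = optim.Adam(policy.parameters(), lr=1e-3)

def stage1_loss(x, y):
    # Placeholder for the (negative) marginal likelihood objective of Eq. 9.
    return nn.functional.mse_loss(predictor(x), y)

def stage2_loss(x, y):
    # Placeholder for the reverse KL objective of Eq. 10: the stopping
    # probability is pushed to track the (frozen) predictor's error.
    err = (predictor(x).detach() - y).pow(2)
    return nn.functional.mse_loss(policy(x), torch.exp(-err))

# Stage I: update θ only.
for x, y in loader:
    opt_theta.zero_grad()
    stage1_loss(x, y).backward()
    opt_theta.step()

# Stage II: update φ only, with θ fixed.
for x, y in loader:
    opt_phi.zero_grad()
    stage2_loss(x, y).backward()
    opt_phi.step()

# Optional step (joint fine-tuning towards the β-VAE objective of Eq. 6)
# would update both optimizers on a combined loss in the same fashion.
```

In the released code the objectives are task-specific (sparse recovery, few-shot meta learning, denoising); this sketch only mirrors the alternating structure of Algorithm 1.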
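
For the sparse-recovery setup quoted in the Experiment Setup row, the discounted per-iteration LISTA loss can be written out directly. Below is a hedged sketch of that weighted sum; the values of `T`, `dim`, `gamma`, and the iterates `x_iterates` are assumed placeholders, not the paper's configuration or data pipeline.

```python
# Sketch of the LISTA training loss  Σ_{t=1}^{T} γ^{T−t} ‖x_t − x*‖²₂  with γ ≤ 1.
# All tensors and constants below are illustrative placeholders.
import torch

T, dim, gamma = 16, 250, 0.9                        # assumed values, not from the paper
x_star = torch.randn(dim)                           # toy ground-truth sparse signal x*
x_iterates = [torch.randn(dim) for _ in range(T)]   # x_1, ..., x_T from the unrolled network

# Later iterates receive larger weights because γ^{T−t} grows as t approaches T.
loss = sum(gamma ** (T - t) * (x_t - x_star).pow(2).sum()
           for t, x_t in enumerate(x_iterates, start=1))
print(float(loss))
```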