Learning To Stop While Learning To Predict
Authors: Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, Le Song
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks, including learning sparse recovery, few-shot meta learning, and computer vision tasks. |
| Researcher Affiliation | Collaboration | (1) Georgia Institute of Technology, USA; (2) Google Research, USA; (3) King Abdullah University of Science and Technology, Saudi Arabia; (4) Ant Financial, China. |
| Pseudocode | Yes | Algorithm 1 (Overall Algorithm): Randomly initialize θ and φ. Stage I: for itr = 1 to #iterations, sample a batch of data points B ⊆ D and take an optimization step to update θ towards the marginal likelihood defined in Eq. 9. Stage II: for itr = 1 to #iterations, sample a batch B ⊆ D and take an optimization step to update φ towards the reverse KL divergence defined in Eq. 10. Optional step: for itr = 1 to #iterations, sample a batch B ⊆ D and update both θ and φ towards the β-VAE objective in Eq. 6. Return θ, φ. (See the training-loop sketch below the table.) |
| Open Source Code | Yes | Pytorch implementation of the experiments is released at https://github.com/xinshi-chen/l2stop. |
| Open Datasets | Yes | We use the benchmark datasets Omniglot (Lake et al., 2011) and Mini Imagenet (Ravi & Larochelle, 2017). The models are trained on BSD500 (400 images) (Arbelaez et al., 2010), validated on BSD12, and tested on BSD68 (Martin et al., 2001). |
| Dataset Splits | Yes | Each task is an N-way classification that contains meta-{train, valid, test} sets. The models are trained on BSD500 (400 images) (Arbelaez et al., 2010), validated on BSD12, and tested on BSD68 (Martin et al., 2001). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions 'Pytorch implementation' but does not specify any version numbers for PyTorch or other software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | The signal-to-noise ratio (SNR) for each sample is uniformly sampled from 20, 30, and 40. The training loss for LISTA is $\sum_{t=1}^{T} \gamma^{T-t} \lVert x_t - x^* \rVert_2^2$ where $\gamma \le 1$. For ISTA and FISTA, we use the training set to tune the hyperparameters by grid search. The neural architecture and other hyperparameters are largely the same as MAML. The maximum number of adaptation gradient descent steps is 10 for both MAML and MAML-stop. We add Gaussian noise to the images with a random noise level $\sigma \le 55$ during training and validation phases. (See the loss computation sketch below the table.) |
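
The two-stage procedure quoted in the Pseudocode row can be summarized in code. The following is a minimal sketch, not the authors' released implementation: `predictor`, `policy`, and the two placeholder losses standing in for Eq. 9 (marginal likelihood) and Eq. 10 (reverse KL) are illustrative assumptions; only the control flow (update θ first, then φ with θ frozen, optional joint fine-tuning) follows the quoted Algorithm 1.

```python
# Minimal sketch of the two-stage training in Algorithm 1.
# `predictor` (parameters θ) and `policy` (parameters φ) are toy stand-ins,
# and `stage1_loss` / `stage2_loss` are placeholders for Eq. 9 and Eq. 10.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset D for illustration only.
dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

predictor = nn.Linear(8, 1)                            # deep predictive model (θ)
policy = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())  # stopping policy (φ)

opt_theta = optim.Adam(predictor.parameters(), lr=1e-3)
opt_phi = optim.Adam(policy.parameters(), lr=1e-3)

def stage1_loss(x, y):
    # Placeholder for the (negative) marginal likelihood objective of Eq. 9.
    return nn.functional.mse_loss(predictor(x), y)

def stage2_loss(x, y):
    # Placeholder for the reverse KL objective of Eq. 10: the stopping
    # probability is pushed to track the (frozen) predictor's error.
    err = (predictor(x).detach() - y).pow(2)
    return nn.functional.mse_loss(policy(x), torch.exp(-err))

# Stage I: update θ only.
for x, y in loader:
    opt_theta.zero_grad()
    stage1_loss(x, y).backward()
    opt_theta.step()

# Stage II: update φ only, with θ fixed.
for x, y in loader:
    opt_phi.zero_grad()
    stage2_loss(x, y).backward()
    opt_phi.step()

# Optional step (joint fine-tuning towards the β-VAE objective of Eq. 6)
# would update both optimizers on a combined loss in the same fashion.
```

In the released code the objectives are task-specific (sparse recovery, few-shot meta learning, denoising); this sketch only mirrors the alternating structure of Algorithm 1.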
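
For the sparse-recovery setup quoted in the Experiment Setup row, the discounted per-iteration LISTA loss can be written out directly. Below is a hedged sketch of that weighted sum; the values of `T`, `dim`, `gamma`, and the iterates `x_iterates` are assumed placeholders, not the paper's configuration or data pipeline.

```python
# Sketch of the LISTA training loss  Σ_{t=1}^{T} γ^{T−t} ‖x_t − x*‖²₂  with γ ≤ 1.
# All tensors and constants below are illustrative placeholders.
import torch

T, dim, gamma = 16, 250, 0.9                        # assumed values, not from the paper
x_star = torch.randn(dim)                           # toy ground-truth sparse signal x*
x_iterates = [torch.randn(dim) for _ in range(T)]   # x_1, ..., x_T from the unrolled network

# Later iterates receive larger weights because γ^{T−t} grows as t approaches T.
loss = sum(gamma ** (T - t) * (x_t - x_star).pow(2).sum()
           for t, x_t in enumerate(x_iterates, start=1))
print(float(loss))
```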