Minimax Curriculum Learning: Machine Teaching with Desirable Difficulties and Scheduled Diversity

Authors: Tianyi Zhou, Jeff Bilmes

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we apply different curriculum learning methods to train logistic regression models on 20newsgroups (Lang, 1995), LeNet5 models on MNIST (LeCun et al., 1998), convolutional neural nets (CNNs) with three convolutional layers on CIFAR10 (Krizhevsky & Hinton, 2009), CNNs with two convolutional layers on Fashion-MNIST (Fashion in all tables) (Xiao et al., 2017), CNNs with six convolutional layers on STL10 (Coates et al., 2011), and CNNs with seven convolutional layers on SVHN (Netzer et al., 2011). Details on the datasets can be found in Table 3 of the appendix. In all cases, we also use ℓ2 parameter regularization on w with weight 1e-4 (i.e., the weight decay factor of the optimizer).
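To make the regularization choice concrete, here is a minimal sketch of training one of the smaller CNNs with mini-batch SGD, expressing the ℓ2 weight of 1e-4 as the optimizer's weight-decay factor. The use of PyTorch, the exact layer sizes, the learning rate, and the momentum are illustrative assumptions, not the paper's reported settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Illustrative two-conv-layer CNN in the spirit of the Fashion-MNIST model;
    the paper's actual architectures follow its footnoted references."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, padding=2)
        self.fc = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(x.flatten(1))

model = SmallCNN()
# ℓ2 regularization with weight 1e-4, passed as the optimizer's weight decay;
# lr and momentum are placeholders, not values taken from the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
```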
Researcher Affiliation | Academia | Tianyi Zhou & Jeff Bilmes, University of Washington, Seattle, {tianyizh,bilmes}@uw.edu
Pseudocode | Yes | Algorithm 1: Minimax Curriculum Learning (MCL)
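The paper's Algorithm 1 alternates between selecting a hard-but-diverse subset (a submodular-regularized maximization solved greedily) and running SGD on that subset. The sketch below is our reading of that loop, not the authors' code: the facility-location diversity term, the random toy data, and the comment about how k and λ are rescheduled between episodes are illustrative assumptions.

```python
import numpy as np

def greedy_select(losses, sims, k, lam):
    """Greedily pick k examples maximizing per-example loss plus a
    lambda-weighted facility-location diversity term (an illustrative
    submodular choice; the paper permits other submodular functions)."""
    n = len(losses)
    selected = []
    covered = np.zeros(n)                      # best similarity to the chosen set so far
    candidates = set(range(n))
    for _ in range(k):
        best_i, best_gain = None, -np.inf
        for i in candidates:
            div_gain = np.maximum(sims[i] - covered, 0).sum()
            gain = losses[i] + lam * div_gain  # marginal gain of adding example i
            if gain > best_gain:
                best_i, best_gain = i, gain
        selected.append(best_i)
        candidates.remove(best_i)
        covered = np.maximum(covered, sims[best_i])
    return selected

# Toy usage: 200 examples with random losses and a random similarity matrix.
rng = np.random.default_rng(0)
losses = rng.random(200)
feats = rng.random((200, 16))
sims = feats @ feats.T                         # stand-in for a real similarity kernel
subset = greedy_select(losses, sims, k=20, lam=1.0)
# In Algorithm 1 this selection alternates with p SGD steps on `subset`;
# the subset size k then grows and the diversity weight lam shrinks before
# the next episode (the exact schedule is given in the paper, not here).
```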
Open Source Code | No | The paper mentions external GitHub links in footnotes for network structures (e.g., 'https://github.com/jseppanen/cifar_lasagne', 'https://github.com/aaron-xichen/pytorch-playground'), but these are for third-party components or models used, not for the authors' own implementation of MCL or its variants. There is no explicit statement of the authors releasing their source code for the methodology described in this paper.
Open Datasets | Yes | In this section, we apply different curriculum learning methods to train logistic regression models on 20newsgroups (Lang, 1995), LeNet5 models on MNIST (LeCun et al., 1998), convolutional neural nets (CNNs) with three convolutional layers on CIFAR10 (Krizhevsky & Hinton, 2009), CNNs with two convolutional layers on Fashion-MNIST (Fashion in all tables) (Xiao et al., 2017), CNNs with six convolutional layers on STL10 (Coates et al., 2011), and CNNs with seven convolutional layers on SVHN (Netzer et al., 2011).
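All six benchmarks are standard public datasets. One convenient way to fetch them is via scikit-learn and torchvision, shown below; this is an assumed tooling choice for illustration, not something the paper specifies.

```python
from sklearn.datasets import fetch_20newsgroups
from torchvision import datasets

# Public loaders for the six benchmarks used in the paper's experiments.
news = fetch_20newsgroups(subset="train")                          # 20newsgroups
mnist = datasets.MNIST("data", train=True, download=True)          # MNIST
fashion = datasets.FashionMNIST("data", train=True, download=True) # Fashion-MNIST
cifar10 = datasets.CIFAR10("data", train=True, download=True)      # CIFAR10
stl10 = datasets.STL10("data", split="train", download=True)       # STL10
svhn = datasets.SVHN("data", split="train", download=True)         # SVHN
```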
Dataset Splits | No | Table 3 in the appendix provides '#Training' and '#Test' sample counts for each dataset, but the paper does not explicitly specify validation dataset splits, percentages, or methodology for creating such splits.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions general software components like 'mini-batch SGD' and 'mini-batch k-means algorithm', but does not provide specific version numbers for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, scikit-learn).
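Since no versions or libraries are pinned, the mini-batch k-means step could be reproduced with, for example, scikit-learn's MiniBatchKMeans; the library choice, the random feature matrix, and the cluster count and batch size below are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical feature matrix (n_samples x n_features) extracted from the data.
features = np.random.rand(10000, 128).astype(np.float32)

# Mini-batch k-means as one possible realization of the clustering step;
# the paper does not name a specific implementation or its parameters.
kmeans = MiniBatchKMeans(n_clusters=100, batch_size=1024, random_state=0)
cluster_ids = kmeans.fit_predict(features)   # cluster index for each sample
```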
Experiment Setup | Yes | In all cases, we also use ℓ2 parameter regularization on w with weight 1e-4 (i.e., the weight decay factor of the optimizer). Each method uses mini-batch SGD for π(·, η) with the same learning rate strategy to update w. ... For MCL, we set the number of inner loop iterations to p = 50. ... Table 4: Parameters of MCL (Algorithm 1) and its variants for different datasets. (Includes p, #clusters, γ, initial k, initial λ, initial η)
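For reference, the hyperparameters that Table 4 reports per dataset can be gathered into a single configuration. Apart from p = 50, which the text states for MCL, the values below are placeholders rather than the paper's numbers; the per-dataset settings live in Table 4.

```python
# Hyperparameters named in Table 4 of the paper; all values except p are
# placeholders, since the actual per-dataset numbers are only in that table.
mcl_config = {
    "p": 50,            # inner-loop SGD iterations per episode (stated in the text)
    "n_clusters": 100,  # placeholder; see Table 4 for each dataset
    "gamma": 0.9,       # placeholder; see Table 4
    "k_init": 100,      # placeholder initial subset size; see Table 4
    "lambda_init": 1.0, # placeholder initial diversity weight; see Table 4
    "eta_init": 0.1,    # placeholder initial learning rate; see Table 4
}
```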