Semi-Supervised Learning From Crowds Using Deep Generative Models

Authors: Kyohei Atarashi, Satoshi Oyama, Masahito Kurihara

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To assess the effectiveness of the proposed model, we compared with four existing models, including a baseline model, on the MNIST dataset with simulated workers and the Rotten Tomatoes movie review dataset with multiple AMT workers.
Researcher Affiliation | Academia | Kyohei Atarashi (Hokkaido University, atarashi k@complex.ist.hokudai.ac.jp); Satoshi Oyama (Hokkaido University / RIKEN AIP, oyama@ist.hokudai.ac.jp); Masahito Kurihara (Hokkaido University, kurihara@ist.hokudai.ac.jp)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | MNIST. We used MNIST (Le Cun et al. 1998) as a benchmark dataset.
Dataset Splits | Yes | For semi-supervised learning from crowds, we split the 50,000 training data points between a crowdsourced labeled set Xc (Nc = 100) and an unlabeled set Xu (Nu = 49,900). In MNIST, x ∈ [0, 1]^784 and K = 10. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions optimization methods like Adam and BFGS, and general neural network components (MLPs, batch normalization), but it does not specify software versions (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x) or specific library versions that would enable replication.
Experiment Setup | Yes | In all experiments, the architectures of our model and the M2 model were almost the same. We used batch normalization (Ioffe and Szegedy 2015) for all hidden layers of the neural networks except the decoder for the Rotten Tomatoes dataset. For estimation (optimization) of the parameters, we used the Adam stochastic optimization method (Kingma and Ba 2015). The mini-batch size was 200, and half the data points in each mini-batch were labeled examples. For the MNIST dataset, we defined the classifier, encoder, and decoder as MLPs with one hidden layer with 600 units, and dz was 100. We set the learning rate to 3e-4. Because x ∈ [0, 1]^784 for the MNIST dataset, we binarized the feature vectors and defined p(x | t, z) as a Bernoulli distribution. For the Rotten Tomatoes dataset, we defined the encoder and the decoder as MLPs with two hidden layers and the classifier as an MLP with one hidden layer with 500 units; dz was also 100. We set the learning rate to 1e-6. We set the exponential decay rates for the first and second moments to their default values. The architecture of the MV-MLP was the same as that of the classifier of the proposed model and the M2 model. We reparameterized α, the weight hyperparameter between the generative loss and the pseudo-classification loss of the proposed model, as α = β (Nc + Nu) / Nc, following Maaløe et al. (2016). We set β = 1 on the MNIST dataset and β = 10 on the Rotten Tomatoes dataset. (A configuration sketch follows the table.)
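
The Dataset Splits row reports Nc = 100 crowd-labeled points and Nu = 49,900 unlabeled points drawn from the 50,000 MNIST training examples. Below is a minimal sketch of such a split, assuming NumPy arrays; the function name, seeding, and uniform random sampling are illustrative choices and not details taken from the paper.

```python
import numpy as np

def split_crowd_unlabeled(train_x, train_y, n_crowd=100, seed=0):
    """Split a training set into a small crowd-labeled subset Xc and an
    unlabeled remainder Xu (Nc = 100, Nu = 49,900 for the 50,000 MNIST points)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(train_x))
    crowd_idx, unlab_idx = order[:n_crowd], order[n_crowd:]
    x_c, y_c = train_x[crowd_idx], train_y[crowd_idx]  # y_c would be replaced by simulated worker labels
    x_u = train_x[unlab_idx]                           # labels of Xu are not used
    return (x_c, y_c), x_u
```

With the 50,000-point training set this yields |Xc| = 100 and |Xu| = 49,900, matching the quoted Nc and Nu; per the Research Type row, the labels for Xc then come from simulated workers rather than the ground truth.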
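The Experiment Setup row fixes the MNIST configuration: one-hidden-layer MLPs with 600 units for the classifier, encoder, and decoder, batch normalization on hidden layers, dz = 100, Adam with a 3e-4 learning rate, a Bernoulli decoder over binarized pixels, and α = β (Nc + Nu) / Nc with β = 1. The sketch below assembles those pieces; PyTorch is an assumption (the paper names no framework), and the M2-style conditioning (encoder on x and t, decoder on t and z) and exact layer wiring are assumptions for illustration, since the quoted text does not spell them out.

```python
import torch
import torch.nn as nn

# Values quoted from the Experiment Setup row (MNIST configuration).
D_X, D_Z, K = 784, 100, 10          # binarized pixels, latent dimension dz, classes
N_C, N_U, BETA = 100, 49_900, 1.0   # crowd-labeled size, unlabeled size, beta for MNIST
ALPHA = BETA * (N_C + N_U) / N_C    # weight between generative and pseudo-classification loss
BATCH = 200                         # half of each mini-batch consists of labeled examples

def mlp(d_in, d_hidden, d_out, out_act=None):
    """One-hidden-layer MLP with batch normalization on the hidden layer."""
    layers = [nn.Linear(d_in, d_hidden), nn.BatchNorm1d(d_hidden), nn.ReLU(),
              nn.Linear(d_hidden, d_out)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

# Classifier q(t|x), encoder q(z|x,t), and decoder p(x|t,z); the concatenated
# conditioning follows the usual M2 factorization and is an assumption here.
classifier = mlp(D_X, 600, K)                              # class logits
encoder    = mlp(D_X + K, 600, 2 * D_Z)                    # Gaussian mean and log-variance of z
decoder    = mlp(D_Z + K, 600, D_X, out_act=nn.Sigmoid())  # Bernoulli means over pixels

params = (list(classifier.parameters()) + list(encoder.parameters())
          + list(decoder.parameters()))
optimizer = torch.optim.Adam(params, lr=3e-4)  # decay rates left at the Adam defaults
```

The Rotten Tomatoes configuration in the same row differs only in the quoted depths and widths (two-hidden-layer encoder/decoder, 500-unit classifier), the 1e-6 learning rate, and β = 10.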