Differentiable Expectation-Maximization for Set Representation Learning

Authors: Minyoung Kim

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our DIEM model empirically on two different types of tasks: i) counting and clustering problems (Sec. 6.1, 6.2) to verify the model's capability of learning general set representations by modeling interaction between set elements, ii) large-scale biological sequence classification and NLP tasks (Sec. 6.3, 6.4, 6.5) to test the performance of the proposed model on real-world problems in both supervised and unsupervised settings.
Researcher Affiliation | Industry | Minyoung Kim, Samsung AI Center, Cambridge, UK, mikim21@gmail.com
Pseudocode | No | The paper describes the EM algorithm steps using equations and textual descriptions, but does not include a formally labeled "Pseudocode" or "Algorithm" block. (A generic unrolled-EM sketch, for illustration only, is given after the table.)
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | From the OMNIGLOT dataset (Lake et al., 2015) which contains 1,623 different handwritten characters with 20 examples each, we build a set by randomly selecting n images, and the task is to predict the number of unique characters in the set. ... For CIFAR-100, each image is represented as a 512-dim vector from a VGG network pre-trained on the training set (Simonyan & Zisserman, 2014). ... We use the preprocessed data of the SCOP version 1.75 and 2.06 from (Hou et al., 2019; Mialon et al., 2021)... The dataset (Socher et al., 2013) consists of 70,042 movie reviews with positive/negative binary sentiment. ... Finally, we test our DIEM on the large-scale DeepSEA dataset (Zhou & Troyanskaya, 2015) with about five million genomic sequences.
Dataset Splits | Yes | The hyperparameters in our DIEM include: p (the mixture order), H (the number of heads), k (the number of EM steps), τ (prior impact), and the multi-head pooling strategy (either of PC, SB, or SB2). We report the results of the best combinations that are selected by cross validation. ... Specifically, we randomly choose the number of images n uniformly from {Nmin, . . . , Nmax}, then choose the number of unique characters c randomly from {cmin, . . . , n}. ... For the models, we adopt a similar feature extractor architecture as (Lee et al., 2019a): first apply four Conv-BN-ReLU layers to each (element) image to have feature representation ϕ(x), then perform the set embedding emb(S) whose output is fed into a fully connected layer to return the Poisson parameter λ. ... we split the 1,623 characters into training/validation/test splits... consists of 19,245 sequences (14,699/2,013 training/validation from SCOP 1.75 and 2,533 test from SCOP 2.06). ... the original 67,349 training data are split into 80%/20% training/validation sets, and the 872 validation reviews form a test set. (A sketch of the set-sampling procedure follows the table.)
Hardware Specification | Yes | We run all models on the same machine, Core i7 3.50GHz CPU and 128 GB RAM with a single GPU (RTX 2080 Ti).
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2015) and a pre-trained BERT model (Devlin et al., 2019; Wolf et al., 2019), but does not specify version numbers for these or for other software libraries such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The hyperparameters in our DIEM include: p (the mixture order), H (the number of heads), k (the number of EM steps), τ (prior impact), and the multi-head pooling strategy (either of PC, SB, or SB2). We report the results of the best combinations that are selected by cross validation. ... For all models, we use the Adam optimizer (Kingma & Ba, 2015) with learning rate 10^-4 and batch size = 32 sets, until 200K iterations. (A sketch of this training configuration follows the table.)
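As the Pseudocode row notes, the paper presents its EM steps only through equations. The sketch below is a generic unrolled EM loop for an isotropic Gaussian mixture over the elements of one set, written so that every step stays differentiable. It is only an illustration under assumed simplifications (isotropic components, fixed variance, invented names such as `unrolled_em`), not the paper's exact DIEM formulation.

```python
import torch

def unrolled_em(X, mu0, log_pi0, k=3, sigma2=1.0):
    """Run k differentiable EM steps of an isotropic Gaussian mixture on one set.

    X:       (n, d) set elements.
    mu0:     (p, d) initial component means (e.g. learned parameters).
    log_pi0: (p,)   initial log mixing weights.
    Returns refined means and log mixing weights, which can serve as a
    fixed-size set embedding. Generic sketch only, not the paper's DIEM.
    """
    mu, log_pi = mu0, log_pi0
    for _ in range(k):
        # E-step: responsibilities r[i, j] = p(component j | element i).
        sq_dist = torch.cdist(X, mu).pow(2)           # (n, p) squared distances
        log_resp = log_pi - 0.5 * sq_dist / sigma2    # unnormalized log posteriors
        r = torch.softmax(log_resp, dim=1)            # (n, p)
        # M-step: re-estimate means and mixing weights from soft counts.
        nk = r.sum(dim=0) + 1e-8                      # (p,) soft counts
        mu = (r.t() @ X) / nk.unsqueeze(1)            # (p, d)
        log_pi = torch.log(nk / X.shape[0])
    return mu, log_pi

# Example: embed a set of 10 elements of dimension 4 with a 3-component mixture.
X = torch.randn(10, 4)
mu0 = torch.randn(3, 4, requires_grad=True)
log_pi0 = torch.log_softmax(torch.zeros(3), dim=0)
mu, log_pi = unrolled_em(X, mu0, log_pi0, k=3)
embedding = torch.cat([mu.flatten(), log_pi])  # differentiable w.r.t. mu0 and X
```

Because the loop is plain tensor algebra, gradients flow back through every E- and M-step into the initial parameters, which is what makes an unrolled EM usable as a trainable set-embedding layer.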
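The Dataset Splits row quotes the sampling procedure for the OMNIGLOT counting task: draw the set size n uniformly from {Nmin, ..., Nmax}, then draw the number of unique characters c from {cmin, ..., n}. A minimal sketch of that procedure is below; the function name `sample_counting_set` and the default ranges are placeholders, not values from the paper.

```python
import random

def sample_counting_set(images_by_char, n_min=6, n_max=10, c_min=1):
    """Build one set for the unique-character counting task.

    images_by_char: dict mapping character id -> list of images for that character.
    Returns (list of n images, number of unique characters c).
    Placeholder defaults; follows the sampling procedure quoted in the table.
    """
    n = random.randint(n_min, n_max)      # set size, uniform over {n_min, ..., n_max}
    c = random.randint(c_min, n)          # number of unique characters in the set
    chars = random.sample(list(images_by_char), c)
    # Each chosen character appears at least once; fill the rest up to n images.
    picks = [random.choice(images_by_char[ch]) for ch in chars]
    while len(picks) < n:
        ch = random.choice(chars)
        picks.append(random.choice(images_by_char[ch]))
    random.shuffle(picks)
    return picks, c
```

The returned count c is the regression target from which the Poisson parameter λ is predicted in the quoted architecture description.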
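Finally, the Experiment Setup row quotes the optimization settings: Adam with learning rate 10^-4, batches of 32 sets, and 200K iterations. The loop below is a minimal sketch of that configuration; `model`, `set_loader`, and the `model.loss(...)` method are hypothetical stand-ins for the task-specific pieces.

```python
import torch

def train(model, set_loader, num_iters=200_000, lr=1e-4, device="cuda"):
    """Minimal training loop matching the quoted setup: Adam, lr 1e-4,
    batches of 32 sets (configured in set_loader), 200K iterations.
    The loss call is an assumed task-specific method, not the paper's code."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    data = iter(set_loader)                    # yields (batch_of_sets, targets)
    for step in range(num_iters):
        try:
            sets, targets = next(data)
        except StopIteration:                  # restart the loader between epochs
            data = iter(set_loader)
            sets, targets = next(data)
        sets, targets = sets.to(device), targets.to(device)
        loss = model.loss(sets, targets)       # hypothetical task-specific loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```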