Uncorrelation and Evenness: a New Diversity-Promoting Regularizer

Authors: Pengtao Xie, Aarti Singh, Eric P. Xing

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on healthcare, image and text data demonstrate the effectiveness of the regularizer.
Researcher Affiliation | Collaboration | Machine Learning Department, Carnegie Mellon University; Petuum Inc.
Pseudocode | No | Section 3.3, titled 'Algorithm', describes the steps of the projected gradient descent algorithm in narrative form rather than presenting it as structured pseudocode or an algorithm block. (A generic PGD sketch is given after this table.)
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We used five datasets in the experiments: an electronic health record dataset MIMIC-III (Johnson et al., 2016); two image datasets Stanford-Cars (Krause et al., 2013) and Caltech-UCSD-Birds (Welinder et al., 2010); two question answering (QA) datasets CNN and Daily Mail (Hermann et al., 2015)."
Dataset Splits | Yes | "For Stanford-Cars, CNN and Daily Mail, we use a single train/test split specified by the data providers; for the other two, five random splits are performed and the results are averaged over the five runs." "The model is trained for 8 epochs with early stop when the validation accuracy starts to drop." "We use 5-fold cross validation to tune the regularization parameter." (A cross-validation sketch is given after this table.)
Hardware Specification | No | The paper mentions 'CPUs and GPUs' in the context of using the MAGMA library, but it does not specify the particular models or configurations used for the experiments.
Software Dependencies | No | The paper mentions using the 'MAGMA library' and the 'Ada Delta' optimizer, but it does not provide version numbers for any software dependencies required to replicate the experiments.
Experiment Setup | Yes | "The learning rate of the PGD algorithm is set to 0.001." "In character embedding based on convolutional neural network, 100 1D filters are used, each with a width of 5. The hidden state size is set to 100. Ada Delta (Zeiler, 2012) is used for optimization with a minibatch size of 48. Dropout (Srivastava et al., 2014) with probability 0.2 is used for all LSTM layers. The model is trained for 8 epochs with early stop when the validation accuracy starts to drop." "The best tuned regularization parameters of UER are: 0.001 for MIMIC, 0.01 for Cars and Birds. The best tuned component numbers are: 200 for MIMIC, 100 for Cars and 200 for Birds." (These settings are collected in the configuration sketch after this table.)
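The Pseudocode row notes that Section 3.3 describes the projected gradient descent (PGD) procedure only in prose. For orientation, below is a minimal, generic PGD loop; the `grad_f` and `project` callables are placeholders rather than the paper's specific objective or projection step, and only the 0.001 default step size is taken from the reported setup.

```python
import numpy as np

def projected_gradient_descent(w0, grad_f, project, lr=0.001, n_iters=1000, tol=1e-6):
    """Generic projected gradient descent (sketch, not the paper's exact update).

    w0      : initial parameter array
    grad_f  : callable returning the objective gradient at w
    project : callable mapping a point back onto the feasible set
    lr      : step size (the paper reports 0.001 for its PGD)
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        w_new = project(w - lr * grad_f(w))   # gradient step, then projection
        if np.linalg.norm(w_new - w) < tol:   # stop when the iterate stabilizes
            return w_new
        w = w_new
    return w

# Toy usage: minimize ||w - c||^2 subject to ||w|| <= 1 (unit-ball projection).
c = np.array([2.0, 0.0])
grad = lambda w: 2.0 * (w - c)
proj = lambda w: w / max(1.0, np.linalg.norm(w))
w_star = projected_gradient_descent(np.zeros(2), grad, proj, lr=0.1)
```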
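The Dataset Splits row quotes 5-fold cross validation for tuning the regularization parameter. A minimal sketch of that tuning loop, assuming scikit-learn's KFold and a hypothetical `train_and_score` callable that stands in for the regularized model; the candidate values and scoring convention are illustrative, not the authors':

```python
import numpy as np
from sklearn.model_selection import KFold

def tune_regularization(X, y, train_and_score, lambdas=(1e-3, 1e-2, 1e-1)):
    """Pick the regularization strength with the best mean 5-fold CV score.

    train_and_score(X_tr, y_tr, X_va, y_va, lam) is a hypothetical callable
    that trains the regularized model and returns a validation score.
    """
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    best_lam, best_score = None, -np.inf
    for lam in lambdas:
        scores = []
        for train_idx, val_idx in kf.split(X):
            scores.append(train_and_score(X[train_idx], y[train_idx],
                                           X[val_idx], y[val_idx], lam))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_lam, best_score = lam, mean_score
    return best_lam, best_score
```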
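The Experiment Setup row lists the reported hyperparameters. The sketch below simply gathers them into one illustrative configuration; the key names are ours, since no source code is released (the Open Source Code result is 'No').

```python
# Reported settings from the paper's QA experiments, collected into an
# illustrative configuration dict (key names are ours, not the authors').
qa_config = {
    "char_cnn_filters": 100,    # 100 1D filters for character embeddings
    "char_cnn_width": 5,        # each filter has width 5
    "hidden_size": 100,         # hidden state size
    "optimizer": "AdaDelta",    # Zeiler (2012)
    "batch_size": 48,
    "lstm_dropout": 0.2,        # applied to all LSTM layers
    "max_epochs": 8,            # early stop when validation accuracy drops
    "pgd_learning_rate": 0.001,
}

# Best tuned UER regularization parameters and component numbers per dataset.
uer_best = {
    "MIMIC": {"lambda": 0.001, "components": 200},
    "Cars":  {"lambda": 0.01,  "components": 100},
    "Birds": {"lambda": 0.01,  "components": 200},
}
```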