Pseudo-task Augmentation: From Deep Multitask Learning to Intratask Sharing—and Back
Authors: Elliot Meyerson, Risto Miikkulainen
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an array of experiments, PTA is shown to significantly improve performance in single-task settings. |
| Researcher Affiliation | Collaboration | ¹The University of Texas at Austin, ²Sentient Technologies, Inc. Correspondence to: Elliot Meyerson <ekm@cs.utexas.edu>. |
| Pseudocode | Yes | Algorithm 1: PTA Training Framework. (A minimal sketch of this training loop appears below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | This section evaluates and compares the various PTA methods on Omniglot character recognition (Lake et al., 2015). The experiments in this section apply PTA to LSTM models in the IMDB sentiment classification problem (Maas et al., 2011). To further test applicability and scalability, PTA was evaluated on CelebA large-scale facial attribute recognition (Liu et al., 2015b). |
| Dataset Splits | Yes | To reduce variance and improve reproducibility of experiments, a fixed random 50/20/30% train/validation/test split was used for each task. (These splits will be released with the paper.) A hypothetical splitting sketch appears below the table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Keras and Adam but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | The underlying model F for all setups is a simple four-layer convolutional network that has been shown to yield good performance on Omniglot (Meyerson & Miikkulainen, 2018). This model has four convolutional layers, each with 53 filters and 3×3 kernels, and each followed by a 2×2 max-pooling layer and a dropout layer with 0.5 dropout probability. At each meta-iteration, 250 gradient updates are performed via Adam (Kingma & Ba, 2014); each setup is trained for 100 meta-iterations. For IMDB, the LSTM layer has 128 units and a dropout rate of 0.2, and each meta-iteration consists of 250 gradient updates with batch size 32. For CelebA, RMSprop is initialized with a learning rate of 10⁻⁴, which is decreased to 10⁻⁵ and 10⁻⁶ as the model converges. A Keras sketch of the convolutional setup appears below the table. |
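To make the "Algorithm 1: PTA Training Framework" row more concrete, here is a minimal, framework-agnostic Python sketch of a PTA-style training loop: a shared model F is trained jointly with several pseudo-task decoders, each (F, Dᵢ) pair is evaluated on validation data at the end of a meta-iteration, and a decoder-control step updates the decoders. The callables `train_step`, `evaluate`, and `control` are assumptions introduced here for illustration; the paper's Algorithm 1 defines the actual decoder-control strategies.

```python
def train_with_pta(model, decoders, train_step, evaluate, control,
                   num_meta_iterations=100, updates_per_meta_iteration=250):
    """Minimal sketch of a PTA-style training loop (cf. the paper's Algorithm 1).

    `model` is the shared underlying model F; `decoders` is the list of
    pseudo-task decoders D_1..D_d. `train_step`, `evaluate`, and `control`
    are user-supplied callables assumed for this sketch.
    """
    history = []
    for _ in range(num_meta_iterations):
        # Joint training: each gradient update trains F together with all
        # decoders, combining the pseudo-task losses.
        for _ in range(updates_per_meta_iteration):
            train_step(model, decoders)

        # Evaluate every (F, D_i) pair on held-out validation data.
        scores = [evaluate(model, d) for d in decoders]
        history.append(max(scores))

        # Decoder control: e.g., perturb, reinitialize, or copy decoders
        # based on their validation scores.
        decoders = control(decoders, scores)

    return model, decoders, history
```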
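For the dataset-splits row, a hypothetical sketch of how a fixed random 50/20/30% train/validation/test split per task could be generated is shown below. The seed and splitting mechanism are assumptions for illustration, not the authors' released splits.

```python
import numpy as np

def fixed_split(num_examples, seed=0):
    """Hypothetical fixed random 50/20/30% train/val/test split."""
    rng = np.random.RandomState(seed)      # fixed seed -> reproducible split
    idx = rng.permutation(num_examples)
    n_train = int(0.5 * num_examples)      # 50% train
    n_val = int(0.2 * num_examples)        # 20% validation
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])          # remaining 30% test
```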
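Since the paper reportedly uses Keras, the experiment-setup row can be illustrated with a hedged Keras sketch of the described Omniglot base model: four convolutional layers with 53 filters and 3×3 kernels, each followed by 2×2 max pooling and dropout 0.5, trained with Adam. The input shape, padding, and the dense classification head below are assumptions not stated in the table.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_omniglot_model(input_shape=(105, 105, 1), num_classes=50):
    """Sketch of the described four-layer convnet; input shape, padding,
    and the classification head are assumptions, not taken from the paper."""
    x = inputs = keras.Input(shape=input_shape)
    for _ in range(4):
        x = layers.Conv2D(53, (3, 3), activation="relu", padding="same")(x)  # 53 filters, 3x3 kernels
        x = layers.MaxPooling2D((2, 2))(x)                                   # 2x2 max pooling
        x = layers.Dropout(0.5)(x)                                           # dropout probability 0.5
    x = layers.Flatten()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)             # task/pseudo-task head

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(),                         # Adam (Kingma & Ba, 2014)
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```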