Pseudo-task Augmentation: From Deep Multitask Learning to Intratask Sharing—and Back
Authors: Elliot Meyerson, Risto Miikkulainen
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an array of experiments, PTA is shown to significantly improve performance in single-task settings. |
| Researcher Affiliation | Collaboration | ¹The University of Texas at Austin, ²Sentient Technologies, Inc. Correspondence to: Elliot Meyerson <ekm@cs.utexas.edu>. |
| Pseudocode | Yes | Algorithm 1: PTA Training Framework. (A minimal sketch of this training loop appears below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | This section evaluates and compares the various PTA methods on Omniglot character recognition (Lake et al., 2015). The experiments in this section apply PTA to LSTM models in the IMDB sentiment classification problem (Maas et al., 2011). To further test applicability and scalability, PTA was evaluated on CelebA large-scale facial attribute recognition (Liu et al., 2015b). |
| Dataset Splits | Yes | To reduce variance and improve reproducibility of experiments, a fixed random 50/20/30% train/validation/test split was used for each task. (These splits will be released with the paper.) A hypothetical splitting sketch appears below the table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Keras and Adam but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | The underlying model F for all setups is a simple four-layer convolutional network that has been shown to yield good performance on Omniglot (Meyerson & Miikkulainen, 2018). This model has four convolutional layers, each with 53 filters and 3×3 kernels, and each followed by a 2×2 max-pooling layer and a dropout layer with 0.5 dropout probability. At each meta-iteration, 250 gradient updates are performed via Adam (Kingma & Ba, 2014); each setup is trained for 100 meta-iterations. For IMDB, the LSTM layer has 128 units and a dropout rate of 0.2, and each meta-iteration consists of 250 gradient updates with batch size 32. For CelebA, RMSprop is initialized with a learning rate of 10⁻⁴, which is decreased to 10⁻⁵ and 10⁻⁶ as the model converges. A Keras sketch of the convolutional setup appears below the table. |
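To make the "Algorithm 1: PTA Training Framework" row more concrete, here is a minimal, framework-agnostic Python sketch of a PTA-style training loop: a shared model F is trained jointly with several pseudo-task decoders, each (F, Dᵢ) pair is evaluated on validation data at the end of a meta-iteration, and a decoder-control step updates the decoders. The callables `train_step`, `evaluate`, and `control` are assumptions introduced here for illustration; the paper's Algorithm 1 defines the actual decoder-control strategies.

```python
def train_with_pta(model, decoders, train_step, evaluate, control,
                   num_meta_iterations=100, updates_per_meta_iteration=250):
    """Minimal sketch of a PTA-style training loop (cf. the paper's Algorithm 1).

    `model` is the shared underlying model F; `decoders` is the list of
    pseudo-task decoders D_1..D_d. `train_step`, `evaluate`, and `control`
    are user-supplied callables assumed for this sketch.
    """
    history = []
    for _ in range(num_meta_iterations):
        # Joint training: each gradient update trains F together with all
        # decoders, combining the pseudo-task losses.
        for _ in range(updates_per_meta_iteration):
            train_step(model, decoders)

        # Evaluate every (F, D_i) pair on held-out validation data.
        scores = [evaluate(model, d) for d in decoders]
        history.append(max(scores))

        # Decoder control: e.g., perturb, reinitialize, or copy decoders
        # based on their validation scores.
        decoders = control(decoders, scores)

    return model, decoders, history
```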
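For the dataset-splits row, a hypothetical sketch of how a fixed random 50/20/30% train/validation/test split per task could be generated is shown below. The seed and splitting mechanism are assumptions for illustration, not the authors' released splits.

```python
import numpy as np

def fixed_split(num_examples, seed=0):
    """Hypothetical fixed random 50/20/30% train/val/test split."""
    rng = np.random.RandomState(seed)      # fixed seed -> reproducible split
    idx = rng.permutation(num_examples)
    n_train = int(0.5 * num_examples)      # 50% train
    n_val = int(0.2 * num_examples)        # 20% validation
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])          # remaining 30% test
```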
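Since the paper reportedly uses Keras, the experiment-setup row can be illustrated with a hedged Keras sketch of the described Omniglot base model: four convolutional layers with 53 filters and 3×3 kernels, each followed by 2×2 max pooling and dropout 0.5, trained with Adam. The input shape, padding, and the dense classification head below are assumptions not stated in the table.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_omniglot_model(input_shape=(105, 105, 1), num_classes=50):
    """Sketch of the described four-layer convnet; input shape, padding,
    and the classification head are assumptions, not taken from the paper."""
    x = inputs = keras.Input(shape=input_shape)
    for _ in range(4):
        x = layers.Conv2D(53, (3, 3), activation="relu", padding="same")(x)  # 53 filters, 3x3 kernels
        x = layers.MaxPooling2D((2, 2))(x)                                   # 2x2 max pooling
        x = layers.Dropout(0.5)(x)                                           # dropout probability 0.5
    x = layers.Flatten()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)             # task/pseudo-task head

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(),                         # Adam (Kingma & Ba, 2014)
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```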