Beyond Shared Hierarchies: Deep Multitask Learning through Soft Layer Ordering

Authors: Elliot Meyerson, Risto Miikkulainen

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In a suite of experiments, soft ordering is shown to improve performance over both single-task learning and fixed-order deep MTL methods. The first experiment applies the methods to intuitively related MNIST tasks, the second to superficially unrelated UCI tasks, the third to the real-world problem of Omniglot character recognition, and the fourth to large-scale facial attribute recognition. (A sketch of the soft-ordering forward pass follows the table.)
Researcher Affiliation | Collaboration | Elliot Meyerson and Risto Miikkulainen, The University of Texas at Austin and Sentient Technologies, Inc. {ekm, risto}@cs.utexas.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper mentions using the Keras and TensorFlow frameworks but does not provide a link or an explicit statement about releasing its own source code for the methodology described.
Open Datasets | Yes | The Omniglot dataset (Lake et al., 2015), the MNIST hand-written digit dataset, the UCI classification data sets (Lichman, 2013), and the CelebA dataset (Liu et al., 2015b) are all well-known public datasets and are cited appropriately.
Dataset Splits | Yes | For UCI, "training and validation data were created by a random 80-20 split." For Omniglot, "Train/test splits are created in the same way as previous work (Yang and Hospedales, 2017), using 10% or 20% of data for testing." For CelebA, "The training, validation, and test splits provided by Liu et al. (2015b) were used. There are 160K images for training, 20K for validation, and 20K for testing." (An illustrative 80-20 split appears after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper states: "All experiments were run with the Keras deep learning framework Chollet et al. (2015), using the Tensorflow backend (Abadi et al., 2015)." However, it does not specify version numbers for Keras or TensorFlow, which are required for reproducibility.
Experiment Setup | Yes | The paper provides specific experimental setup details: "A dropout rate of 0.5 was applied at the output of each core layer" (MNIST, Omniglot, CelebA) and "A dropout rate of 0.8 was applied at the output of each core layer" (UCI). For MNIST, "Each setup was trained for 20K iterations, with each batch consisting of 64 samples for each task." For CelebA, "The experiments used a batch size of 32. After validation loss converges via Adam, models are trained with RMSProp with learning rate 1e-5." It also states, "Cross-entropy loss was used for all classification tasks" and "Mean squared error was used as the training loss." (A hedged training-configuration sketch follows the table.)
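
For readers unfamiliar with the method under evaluation, below is a minimal sketch of the soft layer ordering forward pass described in the paper, written as a NumPy-only toy. Function and variable names such as soft_order_forward, shared_weights, and order_logits are illustrative, not from the authors' code. At each depth, a task's representation is a softmax-weighted mixture of every shared layer applied to the previous representation; for simplicity, the ReLU nonlinearity is applied per layer inside the mixture here, which may differ in detail from the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def relu(x):
    return np.maximum(x, 0.0)

def soft_order_forward(x, shared_weights, order_logits):
    """Toy soft-ordering forward pass for a single task.

    x             : (features,) input vector for one task.
    shared_weights: list of L square weight matrices shared across all tasks.
    order_logits  : (depth, L) task-specific logits; softmax over L gives the
                    mixing weights used at each depth.
    """
    h = x
    for d in range(order_logits.shape[0]):
        s = softmax(order_logits[d])  # learned mixing weights at depth d
        # Apply every shared layer to the current representation and mix.
        h = sum(s[l] * relu(W @ h) for l, W in enumerate(shared_weights))
    return h

# Toy usage: 4 shared layers, depth 4, 8-dimensional features.
rng = np.random.default_rng(0)
shared = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(4)]
logits = rng.normal(size=(4, 4))
print(soft_order_forward(rng.normal(size=8), shared, logits).shape)  # (8,)
```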
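
As a concrete reading of the quoted UCI protocol ("a random 80-20 split"), the snippet below shows one plausible way to reproduce such a split. The synthetic X and y arrays and the use of scikit-learn's train_test_split are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data; the actual experiments use UCI classification tasks.
X = np.random.rand(1000, 16)
y = np.random.randint(0, 3, size=1000)

# "training and validation data were created by a random 80-20 split"
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_val.shape)  # (800, 16) (200, 16)
```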
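
The hyperparameters quoted in the Experiment Setup row can be read as the following hedged tf.keras sketch. The layer sizes, input shape, and placeholder data are hypothetical, and this generic classifier stands in for the paper's soft-ordering models only to show how the dropout rate, cross-entropy loss, and the Adam-then-RMSProp schedule fit together.

```python
import tensorflow as tf

# Generic stand-in classifier (sizes are illustrative, not from the paper).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # "A dropout rate of 0.5 ... at the output of each core layer"
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Phase 1: train with Adam until validation loss converges
# (cross-entropy loss for classification tasks, per the quoted setup).
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, validation_data=(x_val, y_val), epochs=...)

# Phase 2: continue training with RMSProp at learning rate 1e-5
# ("After validation loss converges via Adam, models are trained with RMSProp").
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, validation_data=(x_val, y_val), epochs=...)
```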