Amalgamating Multi-Task Models with Heterogeneous Architectures

Authors: Jidapa Thadajarassiri, Walter Gerych, Xiangnan Kong, Elke Rundensteiner

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that VENUS outperforms five alternative methods on numerous benchmark datasets across a broad spectrum of experiments." From Experimental Study: "Our method, datasets, and all experimental details are available at https://github.com/jida-thada/VENUS." From Datasets: "We follow the recent KA works on multi-task learning (Ye et al. 2019; Shen et al. 2019b) by handling each class label in each dataset as an independent binary classification task. PASCAL VOC 2007 (Everingham and Winn 2010) has 9,963 images; each image can have up to 20 object-type labels, corresponding to 20 different prediction tasks. 3D contains four tasks extracted from the 3d-shapes dataset (Burgess and Kim 2018): identifying (1) whether the object's color is blue, (2) whether the floor's color is green, (3) whether the wall's color is purple, and (4) whether the wall's color is pink; the dataset contains 168,959 images in total. CIFAR-10 (Krizhevsky 2009) consists of 60,000 images, each annotated with 10 class labels, leading to 10 binary classification tasks." From Experimental Results: "We report the accuracy of all tasks. To show the overall performance for a dataset, we follow Thadajarassiri et al. (2023) by reporting the average rank of all compared methods across all tasks, where 1 indicates the best performance. For a fair comparison, we use ResNet18 (He et al. 2016) as the backbone model for the student in all experiments. To assess the effectiveness of VENUS in learning a high-quality common feature representation, for each dataset we train a student from two teachers with heterogeneous architectures: DenseNet (Huang et al. 2017) for Teacher 1 and ResNet18 (He et al. 2016) for Teacher 2. Each teacher is trained on approximately the same number of tasks, with roughly 30% of the tasks shared. The results in Table 1 show that VENUS significantly outperforms the alternative methods, achieving the best average accuracy across all tasks for all datasets."
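The average-rank metric quoted above can be made concrete: for each task, rank every compared method by accuracy (rank 1 = best), then average each method's ranks over all tasks. The sketch below is an illustrative reimplementation, not the authors' evaluation code; the method names and accuracy values are invented for the example.

import numpy as np

def average_ranks(acc):
    # acc: dict mapping method name -> list of per-task accuracies.
    methods = list(acc)
    scores = np.array([acc[m] for m in methods])       # (n_methods, n_tasks)
    # Double argsort yields within-task ranks; highest accuracy -> rank 1.
    ranks = (-scores).argsort(axis=0).argsort(axis=0) + 1
    return {m: ranks[i].mean() for i, m in enumerate(methods)}

# Invented numbers for three tasks:
print(average_ranks({"VENUS":    [0.92, 0.88, 0.90],
                     "Baseline": [0.90, 0.89, 0.85]}))
# -> VENUS averages rank 1.33, Baseline rank 1.67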
Researcher Affiliation | Academia | Jidapa Thadajarassiri (1), Walter Gerych (2), Xiangnan Kong (2), Elke Rundensteiner (2); (1) Srinakharinwirot University, (2) Worcester Polytechnic Institute; jidapath@g.swu.ac.th, {wgerych, xkong, rundenst}@wpi.edu
Pseudocode | No | The paper describes the method and its components (e.g., Shared Layers, Feature Consolidator, Task-Specific Layers) in prose and mathematical equations but does not include a structured pseudocode or algorithm block.
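Although no pseudocode is given, the three components named in this row suggest a straightforward data flow. The PyTorch skeleton below is purely hypothetical: every layer choice, dimension, and wiring decision is an assumption made for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class VenusSketch(nn.Module):
    # Hypothetical skeleton only: Shared Layers -> Feature Consolidator
    # -> Task-Specific Layers (one binary head per task).
    def __init__(self, n_tasks, feat_dim=512):
        super().__init__()
        self.shared = nn.Sequential(                   # assumed shared extractor
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim))
        self.consolidator = nn.Linear(feat_dim, feat_dim)  # assumed consolidator
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, 1) for _ in range(n_tasks))

    def forward(self, x):
        z = self.consolidator(self.shared(x))
        # One logit per binary task, shape (batch, n_tasks).
        return torch.cat([h(z) for h in self.heads], dim=1)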
Open Source Code Yes Our method, datasets, and all experimental details are available at https://github.com/jida-thada/VENUS.
Open Datasets | Yes | From Datasets: "We follow the recent KA works on multi-task learning (Ye et al. 2019; Shen et al. 2019b) by handling each class label in each dataset as an independent binary classification task. PASCAL VOC 2007 (Everingham and Winn 2010) has 9,963 images; each image can have up to 20 object-type labels, corresponding to 20 different prediction tasks. 3D contains four tasks extracted from the 3d-shapes dataset (Burgess and Kim 2018): identifying (1) whether the object's color is blue, (2) whether the floor's color is green, (3) whether the wall's color is purple, and (4) whether the wall's color is pink; the dataset contains 168,959 images in total. CIFAR-10 (Krizhevsky 2009) consists of 60,000 images, each annotated with 10 class labels, leading to 10 binary classification tasks."
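The decomposition quoted above, where each class label becomes an independent binary classification task, can be sketched for CIFAR-10 as follows. This is an assumed illustration, not code from the VENUS repository.

import torchvision

# CIFAR-10's 10 class labels become 10 binary tasks:
# task k asks "does this image belong to class k?".
cifar = torchvision.datasets.CIFAR10(root="./data", download=True)

def binary_target(label, task_id):
    return 1 if label == task_id else 0

img, label = cifar[0]
targets = [binary_target(label, k) for k in range(10)]  # one target per task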
Dataset Splits | No | The paper states: "In each experiment, the dataset is randomly split into 70% for training the teachers, 20% for training the student, and 10% for testing." It does not explicitly provide a separate validation split.
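For concreteness, the quoted 70/20/10 split could be implemented as below; the use of torch.utils.data.random_split and the fixed seed are assumptions, since the paper does not say how the random split was produced.

import torch
from torch.utils.data import random_split

def split_70_20_10(dataset, seed=0):                # seed is an assumption
    n = len(dataset)
    n_teacher, n_student = int(0.7 * n), int(0.2 * n)
    n_test = n - n_teacher - n_student              # remainder goes to test
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_teacher, n_student, n_test], generator=gen)

# Returns (teacher_train, student_train, test) subsets of any map-style dataset.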
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions "Our model is written in PyTorch and optimized using Adam" but does not provide version numbers for PyTorch or any other software dependencies.
Experiment Setup | No | The paper mentions using ResNet18 as the backbone and performing random splits and replications, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings beyond naming Adam.
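Pulling together the fragments the paper does report (ResNet18 student backbone, Adam optimizer, independent binary tasks), a minimal training sketch might look like the following. The learning rate, loss function, and task count here are placeholders, not values from the paper.

import torch
import torch.nn as nn
from torchvision.models import resnet18

n_tasks = 10                                  # e.g., CIFAR-10 -> 10 binary tasks
student = resnet18(num_classes=n_tasks)       # one logit per binary task
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)   # lr assumed
criterion = nn.BCEWithLogitsLoss()            # assumed loss for binary tasks

def train_step(x, y):
    # x: (B, 3, H, W) images; y: (B, n_tasks) binary targets.
    optimizer.zero_grad()
    loss = criterion(student(x), y.float())
    loss.backward()
    optimizer.step()
    return loss.item()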