Adaptive Mixing of Auxiliary Losses in Supervised Learning

Authors: Durga Sivasubramanian, Ayush Maheshwari, Prathosh AP, Pradeep Shenoy, Ganesh Ramakrishnan

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in a number of knowledge distillation and rule denoising domains show that AMAL provides noticeable gains over competitive baselines in those domains.
Researcher Affiliation | Collaboration | (1) Indian Institute of Technology Bombay; (2) Google Research, India; (3) Indian Institute of Science, Bengaluru
Pseudocode | Yes | Algorithm 1: Algorithm for learning λs via meta-learning. (A hedged sketch of such a meta-step follows the table.)
Open Source Code | Yes | The code for AMAL is at: https://github.com/durgas16/AMAL.git
Open Datasets | Yes | The datasets in our experiments include CIFAR100 (Krizhevsky 2009), Stanford Cars (Krause et al. 2013) and FGVC-Aircraft (Maji et al. 2013).
Dataset Splits | Yes | For datasets without pre-specified validation sets, we split the original training set into new train (90%) and validation (10%) sets. (See the split sketch after the table.)
Hardware Specification | No | The paper mentions deep learning models such as ResNet and Wide Residual Networks but gives no details of the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper refers to SGD optimization and general deep learning practice but does not specify versions for software dependencies such as the programming language, deep learning framework (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | Training consisted of SGD optimization with an initial learning rate of 0.05, momentum of 0.9, and weight decay of 5e-4. We scaled the learning rate by 0.1 at epochs 150, 180 and 210 and trained for a total of 240 epochs. (See the schedule sketch after the table.)
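
The Pseudocode row points to Algorithm 1, which learns the mixing weights λ via meta-learning. Below is a minimal, hypothetical sketch of one such meta-step, assuming PyTorch, a per-example λ parameterised through a sigmoid, a knowledge-distillation auxiliary loss with temperature T, a convex mixture of the two losses, and a one-step SGD lookahead. None of these implementation choices are taken from the paper; the authors' released code should be treated as the reference implementation.

    # Hypothetical meta-step for learning per-example mixing weights lambda.
    # PyTorch, the KD auxiliary loss, and the one-step lookahead are assumptions.
    import torch
    import torch.nn.functional as F

    def mixed_loss(student_logits, teacher_logits, targets, lam, T=4.0):
        # Per-example convex mixture of cross-entropy and a KD-style auxiliary loss.
        ce = F.cross_entropy(student_logits, targets, reduction="none")
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                      F.softmax(teacher_logits / T, dim=1),
                      reduction="none").sum(dim=1) * (T * T)
        return ((1.0 - lam) * ce + lam * kd).mean()

    def meta_step(model, teacher, lam_logits, lam_opt, train_batch, val_batch, lr=0.05):
        x, y, idx = train_batch              # idx selects each example's lambda
        xv, yv = val_batch
        lam = torch.sigmoid(lam_logits[idx])
        with torch.no_grad():
            t_logits = teacher(x)
        train_loss = mixed_loss(model(x), t_logits, y, lam)

        # One-step lookahead: a virtual SGD update of the model parameters,
        # keeping the graph so gradients can flow back into lambda.
        names, params = zip(*[(n, p) for n, p in model.named_parameters()
                              if p.requires_grad])
        grads = torch.autograd.grad(train_loss, params, create_graph=True)
        lookahead = {n: p - lr * g for n, p, g in zip(names, params, grads)}

        # Clean validation loss at the lookahead parameters drives the
        # meta-gradient with respect to lambda.
        val_logits = torch.func.functional_call(model, lookahead, (xv,))
        val_loss = F.cross_entropy(val_logits, yv)
        g_lam, = torch.autograd.grad(val_loss, lam_logits)

        lam_opt.zero_grad()
        lam_logits.grad = g_lam
        lam_opt.step()
        return val_loss.item()

Here lam_logits would be a learnable tensor with one entry per training example and lam_opt its own optimizer; a full training loop would alternate this meta-step with ordinary model updates that use the refreshed λ values.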
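
The Dataset Splits row reports a 90%/10% split of the original training set when no validation set is pre-specified. A minimal sketch of such a split, assuming torchvision's CIFAR100 loader and an arbitrary fixed seed (neither detail is specified in the paper):

    # Illustrative 90/10 train/validation split; dataset loader and seed are assumptions.
    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    full_train = datasets.CIFAR100(root="./data", train=True, download=True,
                                   transform=transforms.ToTensor())

    n_val = int(0.1 * len(full_train))    # 10% held out as the new validation set
    n_train = len(full_train) - n_val     # remaining 90% forms the new training set
    train_set, val_set = random_split(
        full_train, [n_train, n_val],
        generator=torch.Generator().manual_seed(0))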
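
The Experiment Setup row fully specifies the optimizer and learning-rate schedule. Assuming PyTorch (the paper does not name a framework), the reported values map onto SGD with a MultiStepLR scheduler as sketched below; the placeholder model stands in for the ResNet or Wide Residual Network backbones mentioned elsewhere in the table.

    # Hedged sketch of the reported schedule: SGD (lr 0.05, momentum 0.9,
    # weight decay 5e-4), learning rate scaled by 0.1 at epochs 150, 180 and 210,
    # trained for 240 epochs. The model is a placeholder; PyTorch is an assumption.
    import torch

    model = torch.nn.Linear(3 * 32 * 32, 100)   # stand-in for a ResNet/WRN backbone
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[150, 180, 210], gamma=0.1)

    for epoch in range(240):
        # ... one pass over the training loader (and any meta-step) goes here ...
        scheduler.step()    # applies the 0.1 decay at the listed milestones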