Adaptive Mixing of Auxiliary Losses in Supervised Learning
Authors: Durga Sivasubramanian, Ayush Maheshwari, Prathosh AP, Pradeep Shenoy, Ganesh Ramakrishnan
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in a number of knowledge distillation and rule denoising domains show that AMAL provides noticeable gains over competitive baselines in those domains. |
| Researcher Affiliation | Collaboration | 1Indian Institute of Technology Bombay 2Google Research, India 3Indian Institute of Science, Bengaluru |
| Pseudocode | Yes | Algorithm 1: Algorithm for learning λs via meta learning (a hedged sketch of this loop is given below the table). |
| Open Source Code | Yes | The code for AMAL is at: https://github.com/durgas16/AMAL.git |
| Open Datasets | Yes | The datasets in our experiments include CIFAR100 (Krizhevsky 2009), Stanford Cars (Krause et al. 2013) and FGVC-Aircraft (Maji et al. 2013). |
| Dataset Splits | Yes | For datasets without pre-specified validation sets, we split the original training set into new train (90%) and validation sets (10%). |
| Hardware Specification | No | The paper mentions the use of deep learning models like ResNet and Wide Residual Networks but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper refers to optimization methods (SGD) and general deep learning practices but does not specify versions for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | Training consisted of SGD optimization with an initial learning rate of 0.05, momentum of 0.9, and weight decay of 5e-4. The learning rate was decayed by a factor of 0.1 at epochs 150, 180 and 210, for a total of 240 training epochs (see the training-schedule sketch below the table). |
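
The pseudocode row refers to the paper's Algorithm 1, which learns the mixing weights λ via meta learning on held-out validation data. Below is a minimal sketch of one such bi-level step, assuming instance-level mixing weights, a knowledge-distillation auxiliary loss, and a single differentiable inner update implemented with the `higher` library; the objects `student`, `teacher`, `lam_logit`, `train_batch`, and `val_batch` are placeholders for illustration, not names from the paper.

```python
import torch
import torch.nn.functional as F
import higher  # used for the differentiable inner update; an illustrative choice, not from the paper


def mixed_loss(logits, teacher_logits, targets, lam, T=4.0):
    """Per-example convex mix of the task loss and a KD-style auxiliary loss."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    kd = F.kl_div(F.log_softmax(logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="none").sum(dim=1) * (T * T)
    return ((1.0 - lam) * ce + lam * kd).mean()


def meta_step(student, teacher, opt, lam_logit, lam_opt, train_batch, val_batch):
    """One bi-level step: update the mixing weights on validation data, then the student."""
    x, y, idx = train_batch          # idx selects the per-instance mixing weights
    xv, yv = val_batch
    with torch.no_grad():
        t_logits = teacher(x)

    # Virtual (differentiable) student update with the current mixing weights,
    # followed by backpropagating the validation loss into lam_logit.
    with higher.innerloop_ctx(student, opt, copy_initial_weights=False) as (fstudent, diffopt):
        lam = torch.sigmoid(lam_logit[idx])              # mixing weights in (0, 1)
        diffopt.step(mixed_loss(fstudent(x), t_logits, y, lam))
        val_loss = F.cross_entropy(fstudent(xv), yv)
        lam_opt.zero_grad()
        val_loss.backward()
        lam_opt.step()

    # Real student update with the (now fixed) mixing weights.
    lam = torch.sigmoid(lam_logit[idx]).detach()
    opt.zero_grad()
    mixed_loss(student(x), t_logits, y, lam).backward()
    opt.step()
```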
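
The experiment-setup row maps directly onto a standard PyTorch optimizer and learning-rate schedule. The sketch below assumes the reported hyperparameters and uses placeholder names (`model`, `full_train_set`, `train_one_epoch`, the batch size) that are not specified in the paper; the 90%/10% split from the dataset-splits row is included for datasets without a pre-specified validation set.

```python
import torch

# Placeholders (assumptions): `model`, `full_train_set`, `train_one_epoch`.

# 90% train / 10% validation split for datasets without a pre-specified validation set.
n_train = int(0.9 * len(full_train_set))
train_set, val_set = torch.utils.data.random_split(
    full_train_set, [n_train, len(full_train_set) - n_train])
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True)  # batch size is an assumption

# SGD with the reported hyperparameters; the learning rate is decayed by a
# factor of 0.1 at epochs 150, 180 and 210 over a total of 240 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

for epoch in range(240):
    train_one_epoch(model, train_loader, optimizer)  # placeholder training step
    scheduler.step()
```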