L2T-DLN: Learning to Teach with Dynamic Loss Network
Authors: Zhaoyang Hai, Liyuan Pan, Xiabi Liu, Zhengzheng Liu, Mirna Yunita
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios. We conduct extensive experiments on a wide range of loss functions and tasks to demonstrate the effectiveness of our approach. |
| Researcher Affiliation | Academia | School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China, 100081 {haizhaoyang, liyuan.pan, liuxiabi, liuzhengzheng, mirnayunita}@bit.edu.cn |
| Pseudocode | Yes | Algorithm 1 Obtaining the optimal student, DLN and teacher in L2T-DLN. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Datasets. We evaluate our method on three tasks, i.e., image classification, object detection, and semantic segmentation. For image classification, we use three datasets: CIFAR-10 [20], CIFAR-100 [21], and ImageNet [33]. |
| Dataset Splits | Yes | For object detection, we use the MS-COCO dataset [22], which contains 82783, 40504, and 81434 pairs in the training, validation, and testing sets, respectively. For semantic segmentation, we choose PASCAL VOC 2012 [6]. Following the procedure of Zhao et al. [40], we use augmented data with the annotations of Hariharan et al. [11], resulting in 10582, 1449, and 1456 images for training, validation, and testing. The validation ratio represents the fraction of training dataset samples exclusively used for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizing student models using "standard stochastic gradient descent (SGD)" and the teacher model with "Adam", but does not name the software libraries or frameworks (or their versions) used to implement the experiments. |
| Experiment Setup | Yes | In all experiments, we optimize student models using standard stochastic gradient descent (SGD) with a learning rate of 0.1. The teacher model is trained with Adam, utilizing a learning rate of 0.001. The learning rate of the DLN is set to 0.001. The teacher model is trained for 10 epochs, with the training and validation data redivided after each epoch. ... Our teacher model comprises a four-layer LSTM [14] with 64 neurons in the first three layers and 1 neuron in the final layer. ... We use a five-layer fully connected network, which contains 40 neurons in each hidden layer and 1 neuron in the output layer, as the DLN. The activation function for each hidden layer is Leaky-ReLU. (A hedged sketch of this setup appears after the table.) |
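
The following is a minimal PyTorch sketch of the reported setup, assembled only from the layer sizes and optimizer settings quoted in the Experiment Setup row. Class names, input dimensions, the stand-in student model, the choice of Adam for the DLN, and the reading of "five-layer" as four hidden layers plus an output layer are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TeacherLSTM(nn.Module):
    """Teacher: four-layer LSTM, 64 neurons in the first three layers, 1 in the last."""
    def __init__(self, input_dim: int = 1):  # input_dim is an assumption
        super().__init__()
        # Three stacked LSTM layers with hidden size 64, then a final layer of size 1.
        self.body = nn.LSTM(input_dim, 64, num_layers=3, batch_first=True)
        self.head = nn.LSTM(64, 1, batch_first=True)

    def forward(self, x):
        h, _ = self.body(x)
        out, _ = self.head(h)
        return out


class DLN(nn.Module):
    """Dynamic Loss Network: five-layer fully connected network, 40 neurons per
    hidden layer, 1 output neuron, Leaky-ReLU on hidden layers
    (interpreted here as four hidden layers plus the output layer)."""
    def __init__(self, input_dim: int = 2):  # input_dim is an assumption
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(4):                        # hidden layers
            layers += [nn.Linear(dim, 40), nn.LeakyReLU()]
            dim = 40
        layers.append(nn.Linear(dim, 1))          # output neuron
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# Optimizers as reported: SGD (lr=0.1) for the student, Adam (lr=0.001) for the
# teacher; the DLN learning rate is 0.001, its optimizer type is assumed here.
student = nn.Linear(10, 10)                       # stand-in for the actual student model
teacher, dln = TeacherLSTM(), DLN()
student_opt = torch.optim.SGD(student.parameters(), lr=0.1)
teacher_opt = torch.optim.Adam(teacher.parameters(), lr=0.001)
dln_opt = torch.optim.Adam(dln.parameters(), lr=0.001)
```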