Auxiliary Modality Learning with Generalized Curriculum Distillation
Authors: Yu Shen, Xijun Wang, Peng Gao, Ming Lin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also analyze the conditions under which AML works well from the optimization and data distribution perspectives. To guide various choices to achieve optimal performance using AML, we propose a novel method to assist in choosing the best auxiliary modality and estimating an upper bound performance before executing AML. In addition, we propose a new AML method using generalized curriculum distillation to enable more effective curriculum learning. Our method achieves the best performance compared to other SOTA methods. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park, Maryland, USA. Correspondence to: Yu Shen <yushen@umd.edu>. |
| Pseudocode | Yes | Algorithm 1 SAMD Training Paradigm. Input: training data from main modality I_M, training data from auxiliary modality I_A (chosen by method in Sec. 5.1). Output: student network weights θ_stu. Initialisation: training round number t, epoch number in each round k, loss correlation β, network weights θ_stu and θ_tea. for r = 1 to t do: reset teacher weights with student weights; for e = 1 to k do: feed I_M and I_A into teacher, update teacher weights θ_tea with Eq. 2; end for; for e = 1 to k do: feed I_M and I_A into student, and feed I_M into student, update student weights θ_stu with Eq. 1 and loss 3; end for; end for. (A runnable sketch of this loop follows the table.) |
| Open Source Code | No | Since there's no open-source code, we reimplement the original work, then apply our method to it. |
| Open Datasets | Yes | We use the Audi dataset (Geyer et al., 2020) and the Honda dataset (Ramanishka et al., 2018) in this experiment. Also, we use depth map (Type 1), frequency image (Type 2), and attention image (Type 3) as auxiliary modalities. |
| Dataset Splits | Yes | For the dataset, we use the CARLA (Dosovitskiy et al., 2017) simulator for training and testing, specifically CARLA 0.9.10, which includes 8 publicly available towns. We use 7 towns for training and hold out Town05 for evaluation, as in (Prakash et al., 2021). |
| Hardware Specification | Yes | All experiments are conducted using one Intel(R) Xeon(TM) W-2123 CPU, two Nvidia GTX 1080 GPUs, and 32 GB RAM. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as particular deep learning frameworks or libraries. |
| Experiment Setup | Yes | We use the SGD optimizer with learning rate 0.001 and batch size 128 for training. The number of epochs is 2,000. The loss correlation β is set to different values for different knowledge distillation methods, following (Tian et al., 2020). We pick the epoch number in each round, k = 5, based on an ablation study over k = 1, 2, 5, 20. We set the round number n = 400 for the Audi dataset and n = 40 for the Honda dataset. (See the example invocation after the table.) |
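
The pseudocode row above flattens Algorithm 1 into a single run of text. The sketch below is a minimal, hedged PyTorch-style reading of the SAMD loop: each round, the teacher is reset from the student and trained for k epochs on both modalities, then the student, fed only the main modality, is trained for k epochs with a task term plus a β-weighted distillation term against the teacher. The names `samd_train`, `task_loss`, and `distill_loss`, the optional-auxiliary forward signature `model(x_main, x_aux=None)`, and the exact wiring of Eq. 1, Eq. 2, and loss 3 are assumptions made for illustration, not the authors' code (none is released).

```python
# A hypothetical sketch of Algorithm 1 (SAMD), not the authors' implementation.
# Assumes models whose forward pass is model(x_main, x_aux=None) and a loader
# yielding (main-modality batch, auxiliary-modality batch, label).
import copy
import torch
import torch.nn.functional as F

def samd_train(student, loader, rounds, epochs_per_round, beta, lr=0.001,
               task_loss=F.mse_loss, distill_loss=F.mse_loss):
    """Alternate teacher refresh and student distillation, per Algorithm 1."""
    for _ in range(rounds):                                    # t rounds
        teacher = copy.deepcopy(student)                       # reset teacher with student weights
        t_opt = torch.optim.SGD(teacher.parameters(), lr=lr)
        for _ in range(epochs_per_round):                      # k teacher epochs
            for x_main, x_aux, y in loader:
                loss = task_loss(teacher(x_main, x_aux), y)    # stands in for Eq. 2
                t_opt.zero_grad(); loss.backward(); t_opt.step()
        s_opt = torch.optim.SGD(student.parameters(), lr=lr)
        for _ in range(epochs_per_round):                      # k student epochs
            for x_main, x_aux, y in loader:
                with torch.no_grad():
                    t_out = teacher(x_main, x_aux)             # teacher sees both modalities
                s_out = student(x_main)                        # student sees only the main modality
                # task term (stand-in for Eq. 1) + beta-weighted distillation term (loss 3)
                loss = task_loss(s_out, y) + beta * distill_loss(s_out, t_out)
                s_opt.zero_grad(); loss.backward(); s_opt.step()
    return student.state_dict()                                # θ_stu
```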
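
To connect the experiment-setup row to the sketch, here is an example invocation using the reported hyperparameters for the Audi dataset. `model` and `audi_loader` are hypothetical placeholders, and the β value shown is illustrative, since the paper sets β per knowledge-distillation method following Tian et al. (2020).

```python
# Hypothetical call with the reported Audi-dataset settings.
weights = samd_train(
    student=model,
    loader=audi_loader,    # batches of (I_M, I_A, label), batch size 128
    rounds=400,            # n = 400 for Audi; n = 40 for Honda
    epochs_per_round=5,    # k = 5, chosen from an ablation over {1, 2, 5, 20}
    beta=1.0,              # illustrative; set per KD method (Tian et al., 2020)
    lr=0.001,              # SGD learning rate
)
```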