Auxiliary Modality Learning with Generalized Curriculum Distillation

Authors: Yu Shen, Xijun Wang, Peng Gao, Ming Lin

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also analyze the conditions under which AML works well from the optimization and data distribution perspectives. To guide various choices to achieve optimal performance using AML, we propose a novel method to assist in choosing the best auxiliary modality and estimating an upper bound performance before executing AML. In addition, we propose a new AML method using generalized curriculum distillation to enable more effective curriculum learning. Our method achieves the best performance compared to other SOTA methods.
Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, Maryland, USA. Correspondence to: Yu Shen <yushen@umd.edu>.
Pseudocode | Yes | Algorithm 1 SAMD Training Paradigm
Input: training data from the main modality I_M, training data from the auxiliary modality I_A (chosen by the method in Sec. 5.1)
Output: student network weights θ_stu
Initialisation: training round number t, epoch number in each round k, loss correlation β, network weights θ_stu and θ_tea
for r = 1 to t do
  Reset teacher weights with student weights
  for e = 1 to k do
    Feed I_M and I_A into the teacher; update teacher weights θ_tea with Eq. 2
  end for
  for e = 1 to k do
    Feed I_M and I_A into the teacher and feed I_M into the student; update student weights θ_stu with Eq. 1 and loss 3
  end for
end for
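The alternating teacher/student rounds of Algorithm 1 can be rendered as a minimal Python sketch. The concrete losses (Eq. 1, Eq. 2, and loss 3) are not reproduced in this excerpt, so the `teacher_loss`, `student_task_loss`, and `distill_loss` callables, as well as the model and dataloader names, are hypothetical placeholders rather than the authors' implementation.

```python
# Hedged sketch of the SAMD training paradigm (Algorithm 1): each round resets
# the teacher from the student, trains the teacher on main + auxiliary
# modalities, then trains the student on the main modality with distillation.
import torch


def samd_train(student, teacher, loader, rounds, epochs_per_round, beta,
               teacher_loss, student_task_loss, distill_loss, lr=1e-3):
    opt_tea = torch.optim.SGD(teacher.parameters(), lr=lr)
    opt_stu = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(rounds):
        # Reset teacher weights with the current student weights; strict=False
        # copies only the overlapping parameters (the architectures differ,
        # since the teacher also consumes the auxiliary modality).
        teacher.load_state_dict(student.state_dict(), strict=False)
        # Phase 1: update the teacher on main + auxiliary data (Eq. 2).
        for _ in range(epochs_per_round):
            for x_main, x_aux, y in loader:
                opt_tea.zero_grad()
                loss = teacher_loss(teacher(x_main, x_aux), y)
                loss.backward()
                opt_tea.step()
        # Phase 2: update the student on the main modality only, distilling
        # from the teacher's main + auxiliary output (Eq. 1 and loss 3),
        # weighted by the loss correlation beta.
        for _ in range(epochs_per_round):
            for x_main, x_aux, y in loader:
                opt_stu.zero_grad()
                with torch.no_grad():
                    t_out = teacher(x_main, x_aux)
                s_out = student(x_main)
                loss = student_task_loss(s_out, y) + beta * distill_loss(s_out, t_out)
                loss.backward()
                opt_stu.step()
    return student.state_dict()
```

The outer loop mirrors the "Reset teacher weights with student weights" step, so the teacher and student co-evolve round by round rather than the teacher being trained once up front.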
Open Source Code | No | Since there is no open-source code, we reimplement the original work and then apply our method to it.
Open Datasets | Yes | We use the Audi dataset (Geyer et al., 2020) and the Honda dataset (Ramanishka et al., 2018) in this experiment. We also use the depth map (Type 1), frequency image (Type 2), and attention image (Type 3) as auxiliary modalities.
Dataset Splits | Yes | For the dataset, we use the CARLA (Dosovitskiy et al., 2017) simulator for training and testing, specifically CARLA 0.9.10, which includes 8 publicly available towns. We use 7 towns for training and hold out Town05 for evaluation, as in (Prakash et al., 2021).
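A small sketch of that split is shown below; it assumes the eight publicly available CARLA 0.9.10 towns are Town01 through Town07 plus Town10HD, which is an inference from the CARLA release rather than a detail stated in the quoted text.

```python
# Hedged sketch of the train/eval town split: 7 towns for training,
# Town05 held out for evaluation (following Prakash et al., 2021).
ALL_TOWNS = ["Town01", "Town02", "Town03", "Town04",
             "Town05", "Town06", "Town07", "Town10HD"]  # assumed CARLA 0.9.10 towns
EVAL_TOWNS = ["Town05"]
TRAIN_TOWNS = [t for t in ALL_TOWNS if t not in EVAL_TOWNS]

assert len(TRAIN_TOWNS) == 7 and "Town05" not in TRAIN_TOWNS
```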
Hardware Specification | Yes | All experiments are conducted using one Intel(R) Xeon(TM) W-2123 CPU, two Nvidia GTX 1080 GPUs, and 32 GB of RAM.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as particular deep learning frameworks or libraries.
Experiment Setup | Yes | We use the SGD optimizer with learning rate 0.001 and batch size 128 for training. The number of epochs is 2,000. The loss correlation β is set to different values for different knowledge distillation methods, following (Tian et al., 2020). We pick the epoch number in each round k = 5 from an ablation study over k = 1, 2, 5, 20. We set the round number n = 400 for the Audi dataset and n = 40 for the Honda dataset.
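The reported hyperparameters can be collected into a configuration sketch; the dictionary keys and the `make_optimizer` helper below are illustrative conveniences, not the authors' code, and β is left unspecified because it varies per distillation method.

```python
# Hedged summary of the reported experiment setup as a config dict.
import torch

CONFIG = {
    "optimizer": "SGD",
    "lr": 1e-3,                 # learning rate 0.001
    "batch_size": 128,
    "epochs": 2000,
    "epochs_per_round_k": 5,    # chosen from {1, 2, 5, 20} by ablation
    "rounds_audi": 400,         # round number n for the Audi dataset
    "rounds_honda": 40,         # round number n for the Honda dataset
    # "beta": set per knowledge-distillation method, following Tian et al. (2020)
}


def make_optimizer(model, cfg=CONFIG):
    # Builds the SGD optimizer with the reported learning rate.
    return torch.optim.SGD(model.parameters(), lr=cfg["lr"])
```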