Interpretations of Domain Adaptations via Layer Variational Analysis

Authors: Huan-Hsin Tseng, Hsin-Yi Lin, Kuo-Hsuan Hung, Yu Tsao

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments over diverse tasks validated our theory and verified that our analytic expression achieved better performance in domain adaptation than the gradient descent method.
Researcher Affiliation | Academia | Research Center for Information Technology Innovation, Academia Sinica, Taiwan. {htseng, hylin, khhung, yu.tsao}@citi.sinica.edu.tw
Pseudocode | No | The paper contains mathematical derivations and equations, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured code-like procedures.
Open Source Code | Yes | The code is available on GitHub: https://github.com/HHTseng/Layer-Variational-Analysis.git
Open Datasets | Yes | 8,000 utterances (corresponding to N1 = 112,000 patches) were randomly excerpted from the Deep Noise Suppression Challenge (Reddy et al., 2020) dataset; 2,000 high-resolution images were randomly selected as labels from the CUFED dataset (Wang et al., 2016) to train SRCNN.
Dataset Splits | Yes | Speech data pairs were prepared for the source domain D (serving as the training set) and target domain D̃ (serving as the adaptation set). For the training set, the 8,000 clean utterances were equally divided and contaminated by the five noise types... to form the training set. For the testing set, 100 clean utterances were contaminated... For the adaptation set, we prepared 20-400 patches...
Hardware Specification | Yes | One NVIDIA V100 GPU (32 GB GPU memory) with 4 CPUs (128 GB CPU memory).
Software Dependencies | No | The paper mentions optimizers like ADAM and network architectures like BLSTM, but does not provide specific version numbers for any software libraries, programming languages, or frameworks used (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | The pretrained model f using D was a 3 fully-connected-layer network with 64 nodes at each layer and ReLU used as the activation function except at the output layer. A finetuned model g_GD using Gradient Descent (GD) retrained the last layer of f on D̃. f and g_GD were trained under L2 loss with the ADAM optimizer at learning rate 10^-3 for 8,000 and 12,000 epochs, respectively.
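To make the Experiment Setup row concrete, the following is a minimal sketch of the described pretraining and last-layer gradient-descent finetuning. It assumes a PyTorch implementation; the framework choice, function names, tensor shapes, and data loaders are illustrative assumptions, not taken from the paper or its repository.

```python
# Sketch only: 3 fully connected layers with 64 nodes each, ReLU activations
# except at the output layer, L2 (MSE) loss, ADAM at learning rate 1e-3.
# Pretrain on the source domain D, then retrain only the last layer on D~.
import torch
import torch.nn as nn


def build_model(in_dim: int, out_dim: int) -> nn.Sequential:
    """3 fully connected layers, 64 nodes per hidden layer, ReLU except output."""
    return nn.Sequential(
        nn.Linear(in_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, out_dim),          # no activation on the output layer
    )


def train(model, loader, epochs, lr=1e-3, last_layer_only=False):
    """Train under L2 (MSE) loss with ADAM, optionally updating only the last layer."""
    if last_layer_only:
        for p in model.parameters():     # freeze everything ...
            p.requires_grad = False
        for p in model[-1].parameters(): # ... except the final Linear layer
            p.requires_grad = True
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model


# Hypothetical usage with loaders for the source domain D and adaptation set D~:
#   f = train(build_model(in_dim, out_dim), source_loader, epochs=8000)
#   g_gd = train(copy.deepcopy(f), adaptation_loader, epochs=12000,
#                last_layer_only=True)   # keep f intact; adapt a copy by GD
```

This only illustrates the gradient-descent (GD) finetuning baseline quoted above; the paper's analytic (closed-form) adaptation via layer variational analysis is not reproduced here.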