Interpretations of Domain Adaptations via Layer Variational Analysis
Authors: Huan-Hsin Tseng, Hsin-Yi Lin, Kuo-Hsuan Hung, Yu Tsao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments over diverse tasks validated our theory and verified that our analytic expression achieved better performance in domain adaptation than the gradient descent method. |
| Researcher Affiliation | Academia | Research Center for Information Technology Innovation, Academia Sinica, Taiwan {htseng, hylin, khhung, yu.tsao}@citi.sinica.edu.tw |
| Pseudocode | No | The paper contains mathematical derivations and equations, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks or structured code-like procedures. |
| Open Source Code | Yes | The code is available on GitHub: https://github.com/HHTseng/Layer-Variational-Analysis.git |
| Open Datasets | Yes | 8,000 utterances (corresponding to N₁ = 112,000 patches) were randomly excerpted from the Deep Noise Suppression Challenge (Reddy et al., 2020) dataset; 2,000 high-resolution images were randomly selected as labels from the CUFED dataset (Wang et al., 2016) to train SRCNN. |
| Dataset Splits | Yes | Speech data pairs were prepared for the source domain D (served as the training set) and target domain D̃ (served as the adaptation set); For the training set, the 8,000 clean utterances were equally divided and contaminated by the five noise types... to form the training set. For the testing set, 100 clean utterances were contaminated... For the adaptation set, we prepared 20-400 patches... |
| Hardware Specification | Yes | One NVIDIA V100 GPU (32 GB GPU memory) with 4 CPUs (128 GB CPU memory). |
| Software Dependencies | No | The paper mentions optimizers like ADAM and network architectures like BLSTM, but does not provide specific version numbers for any software libraries, programming languages, or frameworks used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The pretrained model f using D was a 3-fully-connected-layer network with 64 nodes at each layer and ReLU used as the activation function except at the output layer. A finetuned model g_GD using Gradient Descent (GD) retrained the last layer of f on D̃. f and g_GD were trained under L2 loss with the ADAM optimizer at learning rate 10⁻³ for 8,000 and 12,000 epochs, respectively. A minimal code sketch of this setup is given below the table. |
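
The Experiment Setup row translates into a short training recipe. Below is a minimal sketch, not the authors' released code (see the GitHub link above), written in PyTorch with a hypothetical feature dimension and random tensors standing in for the speech data: a 3-layer fully-connected network (64 nodes, ReLU) is pretrained on the source domain with L2 loss and ADAM at learning rate 10⁻³, and the gradient-descent baseline g_GD is obtained by retraining only its last layer on the target-domain adaptation set.

```python
# Hedged reconstruction of the quoted setup (not the authors' implementation).
# Feature dimension, batch construction, and all data tensors are placeholders.
import copy
import torch
import torch.nn as nn


def build_model(in_dim: int, out_dim: int) -> nn.Sequential:
    """3 fully-connected layers, 64 nodes each, ReLU except on the output layer."""
    return nn.Sequential(
        nn.Linear(in_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, out_dim),          # output layer: no activation
    )


def train(model, x, y, params, epochs):
    """Full-batch L2-loss training with ADAM at learning rate 1e-3 on `params`."""
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model


# Pretrain f on the source domain D (257 is a hypothetical spectral feature
# size, not taken from the paper; tensors are random stand-ins).
f = build_model(in_dim=257, out_dim=257)
x_src, y_src = torch.randn(1024, 257), torch.randn(1024, 257)
f = train(f, x_src, y_src, f.parameters(), epochs=8000)

# Gradient-descent baseline g_GD: copy f and retrain only its last layer on the
# target-domain adaptation set D̃ (here 200 placeholder patches, within the
# 20-400 range quoted above).
g_gd = copy.deepcopy(f)
for p in g_gd[:-1].parameters():         # freeze everything except the final Linear layer
    p.requires_grad_(False)
x_tgt, y_tgt = torch.randn(200, 257), torch.randn(200, 257)
g_gd = train(g_gd, x_tgt, y_tgt, g_gd[-1].parameters(), epochs=12000)
```

This sketch covers only the gradient-descent baseline quoted in the table; the paper contrasts this g_GD with the closed-form adaptation derived from its layer variational analysis, which the authors report performs better (see the Research Type row).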