A Differential Entropy Estimator for Training Neural Networks
Authors: Georg Pichler, Pierre Jean A. Colombo, Malik Boudiaf, Günther Koliander, Pablo Piantanida
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our method on high-dimensional synthetic data and further apply it to guide the training of neural networks for real-world tasks. Our experiments on a large variety of tasks, including visual domain adaptation, textual fair classification, and textual finetuning demonstrate the effectiveness of KNIFE-based estimation. |
| Researcher Affiliation | Academia | Georg Pichler *1, Pierre Colombo *2, Malik Boudiaf *3, Günther Koliander 4, Pablo Piantanida 5. *Equal contribution. 1Institute of Telecommunications, TU Wien, 1040 Vienna, Austria; 2Laboratoire des Signaux et Systèmes (L2S), Université Paris-Saclay, CNRS, CentraleSupélec, 91190 Gif-sur-Yvette, France; 3ÉTS Montreal, Quebec H3C 1K3, Canada; 4Acoustics Research Institute, Austrian Academy of Sciences, 1040 Vienna, Austria; 5International Laboratory on Learning Systems (ILLS), McGill University, ÉTS, MILA, CNRS, Université Paris-Saclay, CentraleSupélec, Montreal, Quebec, Canada. Correspondence to: Georg Pichler <georg.pichler@ieee.org>. |
| Pseudocode | Yes | Algorithm 1 Disentanglement using a MI-based regularizer |
| Open Source Code | Yes | Code can be found at https://github.com/g-pichler/knife. |
| Open Datasets | Yes | We closely follow the protocol of (Mahabadi et al., 2021) and work on the GLUE benchmark (Wang et al., 2019)... We follow the experimental setting from (Elazar & Goldberg, 2018; Barrett et al., 2019) and use two datasets from the DIAL corpus (Blodgett et al., 2016)... and consider a total of 6 source/target scenarios formed with MNIST (LeCun & Cortes, 2010), MNIST-M (Ganin et al., 2016), SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky et al., 2009), and STL-10 (Coates et al., 2011) datasets. |
| Dataset Splits | Yes | The evaluation is carried out on the standard validation splits as the test splits are not available. We follow the official split using 160 000 tweets for training and two additional sets composed of 10 000 tweets each for development and testing. We use early stopping (best model is selected on validation set error). |
| Hardware Specification | Yes | Training was performed on an NVIDIA V100 GPU. For all these experiments we rely on NVIDIA P100 with 16GB of RAM. For these experiments, we used a cluster of NVIDIA V100 with 16GB of RAM. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' and 'Adam (Kingma & Ba, 2015)' as tools used, but does not specify the version numbers of these software components. |
| Experiment Setup | Yes | The paper provides specific hyperparameter values in several tables and text sections, such as 'Learning Rate 0.01, Batch Size N 128, Kernel Size M 128, Iterations per epoch 200, Epochs 1, Runs 20' (Table 4), 'For model training, all models are trained for 6 epochs and we use early stopping... For IB, λ is selected in {10^-4, 10^-5, 10^-6} and K is selected in {144, 192, 288, 384}. Additional hyper-parameters are reported in Table 11.', and 'We use Adam optimizer for all modules with a learning rate of 0.001. Batch size is set to 128. We set the weighting parameter λ = 0.1. ... use 25 000 iterations instead. Similar to other experiments, we set the kernel size M = 128.' (Appendix C.4). |
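The quoted setup (kernel size M = 128, batch size N = 128) can be illustrated with a minimal NumPy sketch of a kernel-mixture differential entropy estimate in the spirit of KNIFE. The function name `kde_entropy`, the fixed bandwidth, and the standard-normal test data are illustrative assumptions, not the paper's implementation: KNIFE additionally learns the kernel locations, weights, and covariances by gradient descent.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log-sum-exp along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def kde_entropy(samples, centers, bandwidth=0.5):
    """Differential entropy estimate from a Gaussian kernel mixture.

    The density is modelled as an equal-weight mixture of M isotropic
    Gaussian kernels; the entropy estimate is the negative mean
    log-density over the evaluation batch. (KNIFE makes the kernel
    parameters learnable; here they are fixed for illustration.)
    """
    n, d = samples.shape
    m = centers.shape[0]
    diff = samples[:, None, :] - centers[None, :, :]          # (N, M, d)
    sq = np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2)  # (N, M)
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2)
    log_p = log_norm + _logsumexp(-sq, axis=1) - np.log(m)    # (N,)
    return -np.mean(log_p)

# Batch size N = 128 and kernel size M = 128, as reported in Table 4.
rng = np.random.default_rng(0)
batch = rng.standard_normal((128, 2))    # samples from N(0, I_2)
kernels = rng.standard_normal((128, 2))  # fixed kernel locations
estimate = kde_entropy(batch, kernels)
# True entropy of N(0, I_2) is log(2*pi*e) ~ 2.84; the fixed-bandwidth
# estimate lands in that neighbourhood (biased upward by smoothing).
print(round(float(estimate), 3))
```

With a well-chosen bandwidth the estimate is close to the true Gaussian entropy; optimizing the kernel parameters, as in the paper, removes the need to hand-tune that bandwidth.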