Generalizing Orthogonalization for Models with Non-Linearities
Authors: David Rügamer, Chris Kolb, Tobias Weber, Lucas Kook, Thomas Nagler
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes. |
| Researcher Affiliation | Academia | Department of Statistics, LMU Munich, Munich, Germany; Munich Center for Machine Learning (MCML), Munich, Germany; Institute for Statistics and Mathematics, Vienna University of Economics and Business, Vienna, Austria. |
| Pseudocode | No | The paper describes methods in narrative text and does not include explicit 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code for reproducing results can be found on the first author's GitHub repository. |
| Open Datasets | Yes | Using the adult income data also investigated in Xu et al. (2022) to analyze algorithm fairness... MIMIC Chest X-Ray dataset (Johnson et al., 2019; Sellergren et al., 2022)... UTKFace dataset (Zhang et al., 2017)... movies review dataset (Maas et al., 2011)... colorize the MNIST data... |
| Dataset Splits | Yes | Early stopping is based on a 20% validation split and a patience of 25. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud instance types used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The LSTM model is defined by an embedding layer with embedding size 100, an LSTM layer with 50 units and ReLU activation, a dropout layer with 0.1 dropout rate, a dense layer with 25 units and ReLU activation, a dropout layer with 0.2 dropout rate, a dense layer with 5 units and ReLU activation, a dropout layer with 0.3 dropout rate, and a final dense layer with 1 unit and exponential activation. The network is trained for a maximum of 1000 epochs with early stopping using Adam with a learning rate of 1e-6, a batch size of 128, and Poisson loss. |
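
As an illustration of the reported configuration, the following Keras sketch assembles the layer stack and training settings quoted in the table (embedding size 100, LSTM with 50 units, the dropout/dense cascade, exponential output, Adam with learning rate 1e-6, Poisson loss, batch size 128, up to 1000 epochs with early stopping on a 20% validation split and patience 25). The vocabulary size, sequence length, and the training tensors `x_train`/`y_train` are placeholders not reported in the paper, so this is a minimal sketch under those assumptions rather than the authors' exact implementation.

```python
# Minimal sketch of the described LSTM setup (Keras / TensorFlow).
# VOCAB_SIZE and x_train/y_train are assumptions; the paper does not report them here.
import tensorflow as tf

VOCAB_SIZE = 10_000  # assumption: tokenizer vocabulary size is not given in the paper

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 100),          # embedding size 100
    tf.keras.layers.LSTM(50, activation="relu"),          # 50 units, ReLU activation
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(25, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="exponential"),   # positive rate for the Poisson loss
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6),
    loss="poisson",
)

early_stop = tf.keras.callbacks.EarlyStopping(patience=25, restore_best_weights=True)

# Training call, assuming x_train / y_train are prepared integer sequences and count targets:
# model.fit(x_train, y_train,
#           epochs=1000, batch_size=128,
#           validation_split=0.2,      # 20% validation split used for early stopping
#           callbacks=[early_stop])
```
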