Generalizing Orthogonalization for Models with Non-Linearities

Authors: David Rügamer, Chris Kolb, Tobias Weber, Lucas Kook, Thomas Nagler

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we validate our method's effectiveness in safeguarding sensitive data in generalized linear models, normalizing convolutional neural networks for metadata, and rectifying pre-existing embeddings for undesired attributes. (A sketch of the underlying orthogonalization idea follows the table.)
Researcher Affiliation | Academia | 1) Department of Statistics, LMU Munich, Munich, Germany; 2) Munich Center for Machine Learning (MCML), Munich, Germany; 3) Institute for Statistics and Mathematics, Vienna University of Economics and Business, Vienna, Austria.
Pseudocode | No | The paper describes methods in narrative text and does not include explicit 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The code for reproducing results can be found on the first author's GitHub repository.
Open Datasets | Yes | Using the adult income data also investigated in Xu et al. (2022) to analyze algorithm fairness... MIMIC Chest X-Ray dataset (Johnson et al., 2019; Sellergren et al., 2022)... UTKFace dataset (Zhang et al., 2017)... movie review dataset (Maas et al., 2011)... colorize the MNIST data...
Dataset Splits | Yes | Early stopping is based on a 20% validation split and a patience of 25.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud instance types used for running its experiments.
Software Dependencies | No | The paper mentions optimizers like Adam but does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The LSTM model is defined by an embedding layer with embedding size 100, an LSTM layer with 50 units and ReLU activation, a dropout layer with 0.1 dropout rate, a dense layer with 25 units and ReLU activation, a dropout layer with 0.2 dropout rate, a dense layer with 5 units and ReLU activation, a dropout layer with 0.3 dropout rate, and a final dense layer with 1 unit and exponential activation. The network is trained for a maximum of 1000 epochs with early stopping using Adam with a learning rate of 1e-6, a batch size of 128, and Poisson loss. (A code sketch of this configuration, combined with the validation split above, follows the table.)
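The paper's central operation is orthogonalization: removing the influence of protected or metadata features from other model inputs or learned representations. As background for the Research Type row above, here is a minimal sketch of the classical linear case that the paper generalizes to non-linear models. The function name, variable names, and random data are illustrative placeholders, not taken from the paper or its code.

```python
# Minimal sketch of linear orthogonalization: project the columns of X onto the
# orthogonal complement of span(Z), so the corrected features carry no linear
# information about the protected attributes Z. Illustrative only.
import numpy as np

def orthogonalize(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Return (I - P_Z) X, where P_Z = Z (Z^T Z)^+ Z^T projects onto span(Z)."""
    P_Z = Z @ np.linalg.pinv(Z.T @ Z) @ Z.T
    return X - P_Z @ X

# Usage with synthetic placeholder data.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))                          # protected attributes
X = Z @ rng.normal(size=(2, 5)) + rng.normal(size=(100, 5))  # contaminated features
X_orth = orthogonalize(X, Z)
print(np.abs(Z.T @ X_orth).max())                      # ~0: no remaining linear association
```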
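The quoted Experiment Setup and Dataset Splits rows together describe a complete training configuration. The following is a minimal Keras sketch assembled from those two quotes; Keras is assumed only because the layer, activation, and loss names match its API, and VOCAB_SIZE, MAX_LEN, x_train, and y_train are placeholders not specified in the quoted text. This is a sketch under those assumptions, not the authors' released implementation.

```python
# Sketch of the reported LSTM setup: embedding (dim 100), LSTM (50, ReLU),
# dropouts 0.1/0.2/0.3, dense 25 and 5 (ReLU), final dense 1 with exponential
# activation, Adam (lr 1e-6), Poisson loss, batch size 128, up to 1000 epochs
# with early stopping on a 20% validation split and patience 25.
import tensorflow as tf
from tensorflow.keras import layers, callbacks

VOCAB_SIZE = 10_000  # assumption: vocabulary size not given in the quoted setup

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 100),
    layers.LSTM(50, activation="relu"),
    layers.Dropout(0.1),
    layers.Dense(25, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(5, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="exponential"),  # positive rate for a Poisson response
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6),
    loss="poisson",
)

early_stop = callbacks.EarlyStopping(
    monitor="val_loss", patience=25, restore_best_weights=True
)

# x_train / y_train are placeholders for the tokenized text and count targets:
# model.fit(x_train, y_train, validation_split=0.2, epochs=1000,
#           batch_size=128, callbacks=[early_stop])
```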