Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Simple and Effective Specialized Representations for Fair Classifiers
Authors: Alberto Sinigaglia, Davide Sartor, Marina Ceccon, Gian Antonio Susto
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on benchmark datasets demonstrate that our approach consistently matches or achieves better fairness and predictive accuracy than existing methods. Moreover, our method maintains robustness and computational efficiency, making it a practical solution for real-world applications. |
| Researcher Affiliation | Academia | Alberto Sinigaglia, Human Inspired Technology Research Center, University of Padua; Davide Sartor, Department of Information Engineering, University of Padua; Marina Ceccon, Department of Information Engineering, University of Padua; Gian Antonio Susto, Department of Information Engineering, University of Padua |
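| Pseudocode | Yes | Algorithm 1: FmCF loss term. 1: Input: encoder $h_\theta$, predictor $f_\theta$, batch $B = \{(x_i, y_i, s_i) \sim P_{X,Y,S}\}$. 2: $z \leftarrow h_\theta(x)$ 3: $\hat{y} \leftarrow f_\theta(z)$ 4: for all $s \in S$ do 5: for all $j \in \{1, \dots, k\}$ do 6: Sample $t_j \sim P_T$ 7: $\varphi_N(t_j) \leftarrow e^{-0.5 \lVert t_j \rVert^2}$ 8: $\hat{\varphi}_{Z \mid s}(t_j) = \frac{1}{n} \sum_{i=1}^{n} e^{i \langle t_j, z_i \rangle}$ 9: end for 10: end for 11: $L_{CF} \leftarrow \ldots \lvert \varphi_N(t_j) - \hat{\varphi}_{Z \mid s}(t_j) \rvert^2$ 12: $L \leftarrow L_C(\hat{y}, y) + \alpha L_{CF}$ |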
| Open Source Code | Yes | Justification: Datasets are publicly available. We provide the code, preprocessing, configurations, and dataloaders to reproduce all results in the paper. |
| Open Datasets | Yes | We utilize six well-known datasets in our study, sourced from both the UCI Machine Learning Repository and other publicly available resources. These datasets include Adult, Crime, Compas, Law School, Health, and the Statlog (German Credit Data) dataset. ... Adult: The Adult dataset, also known as the Census Income dataset, originates from the 1994 Census database and is available through the UCI repository [17]. ... Crime: The Communities and Crime dataset ... Compas: The Compas dataset ... Health: The Health dataset was part of the Heritage Health Prize competition on Kaggle ... German: The German Credit Data dataset, sourced from the UCI Machine Learning Repository [17] |
| Dataset Splits | No | The paper mentions using specific datasets (Adult, German, Compas, Health, Crime) and references setups from other papers (e.g., "same setup as [2]", "same experimental setup as in [37]"). However, it does not explicitly state the train/test/validation splits (e.g., percentages, sample counts, or specific split files) within the text for the experiments performed in *this* paper. It only mentions the datasets themselves. |
| Hardware Specification | Yes | All experiments reported in this paper were implemented using PyTorch. The models were trained on a server equipped with an AMD Ryzen Threadripper PRO 5995WX CPU (64 cores, 128 threads), 512 GiB of RAM, and three NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | All experiments reported in this paper were implemented using PyTorch. The Adam optimizer [30] was employed for all training sessions, with a learning rate of 0.0003. |
| Experiment Setup | Yes | All MLPs used for adversarial evaluations, encoding, and classification consist of four layers with 64 neurons each. ... The Adam optimizer [30] was employed for all training sessions, with a learning rate of 0.0003. Training was conducted for 100 epochs, incorporating L2 regularization with a weight penalty of 0.0001 to mitigate overfitting. |
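The Pseudocode row describes a fairness penalty that compares, for each sensitive group $s$, the empirical characteristic function of the representations $z$ with the characteristic function of a standard normal, $e^{-0.5 \lVert t \rVert^2}$. The following is a minimal, stdlib-only Python sketch of that idea; it is not the authors' implementation. The function names are hypothetical, and details the extracted pseudocode leaves open are assumptions here: $P_T$ is taken to be a standard normal, $n$ is taken to count only the samples in group $s$, and the squared gaps are averaged over the $k$ frequencies and summed over groups. The sketch only evaluates the loss value; training would require an autodiff framework such as PyTorch.

```python
import cmath
import math
import random
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def normal_cf(t: Vector) -> float:
    """Characteristic function of a standard multivariate normal: exp(-0.5 * ||t||^2)."""
    return math.exp(-0.5 * dot(t, t))


def empirical_cf(t: Vector, zs: List[Vector]) -> complex:
    """Empirical characteristic function (1/n) * sum_i exp(i * <t, z_i>)."""
    return sum(cmath.exp(1j * dot(t, z)) for z in zs) / len(zs)


def cf_fairness_loss(z: List[Vector], s: List[Hashable], k: int = 64, seed: int = 0) -> float:
    """Hypothetical sketch of the L_CF term.

    Assumptions (not confirmed by the source): frequencies t_j are drawn from
    a standard normal, fresh for every group as in the loop structure of
    Algorithm 1; each group's squared gaps are averaged over k and then
    summed over groups.
    """
    rng = random.Random(seed)
    dim = len(z[0])
    # Partition the batch representations by sensitive attribute value.
    groups: Dict[Hashable, List[Vector]] = {}
    for zi, si in zip(z, s):
        groups.setdefault(si, []).append(zi)
    loss = 0.0
    for members in groups.values():
        gap = 0.0
        for _ in range(k):
            t = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # assumed P_T = N(0, I)
            # Squared modulus of the (complex) difference between the two CFs.
            gap += abs(normal_cf(t) - empirical_cf(t, members)) ** 2
        loss += gap / k
    return loss


def total_loss(task_loss: float, cf_loss: float, alpha: float) -> float:
    """Step 12 of Algorithm 1: L = L_C(y_hat, y) + alpha * L_CF."""
    return task_loss + alpha * cf_loss
```

Representations that already look standard normal within every group drive the penalty toward zero, while a group whose representations are shifted away from the origin is penalized, which is what pushes the encoder toward group-independent representations.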
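The Experiment Setup row reports enough hyperparameters to write them down as a configuration. The sketch below records them in a hypothetical `TrainConfig` dataclass and derives layer widths and a parameter count; the names are illustrative, and reading "four layers with 64 neurons each" as four hidden layers between input and output is an assumption. In PyTorch, which the paper reports using, these optimizer settings would commonly be expressed as `torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)`, although the source does not say whether the L2 penalty was applied through `weight_decay` or as an explicit loss term.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the Experiment Setup row (names are hypothetical)."""
    hidden_layers: int = 4       # "four layers"
    hidden_width: int = 64       # "64 neurons each"
    optimizer: str = "adam"      # Adam optimizer [30]
    learning_rate: float = 3e-4  # 0.0003
    epochs: int = 100
    weight_decay: float = 1e-4   # L2 penalty of 0.0001


def mlp_layer_sizes(cfg: TrainConfig, in_dim: int, out_dim: int) -> List[int]:
    """Assumption: the four 64-unit layers are hidden layers between input and output."""
    return [in_dim] + [cfg.hidden_width] * cfg.hidden_layers + [out_dim]


def mlp_param_count(sizes: List[int]) -> int:
    """Weights plus biases for a stack of fully connected layers."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))
```

For example, with a hypothetical 10-feature input and a single output logit, `mlp_param_count(mlp_layer_sizes(TrainConfig(), 10, 1))` gives 13,249 parameters.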