FairyTED: A Fair Rating Predictor for TED Talk Data
Authors: Rupam Acharyya, Shouman Das, Ankani Chattoraj, Md. Iftekhar Tanveer
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that while prediction accuracy is comparable to recent work on this dataset, our predictions are counterfactually fair with respect to a novel metric when compared to true data labels. |
| Researcher Affiliation | Collaboration | Rupam Acharyya¹, Shouman Das¹, Ankani Chattoraj¹, Md. Iftekhar Tanveer² (¹University of Rochester, ²Comcast Applied AI Research) |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is made publicly available or provide a link to it. |
| Open Datasets | No | The paper states: 'The data analyzed in our study was collected from TED talk website (ted.com). We crawled the website to obtain data from 2400 videos published between 2006 and 2017...' and 'We use Amazon mechanical turk to collect data on the protected attributes S (race and gender).' However, it does not provide a specific link, DOI, or formal citation for their processed and annotated dataset to be publicly accessed. |
| Dataset Splits | No | The paper describes creating 'augmented datasets' for training and refers to a 'test dataset' for evaluation, but it does not specify explicit percentages or sample counts for train/validation/test splits, nor does it detail a cross-validation setup or predefined splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Gensim package', 'PyMC3', and 'AIF360', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We train a neural network with one hidden layer of 400 nodes to predict ratings Y. The loss function used to train the network has two parts: 1) the first part minimizes prediction error when compared to true data labels and 2) the second part reduces disparity between the labels of observed values of S and their corresponding counterfactuals. We use binary cross entropy loss (BCE) to calculate the prediction error and an unfairness function u to estimate the unfairness of the classifier as $\mathrm{BCE}\big(g(s_i, x_i), y_i\big) + \gamma \sum_{c=1}^{C} \max\{0, \lvert g(s_i, x_i^c) - g(s_i', x_i^c)\rvert - \epsilon\}$ (2), where C represents the number of counterfactual samples for each observed data instance and ϵ is a hyperparameter which makes sure that our predictor maintains (ϵ, δ)-approximate counterfactual fairness (δ is a function of γ; for more details about the choice of the unfairness function, please refer to (Russell et al. 2017)). We tune γ and ϵ to obtain best results in our causal models; see Table 3. |
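
The Experiment Setup row above quotes the paper's fairness-regularized loss: binary cross-entropy on the observed data plus a hinge-style penalty on the gap between predictions for observed and counterfactual values of the protected attributes. The paper releases no code, so the sketch below is only a minimal PyTorch rendering of that loss under stated assumptions: the one-hidden-layer, 400-unit predictor follows the paper, while the class and function names, tensor shapes, and averaging of the penalty over the batch are choices made here for illustration.

```python
# Minimal sketch (not the authors' implementation) of the loss in Eq. (2):
#   BCE(g(s_i, x_i), y_i) + gamma * sum_c max(0, |g(s_i, x_i^c) - g(s_i', x_i^c)| - eps)
import torch
import torch.nn as nn


class RatingPredictor(nn.Module):
    """One-hidden-layer network g(s, x) -> rating probability (400 hidden units, as in the paper)."""

    def __init__(self, in_dim: int, hidden: int = 400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, s: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Concatenate protected attributes s and remaining features x.
        return self.net(torch.cat([s, x], dim=-1)).squeeze(-1)


def fair_loss(model, s, x, y, s_cf, x_cf, gamma: float, eps: float):
    """
    s, x, y    : observed protected attributes, features, binary labels  -> shapes (B, d_s), (B, d_x), (B,)
    s_cf, x_cf : C counterfactual samples per instance                   -> shapes (B, C, d_s), (B, C, d_x)
    """
    # Prediction-error term: BCE against the true labels.
    bce = nn.functional.binary_cross_entropy(model(s, x), y)

    B, C = x_cf.shape[0], x_cf.shape[1]
    # Predictions on counterfactual features, with observed vs. counterfactual S.
    s_rep = s.unsqueeze(1).expand(B, C, s.shape[-1])
    p_obs = model(s_rep.reshape(B * C, -1), x_cf.reshape(B * C, -1))
    p_cf = model(s_cf.reshape(B * C, -1), x_cf.reshape(B * C, -1))

    # Hinge penalty: only gaps larger than eps are penalized; averaged over the
    # batch here (an assumption, the paper does not state the reduction).
    unfairness = torch.clamp((p_obs - p_cf).abs() - eps, min=0.0).sum() / B
    return bce + gamma * unfairness
```

As in the quoted setup, gamma and eps would be tuned per causal model; eps controls how large a counterfactual prediction gap is tolerated before the penalty activates, and gamma trades prediction accuracy against counterfactual fairness.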