FairyTED: A Fair Rating Predictor for TED Talk Data
Authors: Rupam Acharyya, Shouman Das, Ankani Chattoraj, Md. Iftekhar Tanveer
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that while prediction accuracy is comparable to recent work on this dataset, our predictions are counterfactually fair with respect to a novel metric when compared to true data labels. |
| Researcher Affiliation | Collaboration | Rupam Acharyya¹, Shouman Das¹, Ankani Chattoraj¹, Md. Iftekhar Tanveer² (¹University of Rochester, ²Comcast Applied AI Research) |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is made publicly available or provide a link to it. |
| Open Datasets | No | The paper states: 'The data analyzed in our study was collected from TED talk website (ted.com). We crawled the website to obtain data from 2400 videos published between 2006 and 2017...' and 'We use Amazon mechanical turk to collect data on the protected attributes S (race and gender).' However, it does not provide a specific link, DOI, or formal citation for their processed and annotated dataset to be publicly accessed. |
| Dataset Splits | No | The paper describes creating 'augmented datasets' for training and refers to a 'test dataset' for evaluation, but it does not specify explicit percentages or sample counts for train/validation/test splits, nor does it detail a cross-validation setup or predefined splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Gensim package', 'PyMC3', and 'AIF360', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We train a neural network with one hidden layer of 400 nodes to predict ratings Y. The loss function used to train the network has two parts: 1) the first part minimizes prediction error when compared to true data labels and 2) the second part reduces disparity between the labels of observed values of S and their corresponding counterfactuals. We use binary cross entropy loss (BCE) to calculate the prediction error and an unfairness function u to estimate the unfairness of the classifier as $\mathrm{BCE}\big(g(s_i, x_i), y_i\big) + \gamma \sum_{c=1}^{C} \max\{0, \lvert g(s_i, x_i^c) - g(s_i', x_i^c)\rvert - \epsilon\}$ (2), where C represents the number of counterfactual samples for each observed data instance and ϵ is a hyperparameter which makes sure that our predictor maintains (ϵ, δ)-approximate counterfactual fairness (δ is a function of γ; for more details about the choice of the unfairness function, please refer to (Russell et al. 2017)). We tune γ and ϵ to obtain best results in our causal models; see Table 3. |
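
The Experiment Setup row above quotes the paper's fairness-regularized loss: binary cross-entropy on the observed data plus a hinge-style penalty on the gap between predictions for observed and counterfactual values of the protected attributes. The paper releases no code, so the sketch below is only a minimal PyTorch rendering of that loss under stated assumptions: the one-hidden-layer, 400-unit predictor follows the paper, while the class and function names, tensor shapes, and averaging of the penalty over the batch are choices made here for illustration.

```python
# Minimal sketch (not the authors' implementation) of the loss in Eq. (2):
#   BCE(g(s_i, x_i), y_i) + gamma * sum_c max(0, |g(s_i, x_i^c) - g(s_i', x_i^c)| - eps)
import torch
import torch.nn as nn


class RatingPredictor(nn.Module):
    """One-hidden-layer network g(s, x) -> rating probability (400 hidden units, as in the paper)."""

    def __init__(self, in_dim: int, hidden: int = 400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, s: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Concatenate protected attributes s and remaining features x.
        return self.net(torch.cat([s, x], dim=-1)).squeeze(-1)


def fair_loss(model, s, x, y, s_cf, x_cf, gamma: float, eps: float):
    """
    s, x, y    : observed protected attributes, features, binary labels  -> shapes (B, d_s), (B, d_x), (B,)
    s_cf, x_cf : C counterfactual samples per instance                   -> shapes (B, C, d_s), (B, C, d_x)
    """
    # Prediction-error term: BCE against the true labels.
    bce = nn.functional.binary_cross_entropy(model(s, x), y)

    B, C = x_cf.shape[0], x_cf.shape[1]
    # Predictions on counterfactual features, with observed vs. counterfactual S.
    s_rep = s.unsqueeze(1).expand(B, C, s.shape[-1])
    p_obs = model(s_rep.reshape(B * C, -1), x_cf.reshape(B * C, -1))
    p_cf = model(s_cf.reshape(B * C, -1), x_cf.reshape(B * C, -1))

    # Hinge penalty: only gaps larger than eps are penalized; averaged over the
    # batch here (an assumption, the paper does not state the reduction).
    unfairness = torch.clamp((p_obs - p_cf).abs() - eps, min=0.0).sum() / B
    return bce + gamma * unfairness
```

As in the quoted setup, gamma and eps would be tuned per causal model; eps controls how large a counterfactual prediction gap is tolerated before the penalty activates, and gamma trades prediction accuracy against counterfactual fairness.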