Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Regression under Human Assistance
Authors: Abir De, Paramita Koley, Niloy Ganguly, Manuel Gomez-Rodriguez (pp. 2611-2620)
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic and real-world data from two important applications, medical diagnosis and content moderation, demonstrate that the greedy algorithm beats several competitive baselines. Finally, we experiment with synthetic and real-world data from two important applications, medical diagnosis and content moderation. Our results show that the greedy algorithm beats several competitive algorithms, including the iterative algorithm for maximization of a difference of submodular functions mentioned above, and is able to identify and outsource to humans those samples where their expertise is required. |
| Researcher Affiliation | Academia | Abir De (MPI-SWS), Paramita Koley (IIT Kharagpur), Niloy Ganguly (IIT Kharagpur), Manuel Gomez-Rodriguez (MPI-SWS) |
| Pseudocode | Yes | Algorithm 1 Greedy algorithm. Input: ground set V, set of training samples {(xᵢ, yᵢ)} for i ∈ V, parameters n and λ. Output: set of items S. 1: S ← ∅; 2: while \|S\| < n do; 3: % Find best sample; 4: k* ← argmax over k ∈ V \ S of −log ℓ(S ∪ k) + log ℓ(S); 5: % Sample is outsourced to humans; 6: S ← S ∪ {k*}; 7: end while; 8: return S |
| Open Source Code | Yes | To facilitate research in this area, we are releasing an open source implementation of our method. Footnote 1: https://github.com/Networks-Learning/regression-underassistance |
| Open Datasets | Yes | We experiment with four real-world datasets from two important applications, medical diagnosis and content moderation, which are publicly available (Davidson et al.; Decencière et al. 2014; Hoover, Kouznetsova, and Goldbaum 2000). |
| Dataset Splits | No | Finally, in each experiment, we use 80% samples for training and 20% samples for testing. The paper does not explicitly mention a separate validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using fasttext and Resnet for feature extraction but does not provide specific version numbers for any software components or dependencies. |
| Experiment Setup | Yes | Experimental setup. For each sample (x, y), we first generate each dimension of the feature vector x ∈ ℝᵈ uniformly at random, i.e., xᵢ ∼ U(−1, 1), and then sample the response variable y from either (i) a Gaussian distribution N(1ᵀx/d, σ₁²) or (ii) a logistic distribution 1/(1 + exp(−1ᵀx/d)). Moreover, we sample the associated human error from a Gaussian distribution, i.e., c(x, y) ∼ N(0, σ₂²). In each experiment, we use \|V\| = 500 training samples and we compare the performance of the greedy algorithm with three competitive baselines. In panel (a), we set λ = 5 × 10⁻³ and, in panel (b), we set λ = 10⁻³. |
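
The synthetic setup and the greedy loop quoted in the table can be sketched together in a few lines of Python. This is a minimal illustration, not the authors' released implementation. Several pieces are assumptions that the table does not specify: the loss ℓ(S) is taken here to be a ridge-regression loss on the samples kept by the machine plus the squared human error on the outsourced samples, the logistic case is read as a deterministic sigmoid response, and the values of `d`, `sigma1`, `sigma2`, `n`, and `seed` are placeholders. The function names `make_synthetic`, `loss`, and `greedy_select` are hypothetical.

```python
import numpy as np


def make_synthetic(n_samples=500, d=5, sigma1=0.1, sigma2=0.1,
                   kind="gaussian", seed=0):
    """Draw (X, y, c) following the quoted setup.

    d, sigma1, sigma2 and seed are illustrative, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, d))  # each x_i ~ U(-1, 1)
    z = X.sum(axis=1) / d                            # 1^T x / d
    if kind == "gaussian":
        y = rng.normal(loc=z, scale=sigma1)          # y ~ N(1^T x / d, sigma1^2)
    else:
        y = 1.0 / (1.0 + np.exp(-z))                 # assumed: sigmoid response
    c = rng.normal(loc=0.0, scale=sigma2, size=n_samples)  # human error
    return X, y, c


def loss(outsourced, X, y, c, lam):
    """Assumed loss l(S): ridge loss on machine samples + squared human error."""
    Xm, ym = X[~outsourced], y[~outsourced]
    d = X.shape[1]
    # Closed-form ridge solution on the samples not outsourced to humans.
    w = np.linalg.solve(Xm.T @ Xm + lam * np.eye(d), Xm.T @ ym)
    resid = ym - Xm @ w
    return resid @ resid + lam * (w @ w) + np.sum(c[outsourced] ** 2)


def greedy_select(X, y, c, n, lam):
    """Greedy loop: repeatedly add the sample with the largest gain
    -log l(S + k) + log l(S) until n samples are outsourced."""
    outsourced = np.zeros(len(y), dtype=bool)
    current = np.log(loss(outsourced, X, y, c, lam))
    while outsourced.sum() < n:
        best_k, best_gain = None, -np.inf
        for k in np.flatnonzero(~outsourced):
            outsourced[k] = True                     # tentatively outsource k
            gain = current - np.log(loss(outsourced, X, y, c, lam))
            outsourced[k] = False
            if gain > best_gain:
                best_k, best_gain = k, gain
        outsourced[best_k] = True                    # sample goes to humans
        current -= best_gain                         # log l(S) after the update
    return np.flatnonzero(outsourced)


if __name__ == "__main__":
    X, y, c = make_synthetic(n_samples=500)
    S = greedy_select(X, y, c, n=10, lam=5e-3)       # lambda from panel (a)
    print("outsourced sample indices:", S)
```

Each candidate evaluation refits the ridge model from scratch, which keeps the sketch short but costs one small linear solve per remaining sample per iteration; a practical implementation would likely reuse computation across candidates.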