reproducibilityindex.ai

The Human-AI Substitution game: active learning from a strategic labeler

Authors: Tom Yan, Chicheng Zhang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	I EXPERIMENTS To supplement our theoretical minimax analysis in the main section, we examine the performance of three learning algorithms, E-VS bisection, VS-bisection and randomly query (a point), in averagecase settings by randomly generating learning instances. Experiment Setup: We consider five sizes for the hypothesis class ranging from 15 to 40. Given a particular hypothesis class size \|H\|, we generate 50 random learning instances by randomly generating the binary labels of hypotheses on examples x X, where the number of data points \|X\| is varied from 5 to 30. Given a learning instance, we consider setting (the underlying hypothesis) h to be every h H, and thus average the query complexity across random instances as well as across H. This is done to explore the average-case query complexity, where we do not focus on the query complexity of one particular h = h H (as was done in some of the worst-case analyses). We investigate two possible labeling strategies, with varying amounts of abstention p = 0.0, 0.15, 0.3, 0.45, 0.6. The first strategy is that given the underlying hypothesis h H, it abstains on labeling a point x with probability p, and outputs h (x) otherwise (w.p. 1 p). This labeling strategy may be viewed as one that abstains arbitrarily, and may compromise identifiability. This models the labeling strategy of a myopic labeler. The second strategy is a more careful, adaptive labeling strategy that always ensures identifiability. Given the underlying h , when x is queried, it computes the resultant E-VS if x was abstained upon. If abstention leads to non-identifiability, it labels x and returns h (x). Otherwise, it abstains with probability p and provides the label otherwise. This may be viewed as a more shrewd labeling strategy that always ensures identifiability, while using some abstention. Results: We plot results in Figure 3 and Figure 4, with Figure 3 corresponding to the first (random labeling) strategy and Figure 4 corresponding to the identifiable labeling strategy.
Researcher Affiliation	Academia	Tom Yan Machine Learning Department Carnegie Mellon University tyyan@cmu.edu Chicheng Zhang Department of Computer Science University of Arizona chichengz@cs.arizona.edu
Pseudocode	Yes	Algorithm 2 E-VS Bisection Algorithm and Algorithm 3 Bisection Point Search Sub-routine are present on page 4.
Open Source Code	No	No explicit statement or link indicating the release of open-source code for the methodology described in the paper was found.
Open Datasets	No	The experiments use randomly generated learning instances and binary labels, not a publicly available or open dataset.
Dataset Splits	No	The paper describes experiments on randomly generated instances, but does not provide specific train/validation/test splits, cross-validation details, or predefined splits.
Hardware Specification	No	The paper describes the experiment setup (e.g., number of instances, hypothesis class sizes) but does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies	No	The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup	Yes	Experiment Setup: We consider five sizes for the hypothesis class ranging from 15 to 40. Given a particular hypothesis class size \|H\|, we generate 50 random learning instances by randomly generating the binary labels of hypotheses on examples x X, where the number of data points \|X\| is varied from 5 to 30. Given a learning instance, we consider setting (the underlying hypothesis) h to be every h H, and thus average the query complexity across random instances as well as across H. This is done to explore the average-case query complexity, where we do not focus on the query complexity of one particular h = h H (as was done in some of the worst-case analyses). We investigate two possible labeling strategies, with varying amounts of abstention p = 0.0, 0.15, 0.3, 0.45, 0.6.