The Human-AI Substitution game: active learning from a strategic labeler
Authors: Tom Yan, Chicheng Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: To supplement our theoretical minimax analysis in the main paper, we examine the performance of three learning algorithms, E-VS bisection, VS-bisection, and random querying (of a point), in average-case settings by randomly generating learning instances. Experiment Setup: We consider five sizes for the hypothesis class ranging from 15 to 40. Given a particular hypothesis class size \|H\|, we generate 50 random learning instances by randomly generating the binary labels of hypotheses on examples x ∈ X, where the number of data points \|X\| is varied from 5 to 30. Given a learning instance, we consider setting the underlying hypothesis h* to be every h ∈ H, and thus average the query complexity across random instances as well as across H. This is done to explore the average-case query complexity, where we do not focus on the query complexity of one particular h* = h ∈ H (as was done in some of the worst-case analyses). We investigate two possible labeling strategies, with varying amounts of abstention p ∈ {0.0, 0.15, 0.3, 0.45, 0.6}. The first strategy is that, given the underlying hypothesis h* ∈ H, it abstains on labeling a point x with probability p, and outputs h*(x) otherwise (w.p. 1 − p). This labeling strategy may be viewed as one that abstains arbitrarily, and may compromise identifiability; it models the labeling strategy of a myopic labeler. The second strategy is a more careful, adaptive labeling strategy that always ensures identifiability. Given the underlying h*, when x is queried, it computes the resultant E-VS if x were abstained upon. If abstention leads to non-identifiability, it labels x and returns h*(x). Otherwise, it abstains with probability p and provides the label otherwise. This may be viewed as a more shrewd labeling strategy that always ensures identifiability, while using some abstention. Results: We plot results in Figure 3 and Figure 4, with Figure 3 corresponding to the first (random labeling) strategy and Figure 4 corresponding to the identifiable labeling strategy. |
| Researcher Affiliation | Academia | Tom Yan, Machine Learning Department, Carnegie Mellon University, tyyan@cmu.edu; Chicheng Zhang, Department of Computer Science, University of Arizona, chichengz@cs.arizona.edu |
| Pseudocode | Yes | Algorithm 2 (E-VS Bisection Algorithm) and Algorithm 3 (Bisection Point Search Sub-routine) are present on page 4. |
| Open Source Code | No | No explicit statement or link indicating the release of open-source code for the methodology described in the paper was found. |
| Open Datasets | No | The experiments use randomly generated learning instances and binary labels, not a publicly available or open dataset. |
| Dataset Splits | No | The paper describes experiments on randomly generated instances, but does not provide specific train/validation/test splits, cross-validation details, or predefined splits. |
| Hardware Specification | No | The paper describes the experiment setup (e.g., number of instances, hypothesis class sizes) but does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) used in the experiments. |
| Experiment Setup | Yes | Experiment Setup: We consider five sizes for the hypothesis class ranging from 15 to 40. Given a particular hypothesis class size \|H\|, we generate 50 random learning instances by randomly generating the binary labels of hypotheses on examples x ∈ X, where the number of data points \|X\| is varied from 5 to 30. Given a learning instance, we consider setting the underlying hypothesis h* to be every h ∈ H, and thus average the query complexity across random instances as well as across H. This is done to explore the average-case query complexity, where we do not focus on the query complexity of one particular h* = h ∈ H (as was done in some of the worst-case analyses). We investigate two possible labeling strategies, with varying amounts of abstention p ∈ {0.0, 0.15, 0.3, 0.45, 0.6}. (A minimal code sketch of this protocol follows the table.) |
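
Since no code is released (see the Open Source Code row above), here is a minimal Python sketch of the averaging protocol described in the Experiment Setup row. It is an illustrative reconstruction under stated assumptions, not the authors' implementation: `random_instance`, `run_random_query`, and in particular the `identifiable` check (a simplified stand-in for the paper's E-VS computation) are all hypothetical names and logic.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_instance(n_hyp, n_pts):
    """Draw a random learning instance: a binary matrix whose rows are
    distinct hypotheses, i.e. labelings of the n_pts example points."""
    while True:
        H = rng.integers(0, 2, size=(n_hyp, n_pts))
        if len({tuple(row) for row in H}) == n_hyp:
            return H


def identifiable(H, h_star, visible):
    """Simplified identifiability proxy (an assumption, not the paper's
    E-VS computation): h* stays identifiable iff no other hypothesis
    agrees with it on every point the labeler is still willing to label."""
    mask = np.array(visible)
    for j in range(len(H)):
        if j != h_star and np.array_equal(H[j][mask], H[h_star][mask]):
            return False
    return True


def run_random_query(H, h_star, p, shrewd=False):
    """Query points uniformly at random until the version space of
    hypotheses consistent with all revealed labels is a singleton.
    Returns the number of queries issued."""
    n_hyp, n_pts = H.shape
    vs = set(range(n_hyp))            # hypotheses consistent so far
    visible = [True] * n_pts          # points the labeler may still label
    order = list(range(n_pts))
    rng.shuffle(order)
    queries = 0
    for x in order:
        if len(vs) == 1:
            break
        queries += 1
        abstain = rng.random() < p
        if abstain and shrewd:
            # Strategy 2: abstain only if h* stays identifiable afterwards.
            visible[x] = False
            if not identifiable(H, h_star, visible):
                visible[x] = True     # forced to label this point
                abstain = False
        if abstain:
            if not shrewd:
                visible[x] = False    # Strategy 1: arbitrary abstention
            continue
        # Label revealed: prune hypotheses that disagree with h*(x).
        vs = {j for j in vs if H[j][x] == H[h_star][x]}
    return queries


# Average query complexity over all choices of h* in H, mirroring the
# averaging over random instances and over H described in the setup.
H = random_instance(n_hyp=15, n_pts=10)
for p in (0.0, 0.15, 0.3, 0.45, 0.6):
    counts = [run_random_query(H, h, p, shrewd=True) for h in range(len(H))]
    print(f"p={p:.2f}: mean queries = {np.mean(counts):.2f}")
```

The `shrewd` flag toggles between the two labeling strategies: with `shrewd=False`, abstained points are simply never labeled, matching the arbitrary-abstention behavior that can compromise identifiability, while `shrewd=True` forces a label whenever abstention would leave h* unidentifiable under the proxy check.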