Learning Thresholds with Latent Values and Censored Feedback

Authors: Jiahao Zhang, Tao Lin, Weiqiang Zheng, Zhe Feng, Yifeng Teng, Xiaotie Deng

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide some simple experiments to verify our theoretical results. We consider two toy examples. The first example is g(γ, v) = γ if γ < 1/3 and g(γ, v) = v otherwise, with the value distribution uniform on [0, 1]; this corresponds to the monotone reward function and Lipschitz value distribution case. The second example is g(γ, v) = γ with a point distribution placing all mass on v = 1/3; this corresponds to the Lipschitz reward function and general distribution case (Theorem 4.2). In our experiment, we first fix the number of queries to be K^3, where K = 100 + 3i for all integers 1 ≤ i ≤ 33, i.e., K ranges over [100, 200]. Therefore, the algorithm for the upper bound should output an error smaller than the predetermined loss 1/K. The following figure shows the relationship between the loss and the number of queries. The empirical loss curve lies below the predetermined loss curve, which verifies our upper bound results. In this section, we provide experimental results to verify our lower bound result (Theorem 4.3). We consider the example provided in the proof of Theorem 4.3, where g(γ, v) = γ and the value distribution is a hard distribution (see the left part of Fig. 3). For ε ∈ {1/400, 1/500, 1/600}, we run the algorithm in Theorem 4.1 to determine the minimum number n_ε of queries necessary to learn the optimal threshold with ε additive error. Due to randomness, we repeat 10 times for each ε. In each round, we compute ln(n_ε)/ln(1/ε) and find it converging to 3, which verifies our lower bound result. Figure 2: Loss curves under different examples. The orange line: the predetermined loss curve. The blue line: the empirical loss curve. All variables are in logarithmic form. Figure 3: Left: the blue curve is the probability distribution function; the orange part is the frequency of realized samples. Right: the box plot for ε ∈ {1/400, 1/500, 1/600}; the horizontal axis represents ε and the vertical axis the logarithmic ratio ln(n_ε)/ln(1/ε). (Hedged simulation sketches of both experiments are given after this table.)
Researcher Affiliation | Collaboration | Jiahao Zhang (1,2), Tao Lin (3), Weiqiang Zheng (4), Zhe Feng (5), Yifeng Teng (5), Xiaotie Deng (1,2); (1) School of Electronics Engineering and Computer Science, Peking University; (2) CFCS, Peking University; (3) Harvard University; (4) Yale University; (5) Google Research
Pseudocode | No | The paper describes theoretical algorithms and proofs, but it does not include any explicitly labeled pseudocode blocks or algorithms formatted as such.
Open Source Code | No | The paper does not contain any statements or links indicating that source code for the methodology is openly available.
Open Datasets | No | The paper uses 'toy examples' and constructs specific 'value distributions' for its experiments to verify theoretical results, but it does not use or provide access to any established public datasets or external data repositories.
Dataset Splits | No | The paper describes experimental setups to verify theoretical bounds, but it does not specify data splits such as training, validation, or test sets in the typical machine learning sense.
Hardware Specification | No | The paper describes experiments to verify theoretical results but does not provide any details regarding the specific hardware (e.g., CPU, GPU models, memory) used for these experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers used in the experiments.
Experiment Setup | Yes | In our experiment, we first fix the number of queries to be K^3, where K = 100 + 3i for all integers 1 ≤ i ≤ 33, i.e., K ranges over [100, 200]. For ε ∈ {1/400, 1/500, 1/600}, we run the algorithm in Theorem 4.1 to determine the minimum number n_ε of queries necessary to learn the optimal threshold with ε additive error. Due to randomness, we repeat 10 times for each ε.
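
Sketch 1 (upper-bound experiment, first toy example). The following Python snippet is a minimal sketch of how the loss-vs-queries check could be reproduced; it is not the paper's algorithm from Theorem 4.1. Everything about the feedback model here is an assumption made for illustration: the learner queries a threshold γ, the latent value v is drawn from Uniform[0, 1], and the expected reward is taken to be E[g(γ, v) · 1{v ≥ γ}]. The K^3 query budget is split naively over a grid of K candidate thresholds and the empirical maximizer is returned.

import numpy as np

rng = np.random.default_rng(0)

def censored_rewards(gamma, v):
    # First toy example: g(gamma, v) = gamma if gamma < 1/3, else v.
    # ASSUMPTION: the reward is earned/observed only when v >= gamma.
    base = np.full_like(v, gamma) if gamma < 1.0 / 3 else v
    return np.where(v >= gamma, base, 0.0)

def true_objective(gamma):
    # Closed form of E[g(gamma, v) * 1{v >= gamma}] for v ~ Uniform[0, 1]
    # under the assumed objective.
    if gamma < 1.0 / 3:
        return gamma * (1.0 - gamma)        # gamma * P(v >= gamma)
    return 0.5 * (1.0 - gamma ** 2)         # E[v * 1{v >= gamma}]

def empirical_loss(K):
    # Spend the K^3 query budget uniformly over K candidate thresholds,
    # pick the empirical maximizer, and report its gap to the true optimum.
    grid = np.linspace(0.0, 1.0, K)
    per_point = K ** 2                      # K^3 queries in total
    estimates = [censored_rewards(gam, rng.uniform(size=per_point)).mean()
                 for gam in grid]
    gamma_hat = grid[int(np.argmax(estimates))]
    gamma_star = max(np.linspace(0.0, 1.0, 5000), key=true_objective)
    return true_objective(gamma_star) - true_objective(gamma_hat)

for K in (100, 130, 160, 199):
    print(f"K = {K:3d}: empirical loss = {empirical_loss(K):.5f}, target 1/K = {1.0 / K:.5f}")

Under these assumptions the printed empirical losses should stay below the 1/K target, mirroring the empirical curve lying under the predetermined loss curve in Figure 2.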
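Sketch 2 (lower-bound ratio). The lower-bound experiment tracks the ratio ln(n_ε)/ln(1/ε): if the minimal query count grows as n_ε = Θ(ε^(-3)), this ratio tends to 3. The tiny snippet below only illustrates that arithmetic; the constant C is arbitrary and not a value from the paper.

import math

# If n_eps grows like C * eps**(-3), then ln(n_eps) / ln(1/eps) -> 3,
# because the constant contributes ln(C) / ln(1/eps) -> 0.
C = 2.0  # arbitrary constant, for illustration only
for eps in (1 / 400, 1 / 500, 1 / 600):
    n_eps = C * eps ** -3
    ratio = math.log(n_eps) / math.log(1 / eps)
    print(f"eps = 1/{round(1 / eps)}: ln(n_eps)/ln(1/eps) = {ratio:.3f}")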