Learning to Interact With Learning Agents

Authors: Adish Singla, Hamed Hassani, Andreas Krause

AAAI 2018

Reproducibility Assessment (Variable / Result / LLM Response)
Research Type: Experimental (Simulation Results)
Quote: "Next, we evaluate the performance of the forecaster LIL via simulations and compare against the following benchmarks. EXP3: the EXP3 algorithm (Auer et al. 2002) as the forecaster for the specification in Protocol 1. ALL-LEARN: the EXP3 algorithm (Auer et al. 2002) as the forecaster for a relaxed, easier setting in which all experts j ∈ [N] observe the feedback at every time t. Adversarial losses: as our first simulation setting, we consider the same setup used in the proof of Theorem 1 and use the loss sequence shown in Figure 1(a). For this loss sequence, the losses of the actions A = {a1, a2, b} averaged over t ∈ [T] are (0.4583, 0.5, 0.7487); hence the best expert is EXP1 and the best action is a1 (cf. Equation 3). Figure 2(a) shows the regret REG(T, ALGO) for LIL, EXP3, and ALL-LEARN, and illustrates two points. First, EXP3 suffers linear regret, as dictated by the hardness result in Theorem 1. Second, LIL has sublinear regret, as proved in Theorem 2."
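For context on the EXP3 benchmark quoted above, here is a minimal sketch of the EXP3 algorithm of Auer et al. (2002) in its loss formulation. The Bernoulli loss means in the demo are illustrative values chosen near the averaged losses reported in the quote; they are an assumption for the demo, not the paper's actual adversarial loss sequence, and the fixed exploration rate `gamma = 0.1` is likewise an untuned placeholder.

```python
import math
import random

def exp3(n_arms, horizon, loss_fn, gamma=0.1):
    """EXP3 for adversarial bandits (Auer et al. 2002), losses in [0, 1].

    loss_fn(t, arm) returns the loss of `arm` at round t; only the
    pulled arm's loss is observed, matching bandit feedback.
    """
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(horizon):
        total_w = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        # Importance-weighted loss estimate (unbiased for the pulled arm).
        est = loss / probs[arm]
        weights[arm] *= math.exp(-gamma * est / n_arms)
    return total_loss

if __name__ == "__main__":
    random.seed(0)
    # Illustrative Bernoulli means, loosely mirroring the averaged losses
    # (0.4583, 0.5, 0.7487) reported above; an assumption for this demo.
    means = [0.46, 0.5, 0.75]
    total = exp3(3, 2000, lambda t, a: 1.0 if random.random() < means[a] else 0.0)
    print(f"cumulative loss over 2000 rounds: {total:.0f}")
```

With stochastic i.i.d. losses like these, EXP3 concentrates on the lowest-mean arm; the paper's point is that under Protocol 1's restricted feedback, this plain EXP3 forecaster instead suffers linear regret on the adversarial sequence of Theorem 1.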
Researcher Affiliation: Academia
Quote: "Adish Singla, MPI-SWS, Saarbrücken, Germany (adishs@mpi-sws.org); Hamed Hassani, University of Pennsylvania, Philadelphia, USA (hassani@seas.upenn.edu); Andreas Krause, ETH Zurich, Zurich, Switzerland (krausea@ethz.ch)"
Pseudocode: Yes
Quote: "Algorithm 2: Forecaster LIL"
Open Source Code: No
The paper provides no statements or links indicating the availability of open-source code for the described methodology.
Open Datasets: No
The paper describes generating loss sequences for its simulations (e.g., the "loss sequence shown in Figure 1(a)", and losses of actions A = {a1, a2, b} "sampled i.i.d. from Bernoulli distributions"), but it neither refers to nor provides access information for any publicly available dataset.
Dataset Splits: No
The paper gives no details about train/validation/test splits, sample counts, or cross-validation setups. It describes simulated loss sequences but no data partitioning.
Hardware Specification: No
The paper provides no hardware details (GPU or CPU models, memory, or cloud resources) for its simulations or experiments.
Software Dependencies: No
The paper discusses algorithms and frameworks (e.g., EXP3, HEDGE, Online Mirror Descent) but names no specific software packages or version numbers needed to replicate the experiments.
Experiment Setup: Yes
Quote: "Set parameters $\eta = T^{-\frac{1-\beta}{2-\beta}} (\log N)^{\frac{1}{2}\mathbb{1}\{\beta=0\}}$. Then, for sufficiently large T, the worst-case expected cumulative regret of the forecaster LIL is $\mathrm{REG}(T, \mathrm{LIL}) \le O\big(T^{\frac{1}{2-\beta}}\, N^{\frac{1}{2-\beta}}\, (\log N)^{\frac{1}{2}\mathbb{1}\{\beta=0\}}\big)$."
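The Theorem 2 statement quoted above arrives garbled by text extraction; under the reading that both T and N carry the exponent 1/(2−β) (a reconstruction from the surviving fragments, not a verbatim restatement), the bound specializes at β = 0 to the familiar multi-armed-bandit rate and degrades toward linear in T as β → 1:

```latex
\mathrm{REG}(T,\mathrm{LIL})
  \;\le\; O\!\Big(T^{\frac{1}{2-\beta}}\, N^{\frac{1}{2-\beta}}\,
                 (\log N)^{\frac{1}{2}\mathbb{1}\{\beta=0\}}\Big),
\qquad
\beta = 0 \;\Rightarrow\; O\!\big(\sqrt{T N \log N}\big),
\qquad
\lim_{\beta \to 1} \tfrac{1}{2-\beta} = 1 .
```

This consistency with the standard $\sqrt{TN\log N}$ bandit rate at β = 0 is what makes the 1/(2−β) reading plausible, but the exact exponents should be checked against the paper's PDF.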