Learning to Interact With Learning Agents
Authors: Adish Singla, Hamed Hassani, Andreas Krause
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation Results: "Next, we evaluate the performance of the forecaster LIL via simulations, and compare against the following benchmarks: EXP3, using the EXP3 algorithm (Auer et al. 2002) as the forecaster for the specification in Protocol 1; ALL-LEARN, using the EXP3 algorithm (Auer et al. 2002) as the forecaster in a relaxed/easier setting in which all experts j ∈ [N] observe the feedback at every time t. Adversarial losses: as our first simulation setting, we consider the same setup used in the proof of Theorem 1 and use the loss sequence shown in Figure 1(a). For this loss sequence, the losses of the actions A = {a1, a2, b}, averaged over t ∈ [T], are (0.4583, 0.5, 0.7487); hence the best expert is EXP1 and the best action is a1 (cf. Equation 3). Figure 2(a) shows the regret REG(T, ALGO) for LIL, EXP3, and ALL-LEARN, and illustrates the following points. First, EXP3 suffers linear regret, as dictated by the hardness result in Theorem 1. Second, LIL achieves sub-linear regret, as proved in Theorem 2." |
| Researcher Affiliation | Academia | Adish Singla, MPI-SWS, Saarbrücken, Germany (adishs@mpi-sws.org); Hamed Hassani, University of Pennsylvania, Philadelphia, USA (hassani@seas.upenn.edu); Andreas Krause, ETH Zurich, Zurich, Switzerland (krausea@ethz.ch) |
| Pseudocode | Yes | Algorithm 2: Forecaster LIL |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper discusses generating loss sequences for simulations (e.g., 'loss sequence shown in Figure 1(a)', 'losses of actions A = {a1, a2, b} are sampled i.i.d. from Bernoulli distributions') but does not refer to or provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific details about train/validation/test dataset splits, sample counts, or cross-validation setups. It describes simulated loss sequences but no data partitioning. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud resources used for running its simulations or experiments. |
| Software Dependencies | No | The paper discusses algorithms and frameworks (e.g., EXP3, HEDGE, Online Mirror Descent) but does not provide specific software names with version numbers or dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | Set parameter η = T^(−(1+β)/(2+β)) · (log N)^((1/2)·1{β=0}). Then, for sufficiently large T, the worst-case expected cumulative regret of the forecaster LIL is REG(T, LIL) ≤ O(T^((1+β)/(2+β)) · N^(1/(2+β)) · (log N)^((1/2)·1{β=0})). |
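The EXP3 baseline quoted in the table can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the horizon `T`, the Bernoulli loss means, the learning rate, and the random seeds are all assumptions chosen only to show the mechanics of the bandit forecaster and the empirical regret computation.

```python
import numpy as np

def exp3(losses, eta):
    """Run EXP3 on a T x N loss matrix (entries in [0, 1]) under bandit
    feedback; return the total loss incurred by the algorithm."""
    T, N = losses.shape
    log_w = np.zeros(N)                  # log-weights, for numerical stability
    rng = np.random.default_rng(0)       # assumed seed, for reproducibility
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                     # sampling distribution over actions
        i = rng.choice(N, p=p)           # play one action, observe only its loss
        total += losses[t, i]
        # Importance-weighted loss estimate: only the chosen arm is updated.
        log_w[i] -= eta * losses[t, i] / p[i]
    return total

# Illustrative stochastic setting: N = 3 actions with i.i.d. Bernoulli losses
# (the means below are assumptions, not the paper's sequence from Figure 1(a)).
T, N = 5000, 3
rng = np.random.default_rng(1)
means = np.array([0.45, 0.5, 0.75])
losses = (rng.random((T, N)) < means).astype(float)

eta = np.sqrt(np.log(N) / (T * N))       # standard EXP3 learning-rate choice
alg_loss = exp3(losses, eta)
best_loss = losses.sum(axis=0).min()     # best fixed action in hindsight
regret = alg_loss - best_loss            # empirical regret REG(T, EXP3)
```

Plotting `regret` as a function of `T` (as in the paper's Figure 2(a)) is what distinguishes the sub-linear and linear regimes the response describes; on i.i.d. losses like these, EXP3's regret grows on the order of sqrt(T·N·log N).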