Stochastic Shortest Path with Adversarially Changing Costs
Authors: Aviv Rosenberg, Yishay Mansour
IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we present the adversarial SSP model that also accounts for adversarial changes in the costs over time, while the underlying transition function remains unchanged. Formally, an agent interacts with an SSP environment for K episodes, the cost function changes arbitrarily between episodes, and the transitions are unknown to the agent. We develop the first algorithms for adversarial SSPs and prove high probability regret bounds of squareroot K assuming all costs are strictly positive, and sub-linear regret in the general case. First, we formalize the adversarial SSP model and define the notion of learning and regret. Second, we establish an efficient implementation of OMD in the SSP model with known transitions and study the conditions under which it guarantees near-optimal K expected regret, showing that some modifications are necessary. |
| Researcher Affiliation | Collaboration | Aviv Rosenberg1 and Yishay Mansour1,2 1Tel Aviv University, Israel 2Google Research, Tel Aviv {avivros007,mansour.yishay}@gmail.com |
| Pseudocode | Yes | Pseudocode in Appendix C. We defer to the pseudocode in Appendix E. We defer to the full pseudocode in Appendix H |
| Open Source Code | No | The paper is theoretical and focuses on algorithm design and proofs of regret bounds. There is no statement or link indicating that the source code for the described methods is publicly available or will be released. |
| Open Datasets | No | This paper is theoretical, focusing on algorithm design and mathematical proofs, and therefore does not use or reference any datasets. |
| Dataset Splits | No | As a theoretical paper, it does not involve empirical experiments with data, and therefore does not discuss training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments or their setup, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs, without any mention of specific software or library dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not present empirical experiments, therefore it does not include details about an experimental setup, hyperparameters, or system-level training settings. |