Stochastic Shortest Path with Adversarially Changing Costs

Authors: Aviv Rosenberg, Yishay Mansour

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper we present the adversarial SSP model that also accounts for adversarial changes in the costs over time, while the underlying transition function remains unchanged. Formally, an agent interacts with an SSP environment for K episodes, the cost function changes arbitrarily between episodes, and the transitions are unknown to the agent. We develop the first algorithms for adversarial SSPs and prove high probability regret bounds of squareroot K assuming all costs are strictly positive, and sub-linear regret in the general case. First, we formalize the adversarial SSP model and define the notion of learning and regret. Second, we establish an efficient implementation of OMD in the SSP model with known transitions and study the conditions under which it guarantees near-optimal K expected regret, showing that some modifications are necessary.
Researcher Affiliation Collaboration Aviv Rosenberg1 and Yishay Mansour1,2 1Tel Aviv University, Israel 2Google Research, Tel Aviv {avivros007,mansour.yishay}@gmail.com
Pseudocode Yes Pseudocode in Appendix C. We defer to the pseudocode in Appendix E. We defer to the full pseudocode in Appendix H
Open Source Code No The paper is theoretical and focuses on algorithm design and proofs of regret bounds. There is no statement or link indicating that the source code for the described methods is publicly available or will be released.
Open Datasets No This paper is theoretical, focusing on algorithm design and mathematical proofs, and therefore does not use or reference any datasets.
Dataset Splits No As a theoretical paper, it does not involve empirical experiments with data, and therefore does not discuss training, validation, or test dataset splits.
Hardware Specification No The paper is theoretical and does not describe any computational experiments or their setup, thus no hardware specifications are mentioned.
Software Dependencies No The paper is theoretical and focuses on algorithm design and proofs, without any mention of specific software or library dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not present empirical experiments, therefore it does not include details about an experimental setup, hyperparameters, or system-level training settings.