Online Convex Optimization in Adversarial Markov Decision Processes

Authors: Aviv Rosenberg, Yishay Mansour

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show an O(L|X|√(|A|T)) regret bound, where T is the number of episodes, X is the state space, A is the action space, and L is the length of each episode. Our online algorithm is implemented using an entropic regularization methodology, which allows us to extend the original adversarial MDP model to handle convex performance criteria (different ways to aggregate the losses of a single episode), as well as to improve previous regret bounds.
Researcher Affiliation | Collaboration | Tel Aviv University, Israel; Google Research, Tel Aviv, Israel. Correspondence to: Aviv Rosenberg <avivros007@gmail.com>, Yishay Mansour <mansour.yishay@gmail.com>.
Pseudocode | Yes | Algorithm 1 (Learner-Environment Interaction), Algorithm 2 (UC-O-REPS), Algorithm 3 (Comp-Policy procedure).
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not describe empirical experiments with datasets, so no information on public datasets for training is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with datasets, so no information on dataset splits for validation is provided.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, so no hardware specifications for running experiments are provided.
Software Dependencies | No | The paper is theoretical and discusses its algorithms conceptually, not in terms of specific software implementations with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments, so no experimental setup details such as hyperparameters or training configurations are provided.
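
For context on the regret bound and the entropic regularization mentioned in the Research Type row, the LaTeX sketch below writes out the stated bound and a generic relative-entropy-regularized online mirror descent update over occupancy measures, the template that O-REPS-style algorithms such as UC-O-REPS build on. This is a minimal sketch under that assumption: the symbols R_T, q_t, ell_t, eta, and Delta(M) are illustrative notation introduced here, and the update is the general technique rather than a verbatim reproduction of the paper's Algorithm 2.

\documentclass{article}
\usepackage{amsmath,amssymb}
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{document}

% Regret bound quoted in the Research Type row: T episodes, state space X,
% action space A, episode length L; \ell_t(\pi) denotes the expected loss of
% policy \pi in episode t (illustrative notation).
\[
  R_T \;=\; \sum_{t=1}^{T} \ell_t(\pi_t) \;-\; \min_{\pi} \sum_{t=1}^{T} \ell_t(\pi)
  \;=\; O\!\left( L\,|X|\sqrt{|A|\,T} \right).
\]

% Generic entropic-regularization update (relative-entropy-regularized online
% mirror descent) over occupancy measures q: \eta is a step size, \ell_t the
% loss vector of episode t, and \Delta(M) the set of valid occupancy measures
% of the MDP M (assumed notation, not the paper's).
\[
  q_{t+1} \;=\; \argmin_{q \,\in\, \Delta(M)}
  \;\eta \,\langle q, \ell_t \rangle \;+\; D_{\mathrm{KL}}\!\left( q \,\Vert\, q_t \right),
\]
% with the unnormalized relative entropy over state-action pairs:
\[
  D_{\mathrm{KL}}\!\left( q \,\Vert\, q' \right)
  \;=\; \sum_{x,a} \Bigl( q(x,a)\,\log\frac{q(x,a)}{q'(x,a)} \;-\; q(x,a) \;+\; q'(x,a) \Bigr).
\]

\end{document}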