Risk Estimation in a Markov Cost Process: Lower and Upper Bounds

Authors: Gugan Thoppe, Prashanth L A, Sanjay P. Bhat

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we are concerned with the problem of estimating a risk measure from a sample path of a discounted Markov Cost Process (MCP). In the context of RL, this is equivalent to the policy evaluation problem, albeit for a risk measure. For this problem, we derive minimax sample complexity lower bounds as well as upper bounds.
Researcher Affiliation | Collaboration | (1) Dept. of Computer Science and Automation, Indian Institute of Science (IISc), Bengaluru, India; Robert Bosch Centre for Data Science and Artificial Intelligence, IIT Madras, Chennai, India. (2) Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India. (3) TCS Research, Hyderabad, India.
Pseudocode | No | The paper describes algorithms and estimation schemes in prose but does not include structured pseudocode blocks or sections explicitly labeled 'Algorithm'.
Open Source Code | No | The paper does not contain any statement regarding the release of source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper is theoretical and focuses on deriving lower and upper bounds for risk estimation; it does not use or specify any public datasets for training or experimentation.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits, thus no information on training, validation, or test splits is provided.
Hardware Specification | No | The paper is theoretical and does not describe computational experiments, therefore no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not describe computational experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical derivations and proofs of bounds, thus it does not include details on experimental setup, hyperparameters, or training configurations.