Reinforcement Learning with Lookahead Information

Authors: Nadav Merlis

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information, linearly increasing the amount of collected reward compared to agents that cannot handle lookahead information.
Researcher Affiliation | Academia | Nadav Merlis, Fair Play Joint Team, CREST, ENSAE Paris, nadav.merlis@ensae.fr
Pseudocode | Yes | Algorithm 1: Monotonic Value Propagation with Reward Lookahead (MVP-RL) and Algorithm 2: Monotonic Value Propagation with Transition Lookahead (MVP-TL); an illustrative sketch of the lookahead-aware planning step appears after the table.
Open Source Code | No | The paper does not include experiments requiring code.
Open Datasets | No | The paper does not include experiments. It analyzes theoretical properties of algorithms under an 'episodic tabular Markov Decision Process' model rather than using empirical datasets.
Dataset Splits | No | The paper does not include experiments, and therefore no dataset splits for training, validation, or testing are specified.
Hardware Specification | No | The paper does not include experiments, and therefore no hardware specifications are provided.
Software Dependencies | No | The paper does not include experiments, and therefore no software dependencies with version numbers are listed.
Experiment Setup | No | The paper does not include experiments, and therefore no experimental setup details such as hyperparameters or training configurations are provided.
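
Since the paper ships no code, the snippet below is only a minimal, hypothetical sketch of the planning step its abstract describes: backward induction over an empirical distribution of reward observations, with the maximum over actions taken inside the average because the agent sees the realized rewards before acting. It is not the paper's MVP-RL algorithm, which additionally handles exploration and transition lookahead; all names, array shapes, and the NumPy interface are assumptions made for illustration.

```python
import numpy as np

def plan_with_reward_lookahead(P_hat, reward_samples):
    """Backward induction when the realized rewards of the current step are
    observed before the action is chosen (reward lookahead).

    P_hat          : estimated transition kernel, shape (H, S, A, S).
    reward_samples : empirical reward observations, shape (H, n_samples, S, A);
                     the empirical distribution stands in for the unknown
                     reward distribution (hypothetical interface).
    Returns the value function at step 0, shape (S,).
    """
    H, n_states = P_hat.shape[0], P_hat.shape[1]
    V = np.zeros((H + 1, n_states))
    for h in reversed(range(H)):
        # Expected continuation value for every (s, a) under P_hat.
        cont = P_hat[h] @ V[h + 1]                      # (S, A)
        # With lookahead the reward realization is known when acting, so the
        # max over actions goes inside the expectation over rewards:
        #   V_h(s) = E_r[ max_a ( r(s, a) + cont(s, a) ) ].
        q_samples = reward_samples[h] + cont            # (n_samples, S, A)
        V[h] = q_samples.max(axis=2).mean(axis=0)       # (S,)
        # A vanilla (no-lookahead) planner would instead compute
        #   (reward_samples[h].mean(axis=0) + cont).max(axis=1),
        # relying only on the estimated expectation of the reward.
    return V[0]
```

Because the expectation of a maximum is at least the maximum of expectations, the lookahead value computed here is never smaller than the vanilla one, which is the source of the extra collected reward the abstract refers to.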