Reinforcement Learning with Lookahead Information
Authors: Nadav Merlis
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information, thereby linearly increasing the amount of collected reward compared to agents that cannot handle lookahead information. (See the sketch below the table.) |
| Researcher Affiliation | Academia | Nadav Merlis Fair Play Joint Team, CREST, ENSAE Paris nadav.merlis@ensae.fr |
| Pseudocode | Yes | Algorithm 1 Monotonic Value Propagation with Reward Lookahead (MVP-RL) and Algorithm 2 Monotonic Value Propagation with Transition Lookahead (MVP-TL). |
| Open Source Code | No | The paper does not include experiments requiring code. |
| Open Datasets | No | The paper does not include experiments. It analyzes theoretical properties of algorithms under an 'episodic tabular Markov Decision Process model' rather than using empirical datasets. |
| Dataset Splits | No | The paper does not include experiments, and therefore no dataset splits for training, validation, or testing are specified. |
| Hardware Specification | No | The paper does not include experiments, and therefore no hardware specifications are provided. |
| Software Dependencies | No | The paper does not include experiments, and therefore no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper does not include experiments, and therefore no specific experimental setup details like hyperparameters or training configurations are provided. |
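
The planning idea quoted in the Research Type row, using the empirical distribution of reward observations rather than only their estimated mean, can be illustrated with a minimal sketch. This is not the paper's MVP-RL algorithm (which additionally uses exploration bonuses and handles transition lookahead); the function names `vanilla_backup` and `lookahead_backup`, the array shapes, and the random data below are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical sketch: contrast a vanilla Bellman backup that uses only the
# estimated mean reward with a lookahead-aware backup that averages over the
# empirical distribution of observed reward vectors.

def vanilla_backup(mean_reward, P_hat, V_next):
    """Max over actions of (estimated mean reward + expected continuation value)."""
    # mean_reward: (A,), P_hat: (A, S), V_next: (S,)
    q = mean_reward + P_hat @ V_next
    return q.max()

def lookahead_backup(reward_samples, P_hat, V_next):
    """Average over empirical reward vectors of the per-vector max over actions.

    Because a lookahead agent observes the realized rewards before acting, the
    max moves inside the expectation, which is approximated here by averaging
    over the empirical distribution of reward observations.
    """
    # reward_samples: (N, A), P_hat: (A, S), V_next: (S,)
    cont = P_hat @ V_next                          # (A,) expected continuation value
    return np.mean(np.max(reward_samples + cont, axis=1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, S, N = 3, 4, 200
    reward_samples = rng.random((N, A))            # observed reward vectors (illustrative)
    P_hat = rng.dirichlet(np.ones(S), size=A)      # empirical transition estimates
    V_next = rng.random(S)
    print("vanilla  :", vanilla_backup(reward_samples.mean(axis=0), P_hat, V_next))
    print("lookahead:", lookahead_backup(reward_samples, P_hat, V_next))
```

By Jensen's inequality, the lookahead backup is never smaller than the vanilla one, which reflects the extra reward that lookahead-aware agents can collect compared to agents that cannot handle lookahead information.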