Reinforcement Learning with Lookahead Information
Authors: Nadav Merlis
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information, thereby linearly increasing the amount of collected reward compared to agents that cannot handle lookahead information. (See the sketch below the table.) |
| Researcher Affiliation | Academia | Nadav Merlis Fair Play Joint Team, CREST, ENSAE Paris nadav.merlis@ensae.fr |
| Pseudocode | Yes | Algorithm 1 Monotonic Value Propagation with Reward Lookahead (MVP-RL) and Algorithm 2 Monotonic Value Propagation with Transition Lookahead (MVP-TL). |
| Open Source Code | No | The paper does not include experiments requiring code. |
| Open Datasets | No | The paper does not include experiments. It analyzes theoretical properties of algorithms under an 'episodic tabular Markov Decision Process model' rather than using empirical datasets. |
| Dataset Splits | No | The paper does not include experiments, and therefore no dataset splits for training, validation, or testing are specified. |
| Hardware Specification | No | The paper does not include experiments, and therefore no hardware specifications are provided. |
| Software Dependencies | No | The paper does not include experiments, and therefore no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper does not include experiments, and therefore no specific experimental setup details like hyperparameters or training configurations are provided. |
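
The planning idea quoted in the Research Type row, using the empirical distribution of reward observations rather than only their estimated mean, can be illustrated with a minimal sketch. This is not the paper's MVP-RL algorithm (which additionally uses exploration bonuses and handles transition lookahead); the function names `vanilla_backup` and `lookahead_backup`, the array shapes, and the random data below are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical sketch: contrast a vanilla Bellman backup that uses only the
# estimated mean reward with a lookahead-aware backup that averages over the
# empirical distribution of observed reward vectors.

def vanilla_backup(mean_reward, P_hat, V_next):
    """Max over actions of (estimated mean reward + expected continuation value)."""
    # mean_reward: (A,), P_hat: (A, S), V_next: (S,)
    q = mean_reward + P_hat @ V_next
    return q.max()

def lookahead_backup(reward_samples, P_hat, V_next):
    """Average over empirical reward vectors of the per-vector max over actions.

    Because a lookahead agent observes the realized rewards before acting, the
    max moves inside the expectation, which is approximated here by averaging
    over the empirical distribution of reward observations.
    """
    # reward_samples: (N, A), P_hat: (A, S), V_next: (S,)
    cont = P_hat @ V_next                          # (A,) expected continuation value
    return np.mean(np.max(reward_samples + cont, axis=1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, S, N = 3, 4, 200
    reward_samples = rng.random((N, A))            # observed reward vectors (illustrative)
    P_hat = rng.dirichlet(np.ones(S), size=A)      # empirical transition estimates
    V_next = rng.random(S)
    print("vanilla  :", vanilla_backup(reward_samples.mean(axis=0), P_hat, V_next))
    print("lookahead:", lookahead_backup(reward_samples, P_hat, V_next))
```

By Jensen's inequality, the lookahead backup is never smaller than the vanilla one, which reflects the extra reward that lookahead-aware agents can collect compared to agents that cannot handle lookahead information.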