Reinforcement Learning with Trajectory Feedback

Authors: Yonathan Efroni, Nadav Merlis, Shie Mannor (AAAI 2021, pp. 7288-7295)


Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The paper extends reinforcement learning algorithms to a new feedback setting, analyzes their regret, and establishes performance guarantees and computational tractability. It includes theorems, lemmas, and pseudocode for its algorithms (Algorithms 1, 2, and 3), but no empirical evaluation on datasets, performance metrics, or results from actual runs.
Researcher Affiliation | Collaboration | Yonathan Efroni (1,2), Nadav Merlis (1), and Shie Mannor (1,3). 1: Technion, Israel Institute of Technology; 2: Microsoft Research, New York; 3: Nvidia Research, Israel.
Pseudocode | Yes | Algorithm 1: OFUL for RL with Trajectory Feedback and Known Model; Algorithm 2: TS for RL with Trajectory Feedback and Known Model; Algorithm 3: UCBVI-TS for RL with Trajectory Feedback. A hedged sketch of the underlying reward-estimation idea follows this table.
Open Source Code | No | The paper neither states that its code is open-sourced nor links to a code repository.
Open Datasets | No | The paper is theoretical and does not run experiments on any dataset, so no public dataset information is provided.
Dataset Splits | No | The paper is theoretical and includes no empirical validation on datasets, so no dataset split information is provided.
Hardware Specification | No | The paper is theoretical and does not run experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis; it does not list software dependencies or version numbers needed to reproduce empirical results.
Experiment Setup | No | The paper presents theoretical algorithms and their analysis (e.g., regret bounds); it does not describe an empirical experimental setup such as hyperparameters or training configurations.
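
The table above notes pseudocode for OFUL- and TS-style algorithms under trajectory feedback, the setting in which only the cumulative reward of each episode is observed rather than per-step rewards. As a rough illustration of the underlying idea, and not the authors' implementation, the sketch below estimates per-state-action rewards by regularized least squares from trajectory visitation counts and observed returns; function and variable names are assumptions, and the confidence-set and optimism machinery of the actual algorithms is omitted.

```python
import numpy as np

def estimate_rewards(trajectories, returns, n_states, n_actions, reg=1.0):
    """Regularized least-squares estimate of per-(s, a) rewards from
    trajectory-level feedback (illustrative sketch, not the paper's code).

    trajectories: list of episodes, each a list of (state, action) pairs
    returns:      list of observed cumulative rewards, one per episode
    Returns the estimated reward vector r_hat (flattened over (s, a)) and the
    Gram matrix A, which OFUL-style algorithms would also use to build
    confidence ellipsoids around r_hat.
    """
    d = n_states * n_actions
    A = reg * np.eye(d)          # regularized Gram matrix
    b = np.zeros(d)
    for traj, ret in zip(trajectories, returns):
        # x counts how often each (s, a) pair was visited in this episode
        x = np.zeros(d)
        for s, a in traj:
            x[s * n_actions + a] += 1.0
        A += np.outer(x, x)
        b += ret * x
    r_hat = np.linalg.solve(A, b)
    return r_hat, A


# Tiny usage example with synthetic trajectories and noisy returns.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions, horizon, episodes = 3, 2, 5, 200
    true_r = rng.uniform(0.0, 1.0, size=n_states * n_actions)
    trajs, rets = [], []
    for _ in range(episodes):
        traj = [(rng.integers(n_states), rng.integers(n_actions))
                for _ in range(horizon)]
        ret = sum(true_r[s * n_actions + a] for s, a in traj)
        trajs.append(traj)
        rets.append(ret + rng.normal(scale=0.1))  # only the episode return is observed
    r_hat, _ = estimate_rewards(trajs, rets, n_states, n_actions)
    print(np.round(np.abs(r_hat - true_r), 3))    # estimation error per (s, a)
```

The point of the sketch is that trajectory feedback turns reward estimation into a linear regression problem over visitation-count vectors, which is why linear-bandit tools such as OFUL and Thompson sampling become applicable in this setting.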