State-free Reinforcement Learning

Authors: Mingyu Chen, Aldo Pacchiano, Xuezhou Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we study the state-free RL problem, where the algorithm has no information about the states before interacting with the environment. Specifically, denoting the reachable state set by S^Π := {s : max_{π∈Π} q^{P,π}(s) > 0}, we design an algorithm that requires no information on the state space S while having a regret that is completely independent of S and depends only on S^Π. We view this as a concrete first step towards parameter-free RL, with the goal of designing RL algorithms that require no hyper-parameter tuning.
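For readers parsing the abstract's notation, the reachable-state definition can be written out as below. This is a sketch under assumed conventions: the occupancy-measure reading of q^{P,π}(s) and the episodic horizon H follow common usage in the RL theory literature and are not quoted from the paper itself.

```latex
% Assumed episodic-MDP conventions (not quoted from the paper):
% q^{P,\pi}(s) is the probability that policy \pi ever visits state s
% under transition kernel P within an episode of horizon H.
\[
  q^{P,\pi}(s) \;=\; \Pr^{P,\pi}\!\left[\, \exists\, h \in [H] : s_h = s \,\right],
  \qquad
  S^{\Pi} \;:=\; \bigl\{\, s \in S \;\bigm|\; \max_{\pi \in \Pi} q^{P,\pi}(s) > 0 \,\bigr\}.
\]
% A regret bound scaling with |S^{\Pi}| rather than |S| is meaningful
% whenever only a small fraction of S is reachable by the policy class.
```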
Researcher Affiliation | Academia | Mingyu Chen (Boston University, mingyuc@bu.edu); Aldo Pacchiano (Boston University; Broad Institute of MIT and Harvard, pacchian@bu.edu); Xuezhou Zhang (Boston University, xuezhouz@bu.edu)
Pseudocode | Yes | Algorithm 1: Black-box Reduction for State-free RL (SF-RL)
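Since the black-box reduction is the paper's central construction, a minimal sketch of what such a wrapper looks like may help convey its shape. This is an illustrative paraphrase only, not the paper's Algorithm 1: the environment interface (reset/step), the make_base_learner factory, and the doubling restart schedule are all assumptions introduced for this sketch.

```python
# Illustrative sketch only: paraphrases the *idea* of a black-box reduction
# for state-free RL (wrap a base learner that needs the state set, and
# restart it as new states are discovered). It is NOT the paper's
# Algorithm 1; env, make_base_learner, act, and update are hypothetical.

def state_free_rl(env, make_base_learner, num_episodes):
    """Run episodic RL without knowing the state space up front.

    env               -- environment exposing reset() -> state and
                         step(action) -> (next_state, reward, done)
    make_base_learner -- factory: frozenset of known states -> learner
                         exposing act(state) and update(trajectory)
    """
    known_states = set()        # states observed so far (a subset of S^Pi)
    size_at_last_restart = 1    # doubling trigger for restarts
    learner = make_base_learner(frozenset(known_states))

    for _ in range(num_episodes):
        state = env.reset()
        known_states.add(state)
        trajectory, done = [], False
        while not done:
            action = learner.act(state)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward, next_state))
            known_states.add(next_state)
            state = next_state
        learner.update(trajectory)
        # Restart the black-box learner once the known-state set has
        # doubled; this keeps the number of restarts O(log |S^Pi|).
        if len(known_states) >= 2 * size_at_last_restart:
            learner = make_base_learner(frozenset(known_states))
            size_at_last_restart = len(known_states)
    return learner
```

The design point of such a reduction is that the base learner is treated as a black box: it is only ever instantiated on the states discovered so far, so no prior knowledge of the full state space S is required, and the regret can be made to depend only on the reachable set S^Π.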
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, or a link to a code repository.
Open Datasets | No | This is a theoretical paper focused on algorithm design and regret analysis for MDPs. It does not use real-world datasets for training or experimentation, and therefore no public dataset information is provided.
Dataset Splits | No | This is a theoretical paper focused on algorithm design and regret analysis for MDPs. It does not conduct empirical experiments with data, and thus no validation splits are mentioned.
Hardware Specification | No | This is a theoretical paper and does not involve empirical experiments. Therefore, no hardware specifications are mentioned for running experiments.
Software Dependencies | No | This is a theoretical paper and does not involve empirical experiments. Therefore, no specific software dependencies with version numbers are mentioned for replicating experimental setups.
Experiment Setup | No | This is a theoretical paper and does not involve empirical experiments. Therefore, no specific experimental setup details, such as hyperparameters or training settings, are provided.