reproducibilityindex.ai

How Does Goal Relabeling Improve Sample Efficiency?

Authors: Sirui Zheng, Chenjia Bai, Zhuoran Yang, Zhaoran Wang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	To this end, we construct an example to show the information-theoretical improvement in sample efficiency achieved by goal relabeling. Our example reveals that goal relabeling can enhance sample efficiency and exploit the rich information in observations through better hypothesis elimination. Based on these insights, we develop an RL algorithm called GOALIVE. To analyze the sample complexity of GOALIVE, we introduce a complexity measure, the goal-conditioned Bellman-Eluder (GOAL-BE) dimension, which characterizes the sample complexity of goal-conditioned RL problems. Compared to the Bellman-Eluder dimension, the goal-conditioned version offers an exponential improvement in the best case. To the best of our knowledge, our work provides the first characterization of the theoretical improvement in sample efficiency achieved by goal relabeling.
Researcher Affiliation	Collaboration	1 Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208, USA; 2 Shanghai Artificial Intelligence Laboratory; 3 Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA.
Pseudocode	Yes	Algorithm 1 GOAl-conditioned optimism Led Iterative Value function Elimination (GOALIVE)
Open Source Code	No	The paper does not include any explicit statements about releasing source code or provide links to a code repository.
Open Datasets	No	The paper is theoretical and uses a constructed example (episodic MDP) for analysis rather than real-world datasets, hence no public dataset access information is provided.
Dataset Splits	No	The paper is theoretical and does not conduct empirical experiments with data splits.
Hardware Specification	No	The paper is theoretical and does not describe hardware used for running experiments.
Software Dependencies	No	The paper is theoretical and does not mention specific software dependencies or versions for experiments.
Experiment Setup	No	The paper is theoretical and does not describe empirical experimental setups, hyperparameters, or training configurations.