Independence-aware Advantage Estimation
Authors: Pushi Zhang, Li Zhao, Guoqing Liu, Jiang Bian, Minlie Huang, Tao Qin, Tie-Yan Liu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our method achieves higher sample efficiency compared with existing advantage estimation methods in complex environments. Empirically, we show that our estimated advantage function is closer to the ground-truth advantage function A^π than existing advantage estimation methods such as Monte-Carlo and Generalized Advantage Estimation [Schulman et al., 2015b]. We also test IAE advantage estimation in policy optimization settings on environments with high-dimensional observations, showing that our method outperforms other advantage estimation methods in sample efficiency. Results of our experiments are reported in Section 7. |
| Researcher Affiliation | Collaboration | Tsinghua University; Microsoft Research Asia; University of Science and Technology of China |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper mentions 'Finite-state MDPs' and a 'Pixel Grid World' environment, which are custom environments built by the authors, but it does not provide concrete access information (link, DOI, specific repository, or formal citation to an established public dataset) for these environments. |
| Dataset Splits | No | The paper describes 'Finite-state MDP settings' and 'Pixel Grid World environment' but does not specify any training, validation, or test dataset splits in terms of percentages, sample counts, or references to predefined standard splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | For GAE, we use λ = 0.95. We train tabular reward decomposition for 10000 episodes. In the per-step punishment setting, the agent gets r = -0.03 reward in every step before reaching its goal, r = 1 reward when reaching its goal for the first time, and r = 0 reward for every step after reaching its goal. In the no-punishment setting, the agent gets r = 1 reward when reaching its goal for the first time, and gets r = 0 reward otherwise. (A sketch of this setup appears after the table.) |
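To make the quoted setup concrete, here is a minimal sketch of how the compared baseline estimators would be computed under the per-step punishment reward scheme. It assumes a discount factor γ = 0.99 and placeholder value estimates, neither of which appears in the quoted setup; only λ = 0.95 and the reward values are taken from the paper.

```python
import numpy as np

# Assumed discount factor (not stated in the quoted setup); lambda comes from the paper.
GAMMA = 0.99
LAM = 0.95

def per_step_punishment_reward(reaches_goal_now, reached_goal_before):
    """Reward scheme quoted in the setup: -0.03 per step before the goal,
    +1 on the step the goal is first reached, 0 on every step afterwards."""
    if reached_goal_before:
        return 0.0
    return 1.0 if reaches_goal_now else -0.03

def monte_carlo_advantages(rewards, values, gamma=GAMMA):
    """Monte-Carlo baseline: discounted return-to-go minus the value estimate."""
    advantages = np.zeros(len(rewards))
    ret = 0.0
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret
        advantages[t] = ret - values[t]
    return advantages

def gae_advantages(rewards, values, gamma=GAMMA, lam=LAM):
    """Generalized Advantage Estimation [Schulman et al., 2015b];
    `values` holds len(rewards) + 1 entries (bootstrap value for the final state)."""
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy 5-step episode in which the goal is reached on the final step.
rewards = [per_step_punishment_reward(t == 4, False) for t in range(5)]
values = np.zeros(6)  # placeholder value estimates
print(monte_carlo_advantages(rewards, values[:-1]))
print(gae_advantages(rewards, values))
```

With λ = 0.95, GAE interpolates between the Monte-Carlo estimate (λ = 1) and the one-step TD error (λ = 0), which is the trade-off the paper's comparison against these baselines targets.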