Performance Bounds for Model and Policy Transfer in Hidden-parameter MDPs
Authors: Haotian Fu, Jiayu Yao, Omer Gottesman, Finale Doshi-Velez, George Konidaris
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to theoretical analysis, we empirically study the performance of model and policy transfer algorithms in two continuous control domains. For each domain, we control the Lipschitz constants of the HiP-MDPs by altering the hyper-parameters of the environments. The results are consistent with our theoretical understanding of how the regrets of model and policy transfer algorithms scale with the estimation error of the hidden parameters: a slower increase of the Lipschitz constant with respect to the hidden parameter implies a smaller performance decay (an illustrative form of this scaling is sketched after the table). |
| Researcher Affiliation | Academia | Haotian Fu¹, Jiayu Yao², Omer Gottesman¹, Finale Doshi-Velez² & George Konidaris¹ (¹Brown University, ²Harvard University) |
| Pseudocode | Yes | Algorithm 1: Policy transfer; Algorithm 2: Model transfer (a hedged sketch of both transfer loops appears after the table). |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the methodology described in the paper is openly available. |
| Open Datasets | No | The paper describes custom-designed environments (ball-goal, ball-wind) with detailed transition and reward functions in Appendix B. However, it does not provide a link, DOI, repository, or formal citation through which these environments, or any datasets used, can be publicly accessed. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits, such as percentages, sample counts, or specific splitting methodologies. |
| Hardware Specification | No | The paper states it 'was conducted using computational resources and services at the Center for Computation and Visualization, Brown University' but does not specify any particular hardware components like GPU models, CPU models, or memory. |
| Software Dependencies | No | The paper mentions using PEARL, CaDM, and SAC algorithms, but it does not specify version numbers for these or any other software dependencies (e.g., Python, PyTorch). |
| Experiment Setup | No | The paper describes the general process of context encoder updates and interaction with the environment (e.g., 'collect transitions by interacting with the environment for only one episode'), but it does not provide specific hyperparameter values (e.g., learning rate, batch size, epochs) or detailed system-level training configurations needed for reproduction. |
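
The scaling described in the Research Type row can be made concrete with a generic Lipschitz value-difference bound. The form below is an illustrative sketch, not the paper's exact theorem; the constants L_T (transitions, in Wasserstein distance), L_R (rewards), L_V (Lipschitz constant of the value function over states), and the discount factor γ are assumed symbols.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative sketch, not the paper's exact statement. Assume the HiP-MDP's
% transition kernel and reward are Lipschitz in the hidden parameter theta:
%   W(T_theta(.|s,a), T_{theta'}(.|s,a)) <= L_T |theta - theta'|,
%   |R_theta(s,a) - R_{theta'}(s,a)|     <= L_R |theta - theta'|,
% and that the value function is L_V-Lipschitz in the state. A standard
% simulation-lemma-style argument then bounds the gap for any fixed policy pi:
\[
  \bigl| V^{\pi}_{\theta}(s) - V^{\pi}_{\hat{\theta}}(s) \bigr|
  \;\le\; \frac{L_R + \gamma \, L_V \, L_T}{1 - \gamma}
          \, \bigl| \theta - \hat{\theta} \bigr|
\]
% Reading: for a fixed estimation error, smaller Lipschitz constants give a
% smaller performance decay, matching the trend the table reports.
\end{document}
```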
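
Algorithms 1 and 2 are only named in the Pseudocode row, so the sketch below shows one plausible reading of the two loops under gym-style environment assumptions: collect a single exploratory episode, estimate the hidden parameter by model fit, then either reuse the policy trained for that parameter (policy transfer) or plan against its learned dynamics model (model transfer). Every name here (`rollout`, `infer_hidden_param`, `policy_library`, `planner`) is a hypothetical stand-in, not the authors' code.

```python
import numpy as np

def rollout(env, policy, episodes=1):
    """Collect (state, action, next_state) transitions with a given policy.
    Assumes a gym-style env with reset() and step() (hypothetical)."""
    transitions = []
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, _, done, _ = env.step(action)
            transitions.append((state, action, next_state))
            state = next_state
    return transitions

def infer_hidden_param(transitions, candidates, models):
    """Pick the candidate hidden parameter whose learned dynamics model
    best explains the observed transitions (least squared error)."""
    def error(theta):
        predict = models[theta]  # predict(s, a) -> estimated next state
        return sum(np.sum((predict(s, a) - s2) ** 2)
                   for s, a, s2 in transitions)
    return min(candidates, key=error)

def policy_transfer(env, explore_policy, candidates, models, policy_library):
    """Policy transfer, in spirit: one exploratory episode to estimate the
    hidden parameter, then deploy the policy trained for that parameter."""
    transitions = rollout(env, explore_policy, episodes=1)
    theta_hat = infer_hidden_param(transitions, candidates, models)
    return policy_library[theta_hat]

def model_transfer(env, explore_policy, candidates, models, planner):
    """Model transfer, in spirit: estimate the hidden parameter, then plan
    (e.g., MPC) against the corresponding learned dynamics model."""
    transitions = rollout(env, explore_policy, episodes=1)
    theta_hat = infer_hidden_param(transitions, candidates, models)
    return planner(models[theta_hat])
```

In either loop, the deployed behavior degrades with the distance between `theta_hat` and the true hidden parameter, which is exactly the estimation error appearing in the bound sketched above.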