Performance Bounds for Model and Policy Transfer in Hidden-parameter MDPs
Authors: Haotian Fu, Jiayu Yao, Omer Gottesman, Finale Doshi-Velez, George Konidaris
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to theoretical analysis, we empirically study the performance of model and policy transfer algorithms in two continuous control domains. For each domain, we control the Lipschitz constants of the HiP-MDPs by altering the hyper-parameters of the environments. The results are consistent with our theoretical understanding of how the regrets of model and policy transfer algorithms scale with the estimation error of the hidden parameters: a slower increase of the Lipschitz constant with respect to the hidden parameter implies a smaller performance decay (an illustrative form of this scaling is sketched after the table). |
| Researcher Affiliation | Academia | Haotian Fu¹, Jiayu Yao², Omer Gottesman¹, Finale Doshi-Velez² & George Konidaris¹ (¹Brown University, ²Harvard University) |
| Pseudocode | Yes | Algorithm 1: Policy transfer; Algorithm 2: Model transfer (a hedged sketch of both transfer loops appears after the table). |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the methodology described in the paper is openly available. |
| Open Datasets | No | The paper describes custom-designed environments (ball-goal, ball-wind) with detailed transition and reward functions in Appendix B. However, it does not provide a link, DOI, repository, or formal citation through which these environments, or any datasets used, can be publicly accessed. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits, such as percentages, sample counts, or specific splitting methodologies. |
| Hardware Specification | No | The paper states it 'was conducted using computational resources and services at the Center for Computation and Visualization, Brown University' but does not specify any particular hardware components like GPU models, CPU models, or memory. |
| Software Dependencies | No | The paper mentions using PEARL, CaDM, and SAC algorithms, but it does not specify version numbers for these or any other software dependencies (e.g., Python, PyTorch). |
| Experiment Setup | No | The paper describes the general process of context encoder updates and interaction with the environment (e.g., 'collect transitions by interacting with the environment for only one episode'), but it does not provide specific hyperparameter values (e.g., learning rate, batch size, epochs) or detailed system-level training configurations needed for reproduction. |
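
The scaling described in the Research Type row can be made concrete with a generic Lipschitz value-difference bound. The form below is an illustrative sketch, not the paper's exact theorem; the constants L_T (transitions, in Wasserstein distance), L_R (rewards), L_V (Lipschitz constant of the value function over states), and the discount factor γ are assumed symbols.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative sketch, not the paper's exact statement. Assume the HiP-MDP's
% transition kernel and reward are Lipschitz in the hidden parameter theta:
%   W(T_theta(.|s,a), T_{theta'}(.|s,a)) <= L_T |theta - theta'|,
%   |R_theta(s,a) - R_{theta'}(s,a)|     <= L_R |theta - theta'|,
% and that the value function is L_V-Lipschitz in the state. A standard
% simulation-lemma-style argument then bounds the gap for any fixed policy pi:
\[
  \bigl| V^{\pi}_{\theta}(s) - V^{\pi}_{\hat{\theta}}(s) \bigr|
  \;\le\; \frac{L_R + \gamma \, L_V \, L_T}{1 - \gamma}
          \, \bigl| \theta - \hat{\theta} \bigr|
\]
% Reading: for a fixed estimation error, smaller Lipschitz constants give a
% smaller performance decay, matching the trend the table reports.
\end{document}
```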
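
Algorithms 1 and 2 are only named in the Pseudocode row, so the sketch below shows one plausible reading of the two loops under gym-style environment assumptions: collect a single exploratory episode, estimate the hidden parameter by model fit, then either reuse the policy trained for that parameter (policy transfer) or plan against its learned dynamics model (model transfer). Every name here (`rollout`, `infer_hidden_param`, `policy_library`, `planner`) is a hypothetical stand-in, not the authors' code.

```python
import numpy as np

def rollout(env, policy, episodes=1):
    """Collect (state, action, next_state) transitions with a given policy.
    Assumes a gym-style env with reset() and step() (hypothetical)."""
    transitions = []
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, _, done, _ = env.step(action)
            transitions.append((state, action, next_state))
            state = next_state
    return transitions

def infer_hidden_param(transitions, candidates, models):
    """Pick the candidate hidden parameter whose learned dynamics model
    best explains the observed transitions (least squared error)."""
    def error(theta):
        predict = models[theta]  # predict(s, a) -> estimated next state
        return sum(np.sum((predict(s, a) - s2) ** 2)
                   for s, a, s2 in transitions)
    return min(candidates, key=error)

def policy_transfer(env, explore_policy, candidates, models, policy_library):
    """Policy transfer, in spirit: one exploratory episode to estimate the
    hidden parameter, then deploy the policy trained for that parameter."""
    transitions = rollout(env, explore_policy, episodes=1)
    theta_hat = infer_hidden_param(transitions, candidates, models)
    return policy_library[theta_hat]

def model_transfer(env, explore_policy, candidates, models, planner):
    """Model transfer, in spirit: estimate the hidden parameter, then plan
    (e.g., MPC) against the corresponding learned dynamics model."""
    transitions = rollout(env, explore_policy, episodes=1)
    theta_hat = infer_hidden_param(transitions, candidates, models)
    return planner(models[theta_hat])
```

In either loop, the deployed behavior degrades with the distance between `theta_hat` and the true hidden parameter, which is exactly the estimation error appearing in the bound sketched above.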