On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning
Authors: Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks. |
| Researcher Affiliation | Collaboration | Guy Tennenholtz (Nvidia Research, Technion); Assaf Hallak (Nvidia Research); Gal Dalal (Nvidia Research); Shie Mannor (Nvidia Research, Technion); Gal Chechik (Nvidia Research); Uri Shalit (Technion) |
| Pseudocode | Yes | Algorithm 1 RL using Expert Data with Unobserved Confounders (Follow the Leader) |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the methodology described. |
| Open Datasets | Yes | Our experiments were based off of the recently proposed assistive-gym (Erickson et al., 2020) and recsim (Ie et al., 2019) environments. |
| Dataset Splits | No | The paper discusses expert data and online environment data but does not specify training, validation, or test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper mentions 'Num workers 40' in Table 2 but does not provide specific hardware details such as GPU/CPU models or types of machines used for experiments. |
| Software Dependencies | No | We used PPO (Schulman et al., 2017) implemented in RLlib (Liang et al., 2018) for both the imitation as well as RL settings. (No specific version numbers are provided for these software dependencies.) |
| Experiment Setup | Yes | Table 2: Hyper-parameters used to train the PPO agent. |