On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

Authors: Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks."
Researcher Affiliation | Collaboration | Guy Tennenholtz (Nvidia Research, Technion); Assaf Hallak (Nvidia Research); Gal Dalal (Nvidia Research); Shie Mannor (Nvidia Research, Technion); Gal Chechik (Nvidia Research); Uri Shalit (Technion)
Pseudocode | Yes | Algorithm 1: RL using Expert Data with Unobserved Confounders (Follow the Leader)
Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the methodology described.
Open Datasets | Yes | "Our experiments were based off of the recently proposed assistive-gym (Erickson et al., 2020) and recsim (Ie et al., 2019) environments."
Dataset Splits | No | The paper discusses expert data and online environment data but does not specify training, validation, or test splits as percentages or sample counts.
Hardware Specification | No | The paper mentions "Num workers 40" in Table 2 but does not give hardware details such as GPU/CPU models or the types of machines used for the experiments.
Software Dependencies | No | "We used PPO (Schulman et al., 2017) implemented in RLlib (Liang et al., 2018) for both the imitation as well as RL settings." No version numbers are given for these dependencies.
Experiment Setup | Yes | Table 2: Hyper-parameters used to train the PPO agent.
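The Pseudocode row names the paper's Algorithm 1 only by its title, which refers to "Follow the Leader". As a hedged illustration of what that generic online-learning rule does (this is not the paper's Algorithm 1, which this page does not reproduce), the sketch below assumes a finite set of candidates scored by a loss vector each round. At every round it picks the candidate with the smallest cumulative loss observed so far. The function name `follow_the_leader` and the loss-vector input format are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def follow_the_leader(loss_rounds: Sequence[Sequence[float]]) -> List[int]:
    """Return the candidate index chosen in each round by Follow the Leader.

    Hypothetical, generic sketch: at round t the learner picks the candidate
    with the smallest cumulative loss over rounds 0..t-1 (ties go to the
    lowest index), then observes that round's loss vector.
    """
    if not loss_rounds:
        return []
    n = len(loss_rounds[0])
    cumulative = [0.0] * n
    choices: List[int] = []
    for losses in loss_rounds:
        if len(losses) != n:
            raise ValueError("every round must score the same candidates")
        # The "leader" is the best candidate in hindsight so far;
        # min() returns the first minimal index, so ties go to the lowest index.
        leader = min(range(n), key=lambda i: cumulative[i])
        choices.append(leader)
        # Reveal this round's losses and update the running totals.
        for i, loss in enumerate(losses):
            cumulative[i] += loss
    return choices


if __name__ == "__main__":
    # Two candidates, three rounds of (hypothetical) losses.
    print(follow_the_leader([[1, 0], [1, 0], [0, 1]]))  # [0, 1, 1]
```

In the usage example both candidates start tied, so round 0 picks index 0. After that round candidate 1 has the lower cumulative loss and stays the leader, even though it does worse in the final round. That lag behind the most recent losses is the characteristic behavior of the plain follow-the-leader rule.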