Online Robust Policy Learning in the Presence of Unknown Adversaries
Authors: Aaron Havens, Zhanhong Jiang, Soumik Sarkar
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that the proposed algorithm enables policy learning with significantly lower bias as compared to the state-of-the-art policy learning approaches even in the presence of heavy state information attacks. We present algorithm analysis and simulation results using popular Open AI Gym environments. |
| Researcher Affiliation | Academia | Aaron J. Havens, Zhanhong Jiang, Soumik Sarkar Department of Mechanical Engineering Iowa State University Ames, IA 50011 {ajhavens,zhjiang,soumiks}@iastate.edu |
| Pseudocode | Yes | Algorithm 1: MLAH Input: πnom and πadv sub-policies parameterized by θnom and θadv; Master policy πmaster with parameter vector φ. 1 Initialize θnom, θadv, φ 2 for pre-training iterations [optional] do 3 Train πnom and θnom on only nominal experiences. 4 end 5 for learning life-time do 6 for Time steps t to t + T do 7 Compute At over sub-policies (see eq. 4) 8 πmaster selects to switch or stay with sub-policy based on At observations to take action 9 end 10 Estimate all AGAE for πnom, πadv over T 11 Estimate all AGAE for πmaster over T with respect to At observations 12 Optimize θnom based on experiences collected from πnom 13 Optimize θadv based on experiences collected from πadv 14 Optimize φ based on all experiences with respect to At observations (a minimal Python sketch of this loop appears after the table) |
| Open Source Code | Yes | The source code is available on https://github.com/AaronHavens/safe_rl. |
| Open Datasets | Yes | We present algorithm analysis and simulation results using popular Open AI Gym environments. |
| Dataset Splits | No | The paper mentions 'training' and 'evaluation' but does not provide specific details on dataset splits (e.g., percentages or counts for train/validation/test sets) or cross-validation setup beyond general usage of 'Open AI Gym environments'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or processing power) used for running the experiments, only mentioning 'simulation results using popular Open AI Gym environments'. |
| Software Dependencies | No | The paper mentions using PPO [17] and Open AI Gym environments [14], but it does not specify version numbers for any software dependencies like Python, PyTorch/TensorFlow, or specific library versions, which are necessary for full reproducibility. |
| Experiment Setup | No | The paper states, 'For page limit constraints, PPO parameters used in experiments such as deep network size and actor-batches can be found in the supplementary material,' meaning specific hyperparameters are not provided in the main text. |
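
To make the Algorithm 1 pseudocode quoted above concrete, the following is a minimal, self-contained Python sketch of the MLAH loop: two sub-policies, a master policy that switches between them based on an advantage observation, and GAE-style advantage estimation per segment. All class names, the toy bandit environment, and the simplified per-step advantage observation are illustrative assumptions; the authors' actual implementation uses PPO sub-policies, the eq. 4 advantage observation, and OpenAI Gym environments (see their repository linked above).

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax_sample(prefs):
    """Sample an index from a softmax distribution over preference values."""
    p = np.exp(prefs - prefs.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (the A_GAE of steps 10-11)."""
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


class SubPolicy:
    """Stand-in for a PPO sub-policy (pi_nom or pi_adv) over 2 discrete actions."""

    def __init__(self):
        self.prefs = np.zeros(2)   # crude "theta"
        self.value = 0.0           # crude value baseline

    def act(self):
        return softmax_sample(self.prefs)

    def update(self, actions, advantages, rewards):
        # Placeholder for steps 12-13 (PPO optimization of theta_nom / theta_adv).
        for a, adv in zip(actions, advantages):
            self.prefs[a] += 0.05 * adv
        self.value += 0.1 * (np.mean(rewards) - self.value)


class MasterPolicy:
    """Stand-in for pi_master: picks a sub-policy from the advantage observation A_t."""

    def __init__(self):
        self.prefs = np.zeros((2, 2))  # rows: sign of A_t, cols: sub-policy index

    def act(self, a_t):
        return softmax_sample(self.prefs[int(a_t < 0)])

    def update(self, a_obs, choices, advantages):
        # Placeholder for step 14 (optimizing phi w.r.t. A_t observations).
        for a_t, c, adv in zip(a_obs, choices, advantages):
            self.prefs[int(a_t < 0), c] += 0.05 * adv


class ToyEnv:
    """One-step bandit standing in for a Gym task; an 'attack' flips the reward sign."""

    def __init__(self, p_attack=0.3):
        self.p_attack = p_attack

    def step(self, action):
        attacked = rng.random() < self.p_attack
        return float(action == 1) * (-1.0 if attacked else 1.0)


env, master = ToyEnv(), MasterPolicy()
subs = [SubPolicy(), SubPolicy()]      # [pi_nom, pi_adv]
T = 64

for _ in range(100):                   # "for learning life-time"
    data = {0: {"act": [], "rew": []}, 1: {"act": [], "rew": []}}
    a_obs, choices, seg_rewards = [], [], []
    a_t = 0.0
    for _step in range(T):             # "for time steps t to t + T"
        choice = master.act(a_t)       # step 8: switch or stay with a sub-policy
        action = subs[choice].act()
        reward = env.step(action)
        a_t = reward - subs[choice].value   # crude stand-in for the eq. 4 observation
        data[choice]["act"].append(action)
        data[choice]["rew"].append(reward)
        a_obs.append(a_t)
        choices.append(choice)
        seg_rewards.append(reward)
    for i, sub in enumerate(subs):     # steps 10, 12-13
        if data[i]["rew"]:
            adv = gae(data[i]["rew"], [sub.value] * len(data[i]["rew"]))
            sub.update(data[i]["act"], adv, data[i]["rew"])
    master.update(a_obs, choices, gae(seg_rewards, [0.0] * T))  # steps 11, 14

print("mean reward in the final segment:", np.mean(seg_rewards))
```

The key design point the sketch preserves is the hierarchy: the master's parameters are optimized only with respect to the advantage observations it conditions on, while each sub-policy is optimized only on the experiences collected while it was active, mirroring steps 10-14 of Algorithm 1.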