Pareto Policy Adaptation
Authors: Panagiotis Kyriakis, Jyotirmoy Deshmukh, Paul Bogdan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method in a series of reinforcement learning tasks. Section 5 (Experimental Evaluation): In this section, we evaluate the performance of our proposed method. |
| Researcher Affiliation | Academia | Panagiotis Kyriakis, University of Southern California, Los Angeles, USA (pkyriaki@usc.edu); Jyotirmoy V. Deshmukh, University of Southern California, Los Angeles, USA (jdeshmuk@usc.edu); Paul Bogdan, University of Southern California, Los Angeles, USA (pbogdan@usc.edu) |
| Pseudocode | Yes | Algorithm 1: Multi-Objective Policy Gradient (a hedged, generic sketch of a multi-objective policy-gradient update is given below the table) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing its own source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Domains: We evaluate on 4 environments (details given in the Appendix): (a) Multi-Objective Grid World (MOWG): a variant of the classical gridworld, (b) Deep Sea Treasure (DST): a slightly modified version of a classical multi-objective reinforcement learning environment (51), (c) Multi-Objective Super Mario (MOSM): a modified, multi-objective variant of the popular video game that has a 5-dimensional reward signal, and (d) Multi-Objective MuJoCo (MOMU): a modified version of the MuJoCo physics simulator, focusing on locomotion tasks. Our implementation uses modified versions of 4 OpenAI Gym environments (Fig. 5). (See the wrapper sketch below the table.) |
| Dataset Splits | No | The paper does not explicitly specify exact percentages or sample counts for training, validation, and test dataset splits for reproducibility. It describes environments and training procedures, but not formal data splits. |
| Hardware Specification | Yes | We run all of our simulations in the Google Cloud Platform using 48 vCores and one NVIDIA Tesla T4 GPU. |
| Software Dependencies | No | The paper mentions PyTorch, the torch-ac package, and OpenAI Gym, but does not specify their version numbers or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We set the GAE parameter to λ = 0.95 and the discount factor to γ = 0.99. We use the Adam optimizer (β1 = 0.9, β2 = 0.999) with a learning rate of 0.0001. For each 512 frames we perform one update of the network parameters, iterating over 10 epochs of the collected data and using a mini-batch size of 64. (See the configuration sketch below the table.) |
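
The Pseudocode row points to Algorithm 1 (Multi-Objective Policy Gradient), whose exact steps are not reproduced in this report. The snippet below is a minimal sketch of a generic linear-scalarization multi-objective policy-gradient loss in PyTorch; the function name, the preference vector `w`, and the tensor shapes are illustrative assumptions, not the paper's Algorithm 1.

```python
import torch

# Hypothetical shapes: log_probs is (T,), advantages is (T, K) with one
# advantage estimate per objective, and w is a (K,) preference vector.
def mo_policy_gradient_loss(log_probs: torch.Tensor,
                            advantages: torch.Tensor,
                            w: torch.Tensor) -> torch.Tensor:
    """Linear-scalarization surrogate loss: the per-objective advantages are
    weighted by w before entering the usual policy-gradient term."""
    scalarized_adv = (advantages * w).sum(dim=-1)          # (T,)
    # Negative sign because optimizers minimize while policy gradient ascends.
    return -(log_probs * scalarized_adv.detach()).mean()

# Purely illustrative usage with random data.
T, K = 512, 5                              # frames per update, number of objectives
log_probs = torch.randn(T, requires_grad=True)
advantages = torch.randn(T, K)
w = torch.softmax(torch.ones(K), dim=0)    # uniform preference over objectives
loss = mo_policy_gradient_loss(log_probs, advantages, w)
loss.backward()
```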
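The Open Datasets row notes that the experiments use modified OpenAI Gym environments with multi-dimensional reward signals. Below is a minimal sketch of how such a modification is typically done, assuming the classic Gym API where `step` returns `(obs, reward, done, info)`; the two-objective split (task reward vs. control cost) and the choice of environment are illustrative assumptions, not the paper's exact reward design.

```python
import numpy as np
import gym

class VectorRewardWrapper(gym.Wrapper):
    """Wraps a scalar-reward Gym env and emits a 2-dimensional reward vector.
    The decomposition below (task reward, negative action cost) is purely
    illustrative; the paper's MOMU/MOSM reward components may differ."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        control_cost = float(np.square(np.asarray(action)).sum())
        vector_reward = np.array([reward, -control_cost], dtype=np.float32)
        return obs, vector_reward, done, info

# Example usage on a standard MuJoCo locomotion task (requires mujoco-py).
env = VectorRewardWrapper(gym.make("HalfCheetah-v2"))
```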
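The Experiment Setup row lists the optimization hyperparameters reported in the paper. The sketch below simply collects those values into an Adam-based training configuration; only the numeric values come from the paper, while the class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass
import torch

@dataclass
class TrainConfig:
    # Values reported in the paper's experiment setup.
    gae_lambda: float = 0.95       # GAE parameter lambda
    gamma: float = 0.99            # discount factor
    lr: float = 1e-4               # Adam learning rate
    betas: tuple = (0.9, 0.999)    # Adam (beta1, beta2)
    frames_per_update: int = 512   # frames collected before each update
    epochs: int = 10               # optimization epochs per update
    minibatch_size: int = 64       # mini-batch size within each epoch

cfg = TrainConfig()

# Hypothetical policy network; any torch.nn.Module would do here.
policy = torch.nn.Linear(8, 2)
optimizer = torch.optim.Adam(policy.parameters(), lr=cfg.lr, betas=cfg.betas)
```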