Position: Automatic Environment Shaping is the Next Frontier in RL
Authors: Younghyo Park, Gabriel B. Margolis, Pulkit Agrawal
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Table 1: 'Impact of environment shaping on policy optimization. Removing task-specific design choices in the reward, action space, state space, early termination, or initialization incurs performance reductions.' and Table 2: 'Evaluating (Ma et al., 2023) for reward shaping, shaping different components, and coupled shaping.' |
| Researcher Affiliation | Academia | Improbable AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA. |
| Pseudocode | Yes | Figure 3: 'Action space shaping: (Top) Original shaped action space with task-specific features. (Bottom) Unshaped action space consisting of joint torque commands.' and Figure 4: 'State space shaping: (Top) Original shaped state space with task-specific features. (Bottom) Unshaped state space contains the entire raw simulator state.' A sketch of such an unshaped torque action space appears after the table. |
| Open Source Code | Yes | To facilitate environment shaping research, our code exposes an API for modifying the environment code, which allows the optimizer to transform the reward, observation space, action space, etc. by editing Python functions at runtime. The API is designed so that any language model can be easily integrated to perform such transformations. Our implementation also facilitates faster evaluation of multiple environment shaping choices by training multiple policies in a single process, leveraging parallel simulation. https://auto-env-shaping.github.io/ (See the API and parallel-evaluation sketches after the table.) |
| Open Datasets | Yes | We consider a case study of the Isaac Gym Envs task suite (Makoviychuk et al., 2021). |
| Dataset Splits | No | The paper defines 'Test Environment' and 'Reference Environment' conceptually and mentions 'the performance of all policies is evaluated in a fully unshaped environment', but it does not provide specific numerical train/validation/test dataset splits or their percentages for the empirical evaluations presented. |
| Hardware Specification | No | The paper acknowledges 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources' but does not specify exact GPU or CPU models, memory amounts, or other detailed hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using 'Isaac Gym Envs (Makoviychuk et al., 2021)', 'PPO (Schulman et al., 2017)', and 'GPT-4 model' but does not provide specific version numbers for these or other software libraries or dependencies. |
| Experiment Setup | No | The paper states that 'off-the-shelf RL algorithm implementations are utilized with their default configurations' in the context of Isaac Gym Envs, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations for the reported experiments. |
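
Because the Pseudocode row only quotes the captions of Figures 3 and 4, a concrete illustration of the "unshaped action space consisting of joint torque commands" may help. The wrapper below is a minimal sketch, assuming a vectorized simulator object with a `step` method; the class name, default torque limit, and clipping behavior are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


class UnshapedTorqueActions:
    """Hypothetical wrapper illustrating Figure 3 (bottom): the action is a
    raw joint-torque command with no task-specific transformation. All names
    here are invented for illustration, not taken from the paper's code."""

    def __init__(self, env, num_joints: int, torque_limit: float = 50.0):
        self.env = env
        self.num_joints = num_joints
        self.torque_limit = torque_limit

    def step(self, action: np.ndarray):
        # Clip the raw torque command to actuator limits and pass it straight
        # to the simulator -- no PD targets, gait phases, or other shaped
        # action features sit between the policy and the joints.
        torques = np.clip(action, -self.torque_limit, self.torque_limit)
        return self.env.step(torques)
```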
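
The Open Source Code row describes an API that lets an optimizer rewrite Python functions, such as the reward, at runtime. The sketch below shows one way such hot-swapping could work; `ShapingAPI`, `apply_patch`, `DummyEnv`, and the proposed reward string are all hypothetical stand-ins, and the `exec` call assumes the proposed source is trusted or sandboxed.

```python
import types


class DummyEnv:
    """Stand-in environment with a placeholder reward (illustration only)."""

    def compute_reward(self, state, action):
        return 0.0


class ShapingAPI:
    """Hypothetical interface: rebind an environment's shaping functions
    (reward, observation, termination, ...) from source code at runtime."""

    def __init__(self, env):
        self.env = env

    def apply_patch(self, function_name: str, source_code: str) -> None:
        # Compile the optimizer-proposed source and bind it as a method on
        # the live environment, replacing the previous definition.
        namespace = {}
        exec(source_code, namespace)  # assumes trusted/sandboxed input
        setattr(self.env, function_name,
                types.MethodType(namespace[function_name], self.env))


# An optimizer (e.g., a language model) proposes a new reward as plain text.
proposed_reward = '''
def compute_reward(self, state, action):
    # Forward progress minus a small torque penalty.
    return state["forward_velocity"] - 0.01 * sum(a * a for a in action)
'''

api = ShapingAPI(DummyEnv())
api.apply_patch("compute_reward", proposed_reward)
print(api.env.compute_reward({"forward_velocity": 1.0}, [0.1, -0.2]))  # 0.9995
```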
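
The same row notes that the implementation evaluates multiple shaping choices by training multiple policies in a single process over parallel simulation. One simple way to realize that is to partition the vectorized environments across candidates; the helper below is a sketch under that assumption, with invented names and an arbitrary environment count.

```python
import numpy as np


def assign_env_blocks(num_envs: int, num_candidates: int) -> list:
    """Partition parallel simulator instances evenly across shaping
    candidates, so each candidate's policy trains on its own block."""
    return np.array_split(np.arange(num_envs), num_candidates)


# e.g., 4096 parallel environments split across 4 candidate reward functions
for candidate_id, env_ids in enumerate(assign_env_blocks(4096, 4)):
    print(f"candidate {candidate_id}: envs {env_ids[0]}..{env_ids[-1]}")
```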