Position: Automatic Environment Shaping is the Next Frontier in RL

Authors: Younghyo Park, Gabriel B. Margolis, Pulkit Agrawal

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental
  LLM Response: "Table 1. Impact of environment shaping on policy optimization. Removing task-specific design choices in the reward, action space, state space, early termination, or initialization incurs performance reductions." and "Table 2. Evaluating (Ma et al., 2023) for reward shaping, shaping different components, and coupled shaping."
Researcher Affiliation: Academia
  LLM Response: "Improbable AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA."
Pseudocode: Yes
  LLM Response: "Figure 3. Action space shaping: (Top) Original shaped action space with task-specific features. (Bottom) Unshaped action space consisting of joint torque commands." and "Figure 4. State space shaping: (Top) Original shaped state space with task-specific features. (Bottom) Unshaped state space contains the entire raw simulator state."
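Judging by the captions, Figures 3 and 4 present the shaped and unshaped spaces as code listings. The sketch below illustrates the contrast for the action space only; every class, attribute, and gain value is invented for illustration and is not taken from the paper's code.

```python
class UnshapedActionSpace:
    """Unshaped (Figure 3, bottom): the policy's action is the raw joint-torque command."""

    def apply(self, env, action):
        env.set_joint_torques(action)  # `env` is a hypothetical simulator handle


class ShapedActionSpace:
    """Shaped (Figure 3, top): task-specific features, sketched here as PD position
    targets expressed as scaled offsets from a hand-picked default pose."""

    def __init__(self, default_pose, kp=50.0, kd=2.0, scale=0.5):
        self.default_pose = default_pose  # nominal joint positions chosen by a designer
        self.kp, self.kd, self.scale = kp, kd, scale  # hand-tuned PD gains and action scale

    def apply(self, env, action):
        # Interpret the action as a scaled offset from the default pose,
        # then convert the resulting position target into torques via PD control.
        target = [p + self.scale * a for p, a in zip(self.default_pose, action)]
        torques = [
            self.kp * (t - q) - self.kd * dq
            for t, q, dq in zip(target, env.joint_pos, env.joint_vel)
        ]
        env.set_joint_torques(torques)
```

The point of the contrast is that the shaped variant embeds several human design decisions (default pose, gains, action scale) that the unshaped variant forces the learning algorithm to discover on its own.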
Open Source Code: Yes
  LLM Response: "To facilitate environment shaping research, our code exposes an API for modifying the environment code, which allows the optimizer to transform the reward, observation space, action space, etc. by editing Python functions at runtime. The API is designed so that any language model can be easily integrated to perform such transformations. Our implementation also facilitates faster evaluation of multiple environment shaping choices by training multiple policies in a single process, leveraging parallel simulation." Project page: https://auto-env-shaping.github.io/
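The quoted passage describes an API that rewrites an environment's Python functions at runtime so a language model can propose new shaping code. The repository's actual interface is not reproduced in this assessment, so the following is only a minimal sketch of how such runtime function swapping could work; ShapableEnv and set_function are hypothetical names.

```python
import types


class ShapableEnv:
    """A toy environment whose reward function can be replaced at runtime,
    e.g. with source code proposed by a language model (hypothetical API)."""

    def compute_reward(self, state, action):
        return 0.0  # placeholder default reward

    def set_function(self, name, source):
        """Compile `source` (a Python function definition) and bind it as a
        method called `name`, replacing the current implementation."""
        namespace = {}
        exec(source, namespace)
        setattr(self, name, types.MethodType(namespace[name], self))


env = ShapableEnv()
llm_proposal = """
def compute_reward(self, state, action):
    # Example LLM-proposed shaping: forward progress minus a small action penalty.
    return state["forward_velocity"] - 0.01 * sum(a * a for a in action)
"""
env.set_function("compute_reward", llm_proposal)
print(env.compute_reward({"forward_velocity": 1.5}, [0.1, -0.2]))  # 1.4995
```

The same pattern extends naturally to observation and action transforms: each shaping component is a named function that an optimizer can overwrite and re-evaluate without restarting the process.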
Open Datasets: Yes
  LLM Response: "We consider a case study of the Isaac Gym Envs task suite (Makoviychuk et al., 2021)."
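For reference, tasks in this suite are typically instantiated through the isaacgymenvs.make entry point, along the lines of the project's README. The task name, environment count, and device strings below are illustrative, not the paper's settings.

```python
import isaacgym  # must be imported before torch
import isaacgymenvs
import torch

# Create many parallel instances of one task from the suite.
envs = isaacgymenvs.make(
    seed=0,
    task="Ant",
    num_envs=512,
    sim_device="cuda:0",
    rl_device="cuda:0",
)

obs = envs.reset()
actions = torch.rand((512,) + envs.action_space.shape, device="cuda:0")
obs, reward, done, info = envs.step(actions)
```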
Dataset Splits: No
  LLM Response: The paper defines 'Test Environment' and 'Reference Environment' conceptually and notes that 'the performance of all policies is evaluated in a fully unshaped environment', but it does not provide numerical train/validation/test splits or percentages for the empirical evaluations presented.
Hardware Specification: No
  LLM Response: The paper credits 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources' but does not specify GPU or CPU models, memory amounts, or other hardware details used for the experiments.
Software Dependencies: No
  LLM Response: The paper mentions 'Isaac Gym Envs (Makoviychuk et al., 2021)', 'PPO (Schulman et al., 2017)', and the 'GPT-4 model', but does not provide version numbers for these or any other software dependencies.
Experiment Setup: No
  LLM Response: The paper states that 'off-the-shelf RL algorithm implementations are utilized with their default configurations' for the Isaac Gym Envs tasks, but it does not report specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed training configurations.
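To make the gap concrete: a full reproduction would need the PPO settings that those default configurations silently supply. The dictionary below only names the kind of values that are missing; every number is an illustrative placeholder, not one reported by the paper.

```python
# Illustrative placeholders only; the paper does not report these values.
ppo_config = {
    "learning_rate": 3e-4,    # placeholder
    "num_envs": 4096,         # placeholder: parallel simulator instances
    "horizon_length": 16,     # placeholder: steps collected per env per update
    "minibatch_size": 32768,  # placeholder
    "mini_epochs": 5,         # placeholder: PPO epochs per update
    "gamma": 0.99,            # placeholder: discount factor
    "clip_range": 0.2,        # placeholder: PPO clipping epsilon
}
```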