Position: Automatic Environment Shaping is the Next Frontier in RL

Authors: Younghyo Park, Gabriel B. Margolis, Pulkit Agrawal

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below gives the variable, the result, and the supporting LLM response.
Research Type: Experimental
  LLM Response: "Table 1. Impact of environment shaping on policy optimization. Removing task-specific design choices in the reward, action space, state space, early termination, or initialization incurs performance reductions." and "Table 2. Evaluating (Ma et al., 2023) for reward shaping, shaping different components, and coupled shaping."
Researcher Affiliation: Academia
  LLM Response: "Improbable AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA."
Pseudocode: Yes
  LLM Response: "Figure 3. Action space shaping: (Top) Original shaped action space with task-specific features. (Bottom) Unshaped action space consisting of joint torque commands." and "Figure 4. State space shaping: (Top) Original shaped state space with task-specific features. (Bottom) Unshaped state space contains the entire raw simulator state."
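Judging by the captions, Figures 3 and 4 present the shaped and unshaped spaces as code listings. The sketch below illustrates the contrast for the action space only; every class, attribute, and gain value is invented for illustration and is not taken from the paper's code.

```python
class UnshapedActionSpace:
    """Unshaped (Figure 3, bottom): the policy's action is the raw joint-torque command."""

    def apply(self, env, action):
        env.set_joint_torques(action)  # `env` is a hypothetical simulator handle


class ShapedActionSpace:
    """Shaped (Figure 3, top): task-specific features, sketched here as PD position
    targets expressed as scaled offsets from a hand-picked default pose."""

    def __init__(self, default_pose, kp=50.0, kd=2.0, scale=0.5):
        self.default_pose = default_pose  # nominal joint positions chosen by a designer
        self.kp, self.kd, self.scale = kp, kd, scale  # hand-tuned PD gains and action scale

    def apply(self, env, action):
        # Interpret the action as a scaled offset from the default pose,
        # then convert the resulting position target into torques via PD control.
        target = [p + self.scale * a for p, a in zip(self.default_pose, action)]
        torques = [
            self.kp * (t - q) - self.kd * dq
            for t, q, dq in zip(target, env.joint_pos, env.joint_vel)
        ]
        env.set_joint_torques(torques)
```

The point of the contrast is that the shaped variant embeds several human design decisions (default pose, gains, action scale) that the unshaped variant forces the learning algorithm to discover on its own.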
Open Source Code: Yes
  LLM Response: "To facilitate environment shaping research, our code exposes an API for modifying the environment code, which allows the optimizer to transform the reward, observation space, action space, etc. by editing Python functions at runtime. The API is designed so that any language model can be easily integrated to perform such transformations. Our implementation also facilitates faster evaluation of multiple environment shaping choices by training multiple policies in a single process, leveraging parallel simulation." Project page: https://auto-env-shaping.github.io/
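The quoted passage describes an API that rewrites an environment's Python functions at runtime so a language model can propose new shaping code. The repository's actual interface is not reproduced in this assessment, so the following is only a minimal sketch of how such runtime function swapping could work; ShapableEnv and set_function are hypothetical names.

```python
import types


class ShapableEnv:
    """A toy environment whose reward function can be replaced at runtime,
    e.g. with source code proposed by a language model (hypothetical API)."""

    def compute_reward(self, state, action):
        return 0.0  # placeholder default reward

    def set_function(self, name, source):
        """Compile `source` (a Python function definition) and bind it as a
        method called `name`, replacing the current implementation."""
        namespace = {}
        exec(source, namespace)
        setattr(self, name, types.MethodType(namespace[name], self))


env = ShapableEnv()
llm_proposal = """
def compute_reward(self, state, action):
    # Example LLM-proposed shaping: forward progress minus a small action penalty.
    return state["forward_velocity"] - 0.01 * sum(a * a for a in action)
"""
env.set_function("compute_reward", llm_proposal)
print(env.compute_reward({"forward_velocity": 1.5}, [0.1, -0.2]))  # 1.4995
```

The same pattern extends naturally to observation and action transforms: each shaping component is a named function that an optimizer can overwrite and re-evaluate without restarting the process.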
Open Datasets: Yes
  LLM Response: "We consider a case study of the Isaac Gym Envs task suite (Makoviychuk et al., 2021)."
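For reference, tasks in this suite are typically instantiated through the isaacgymenvs.make entry point, along the lines of the project's README. The task name, environment count, and device strings below are illustrative, not the paper's settings.

```python
import isaacgym  # must be imported before torch
import isaacgymenvs
import torch

# Create many parallel instances of one task from the suite.
envs = isaacgymenvs.make(
    seed=0,
    task="Ant",
    num_envs=512,
    sim_device="cuda:0",
    rl_device="cuda:0",
)

obs = envs.reset()
actions = torch.rand((512,) + envs.action_space.shape, device="cuda:0")
obs, reward, done, info = envs.step(actions)
```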
Dataset Splits: No
  LLM Response: The paper defines 'Test Environment' and 'Reference Environment' conceptually and notes that 'the performance of all policies is evaluated in a fully unshaped environment', but it does not provide numerical train/validation/test splits or percentages for the empirical evaluations presented.
Hardware Specification: No
  LLM Response: The paper credits 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources' but does not specify GPU or CPU models, memory amounts, or other hardware details used for the experiments.
Software Dependencies: No
  LLM Response: The paper mentions 'Isaac Gym Envs (Makoviychuk et al., 2021)', 'PPO (Schulman et al., 2017)', and the 'GPT-4 model', but does not provide version numbers for these or any other software dependencies.
Experiment Setup: No
  LLM Response: The paper states that 'off-the-shelf RL algorithm implementations are utilized with their default configurations' for the Isaac Gym Envs tasks, but it does not report specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed training configurations.
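To make the gap concrete: a full reproduction would need the PPO settings that those default configurations silently supply. The dictionary below only names the kind of values that are missing; every number is an illustrative placeholder, not one reported by the paper.

```python
# Illustrative placeholders only; the paper does not report these values.
ppo_config = {
    "learning_rate": 3e-4,    # placeholder
    "num_envs": 4096,         # placeholder: parallel simulator instances
    "horizon_length": 16,     # placeholder: steps collected per env per update
    "minibatch_size": 32768,  # placeholder
    "mini_epochs": 5,         # placeholder: PPO epochs per update
    "gamma": 0.99,            # placeholder: discount factor
    "clip_range": 0.2,        # placeholder: PPO clipping epsilon
}
```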