Zero-Shot Reward Specification via Grounded Natural Language

Authors: Parsa Mahmoudieh, Deepak Pathak, Trevor Darrell

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (3 Experiments) | In this section, we evaluate our full Zero-shot Reward Model on pushing, picking, and placing manipulation tasks performed in a planar setup. We train each task using our full zero-shot reward model output as reward for the PPO reinforcement learning algorithm (Schulman et al., 2017). We then train for the same tasks with other types of reward functions as baselines or privileged methods for comparison: a) Oracle reward (privileged): ... b) VICE (privileged): ... c) Ours-base: ... d) Curiosity-RL: ...
Researcher Affiliation | Academia | UC Berkeley; Carnegie Mellon University. Correspondence to: Parsa Mahmoudieh <parsa.m@berkeley.edu>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | No | The paper describes generating its own datasets ('randomly collected images', 'large dataset of the rollouts of those policies') but does not provide any access information or explicitly state their public availability.
Dataset Splits | No | The paper does not explicitly provide specific training, validation, and test dataset splits with percentages or sample counts for its experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software components like PyTorch and the Adam optimizer but does not provide specific version numbers for these or other libraries used.
Experiment Setup | Yes | The policy is trained using the Adam optimizer with AMSGrad and a learning rate of 1e-4. The images are augmented with PyTorch's RandomResizedCrop (0.95 to 1.0 area and 0.98 to 1.02 aspect ratio randomization) and resized back to the original image dimensions of 128x128. All policies are trained for 300 epochs.
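The Experiment Setup row above quotes a concrete training configuration. A minimal sketch of that configuration in PyTorch follows; the linear "policy" network is a placeholder (the report does not give the architecture), and the crop function is a hand-rolled stand-in for torchvision's RandomResizedCrop so the snippet stays self-contained:

```python
import torch
import torch.nn.functional as F

def random_resized_crop(img, scale=(0.95, 1.0), ratio=(0.98, 1.02), size=128):
    """Crop img (C, H, W) with randomized area and aspect ratio, resize back."""
    _, h, w = img.shape
    area = h * w * torch.empty(1).uniform_(*scale).item()
    aspect = torch.empty(1).uniform_(*ratio).item()
    cw = min(w, int(round((area * aspect) ** 0.5)))
    ch = min(h, int(round((area / aspect) ** 0.5)))
    top = int(torch.randint(0, h - ch + 1, (1,)))
    left = int(torch.randint(0, w - cw + 1, (1,)))
    crop = img[:, top:top + ch, left:left + cw].unsqueeze(0)
    return F.interpolate(crop, size=(size, size), mode="bilinear",
                         align_corners=False).squeeze(0)

# Adam with AMSGrad at learning rate 1e-4, as quoted in the setup; the
# linear layer is only a placeholder for the unspecified policy network.
policy = torch.nn.Linear(3 * 128 * 128, 8)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4, amsgrad=True)
```

The augmentation parameters (scale 0.95–1.0, ratio 0.98–1.02, output 128x128) match the quoted setup; everything else is an assumption for illustration.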
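The Research Type row describes the paper's evaluation protocol: train PPO using the zero-shot reward model's output in place of the environment reward. A hedged sketch of that relabeling loop follows; `FakeEnv`, `reward_model`, and `collect_rollout` are all illustrative stand-ins, not the authors' code:

```python
import random

class FakeEnv:
    """Minimal stand-in environment; only the interface shape matters here."""
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        # Returns (obs, env_reward, done, info) in the classic Gym style.
        return 0.0, 0.0, self.t >= 8, {"image": [random.random()] * 4}

def reward_model(image, task_text):
    # Placeholder for the paper's zero-shot reward model, which scores how
    # well the current observation matches the language task description.
    return sum(image) / len(image)

def collect_rollout(env, policy, task_text, horizon=16):
    # Standard on-policy collection, except the reward handed to PPO comes
    # from the learned model on the rendered image, not the environment.
    obs = env.reset()
    rewards = []
    for _ in range(horizon):
        obs, _, done, info = env.step(policy(obs))  # env reward discarded
        rewards.append(reward_model(info["image"], task_text))
        if done:
            break
    return rewards

rewards = collect_rollout(FakeEnv(), policy=lambda o: 0,
                          task_text="push the block to the goal")
```

In the paper these model-scored rewards feed PPO (Schulman et al., 2017); the PPO update itself is omitted here.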