Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Zero-Shot Reward Specification via Grounded Natural Language
Authors: Parsa Mahmoudieh, Deepak Pathak, Trevor Darrell
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 (Experiments): "In this section, we evaluate our full Zero-shot Reward Model on pushing, picking, and placing manipulation tasks performed in a planar setup. We train each task using our full zero-shot reward model output as reward for the PPO reinforcement learning algorithm (Schulman et al., 2017). We then train for the same tasks with other types of reward functions as baselines or privileged methods for comparison: a) Oracle reward (privileged): ... b) VICE (privileged): ... c) Ours-base: ... d) Curiosity-RL: ..." |
| Researcher Affiliation | Academia | UC Berkeley; Carnegie Mellon University. Correspondence to: Parsa Mahmoudieh <EMAIL>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper describes generating its own datasets ("randomly collected images", "large dataset of the rollouts of those policies") but does not provide access information or explicitly state that they are publicly available. |
| Dataset Splits | No | The paper does not explicitly provide specific training, validation, and test dataset splits with percentages or sample counts for its experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as PyTorch and the Adam optimizer but does not provide version numbers for these or any other libraries used. |
| Experiment Setup | Yes | The policy is trained using the Adam optimizer with AMSGrad and a learning rate of 1e-4. The images are augmented with PyTorch RandomResizedCrop, with 0.95 to 1.0 area and 0.98 to 1.02 aspect-ratio randomization, then resized to the original image dimensions of 128x128. All policies are trained for 300 epochs. |
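
The augmentation described in the Experiment Setup row can be sketched as code. Below is a minimal, illustrative reimplementation of the crop-box sampling that a PyTorch-style RandomResizedCrop with `scale=(0.95, 1.0)` and `ratio=(0.98, 1.02)` performs on a 128x128 image; it is written in pure Python to make the sampling logic explicit. The function name and structure are our own, not from the paper.

```python
import math
import random

def sample_resized_crop_box(width=128, height=128,
                            scale=(0.95, 1.0), ratio=(0.98, 1.02)):
    """Sample a crop box in the style of torchvision's RandomResizedCrop:
    pick a random area fraction and aspect ratio, derive the crop's
    width/height, then pick a random top-left corner. The caller would
    crop the image to this box and resize it back to (width, height)."""
    area = width * height
    target_area = random.uniform(*scale) * area   # 95-100% of image area
    aspect = random.uniform(*ratio)               # w/h ratio in [0.98, 1.02]
    w = min(width, int(round(math.sqrt(target_area * aspect))))
    h = min(height, int(round(math.sqrt(target_area / aspect))))
    top = random.randint(0, height - h)
    left = random.randint(0, width - w)
    return top, left, h, w
```

In an actual PyTorch pipeline this would presumably correspond to `torchvision.transforms.RandomResizedCrop(128, scale=(0.95, 1.0), ratio=(0.98, 1.02))`, with the optimizer configured as `torch.optim.Adam(params, lr=1e-4, amsgrad=True)`; the paper does not state these exact calls, so they are an assumption.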