Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

Authors: Wenhan Xiong, Xiaoxiao Guo, Mo Yu, Shiyu Chang, Bowen Zhou, William Yang Wang

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our scheduled policy optimization method on the Blocks environment originally created by Bisk et al. [2016]. There are 20 unique blocks in the environment and the goal of the agent is to accomplish natural language described tasks by moving blocks in the 2D map. The dataset consists of 11,871 training samples and 1,179/3,177 samples for validation/testing.
Researcher Affiliation | Collaboration | Wenhan Xiong (1), Xiaoxiao Guo (2), Mo Yu (2), Shiyu Chang (2), Bowen Zhou (3), William Yang Wang (1); (1) University of California, Santa Barbara; (2) IBM Research; (3) JD AI Research
Pseudocode | Yes | Algorithm 1: Scheduled Policy Optimization Algorithm
Open Source Code | Yes | Code and trained models can be found at https://github.com/xwhan/walk_the_blocks.
Open Datasets | Yes | We evaluate our scheduled policy optimization method on the Blocks environment originally created by Bisk et al. [2016].
Dataset Splits | Yes | The dataset consists of 11,871 training samples and 1,179/3,177 samples for validation/testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions 'Adam optimizer' and 'PPO' but does not specify version numbers for any software libraries or dependencies, such as Python, PyTorch, or TensorFlow versions.
Experiment Setup | Yes | The initial learning rate is 0.0001 and is divided by 2 for every 4 epochs. The windowed history consists of the execution errors of the last 100 trials. The clipping interval of PPO is set to [0.95, 1.05] and the number of PPO epochs for each update step is set to be 4. We restrict the number of training epochs to be less than 20. Early-stopping is applied using the Dev set.
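
The hyperparameters quoted in the Experiment Setup row can be read as a small configuration. The sketch below is illustrative only and is not taken from the authors' repository: every name in it (`learning_rate`, `clipped_surrogate`, `record_error`, the constants) is hypothetical, epochs are assumed to be 0-indexed, the clipped surrogate is the standard PPO form rather than a confirmed detail of the paper, and averaging the error window is an assumption about how the history might be summarized.

```python
from collections import deque

# Values quoted in the Experiment Setup row; the names are hypothetical.
INITIAL_LR = 1e-4                 # initial learning rate
LR_DECAY_EVERY = 4                # learning rate is halved every 4 epochs
CLIP_LOW, CLIP_HIGH = 0.95, 1.05  # PPO clipping interval
PPO_EPOCHS = 4                    # PPO epochs per update step
MAX_EPOCHS = 20                   # upper bound on training epochs
HISTORY_SIZE = 100                # execution errors of the last 100 trials


def learning_rate(epoch: int) -> float:
    """Step decay: halve the rate every LR_DECAY_EVERY epochs.

    Assumes epochs are counted from 0.
    """
    return INITIAL_LR / (2 ** (epoch // LR_DECAY_EVERY))


def clipped_surrogate(ratio: float, advantage: float) -> float:
    """Standard PPO clipped objective for one sample.

    `ratio` is the new/old policy probability ratio. This is the
    textbook form, assumed here rather than confirmed by the summary.
    """
    clipped_ratio = min(max(ratio, CLIP_LOW), CLIP_HIGH)
    return min(ratio * advantage, clipped_ratio * advantage)


# A bounded deque drops the oldest error once 100 trials are stored.
error_history = deque(maxlen=HISTORY_SIZE)


def record_error(execution_error: float) -> float:
    """Store one trial's execution error and return the window mean.

    Using the mean as the summary statistic is an assumption.
    """
    error_history.append(execution_error)
    return sum(error_history) / len(error_history)


if __name__ == "__main__":
    for epoch in range(0, MAX_EPOCHS, LR_DECAY_EVERY):
        print(f"epoch {epoch:2d}: lr = {learning_rate(epoch):.3e}")
```

With the narrow interval [0.95, 1.05], a probability ratio of 1.2 paired with a positive advantage is capped at 1.05, so a single update step cannot move the policy far from the one that collected the data.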