Program Guided Agent
Authors: Shao-Hua Sun, Te-Lin Wu, Joseph J. Lim
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on a 2D Minecraft environment not only demonstrate that the proposed framework learns to reliably accomplish program instructions and achieves zero-shot generalization to more complex instructions but also verify the efficiency of the proposed modulation mechanism for learning the multitask policy. |
| Researcher Affiliation | Academia | Shao-Hua Sun, Te-Lin Wu, Joseph J. Lim University of Southern California {shaohuas,telinwu,limjj}@usc.edu |
| Pseudocode | Yes | Algorithm 1 Program Execution |
| Open Source Code | No | The paper does not provide a concrete link to its source code or explicitly state that its code is being released. |
| Open Datasets | No | The paper describes generating its own program sets and collecting natural language translations but does not provide concrete access (e.g., a URL, DOI, or specific citation for public access) to these datasets. |
| Dataset Splits | Yes | We sample 4,500 programs using our DSL and split them into 4,000 training programs (train) and 500 testing programs (test). To examine the framework's ability to generalize to more complex instructions, we generate 500 programs that are twice as long and contain more conditional branches on average to construct a harder testing set (test-complex). (A split sketch follows the table.) |
| Hardware Specification | Yes | We train all our models on a single Nvidia Titan-X GPU on a 40-core Ubuntu 16.04 Linux server. |
| Software Dependencies | No | The paper mentions "TensorFlow (Abadi et al., 2016)" and "GloVe (Pennington et al., 2014) (50-D version)" but does not provide specific version numbers for the TensorFlow library or other key software components needed for replication. (A GloVe-loading sketch follows the table.) |
| Experiment Setup | Yes | We use the following hyperparameters to train A2C agents for our model and all the end-to-end learning models: learning rate: 1×10⁻³, number of environments: 64, number of workers: 64, and number of update roll-out steps: 5. (These are collected into a config sketch below the table.) |
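
To make the Dataset Splits row concrete, here is a minimal sketch of the reported 4,500 → 4,000 train / 500 test split. `sample_program()` is a hypothetical stand-in for the paper's (unreleased) DSL sampler, and the random seed is an assumption, not something the paper specifies.

```python
import random

def split_programs(programs, n_train=4000, seed=0):
    """Shuffle and split programs into train/test sets.

    The seed is an assumption for reproducibility; the paper does not
    state how the 4,000/500 split was randomized.
    """
    rng = random.Random(seed)
    shuffled = list(programs)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Usage with a hypothetical DSL sampler:
# programs = [sample_program() for _ in range(4500)]
# train, test = split_programs(programs)  # len(train) == 4000, len(test) == 500
# test-complex would be generated separately: programs roughly twice as
# long with more conditional branches on average.
```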
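For the Software Dependencies row, a sketch of loading the 50-D GloVe vectors the paper cites. The file name assumes the standard Stanford `glove.6B` release; the paper only states that the 50-D version of GloVe (Pennington et al., 2014) was used.

```python
import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    """Load GloVe word vectors from a plain-text release file.

    Each line is a word followed by its vector components; the 50-D
    assumption matches the version the paper reports using.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            embeddings[word] = np.asarray(values, dtype=np.float32)
    return embeddings
```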
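Finally, the A2C hyperparameters reported in the Experiment Setup row, gathered into a plain config dict. The key names are illustrative; the paper does not define a config schema.

```python
# Reported A2C training hyperparameters (key names are assumptions).
a2c_config = {
    "learning_rate": 1e-3,  # reported as 1 x 10^-3
    "num_envs": 64,         # number of parallel environments
    "num_workers": 64,      # number of workers
    "rollout_steps": 5,     # update roll-out steps per A2C iteration
}
```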