Active Reasoning in an Open-World Environment
Authors: Manjie Xu, Guangyuan Jiang, Wei Liang, Chi Zhang, Yixin Zhu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate state-of-the-art Reinforcement Learning (RL) and multimodal question-answering models on Conan. Our observations highlight an intriguing dichotomy: while these cutting-edge models exhibit prowess in addressing low-level, short-term tasks, they struggle with multi-round environmental interactions and high-level abductive reasoning. |
| Researcher Affiliation | Academia | Manjie Xu¹ (manjietsu@bit.edu.cn), Guangyuan Jiang² (jgy@stu.pku.edu.cn), Wei Liang¹,³ (liangwei@bit.edu.cn), Chi Zhang⁴ (zhangchi@bigai.ai), Yixin Zhu² (yixin.zhu@pku.edu.cn). ¹School of Computer Science & Technology, Beijing Institute of Technology; ²Institute for AI, Peking University; ³Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing, China; ⁴National Key Laboratory of General Artificial Intelligence, BIGAI |
| Pseudocode | No | The paper describes its methods textually and through diagrams (e.g., Figure 3 illustrating the detective pipeline), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://sites.google.com/view/conan-active-reasoning |
| Open Datasets | Yes | Conan produced a corpus of 100,000 questions, derived from 10,000 unique scenes generated via Crafter's scene generator, with each scene stemming from a task executed by a vandal, yielding an average of 10 questions per scene. |
| Dataset Splits | Yes | Table A4: Dataset split and choice distribution. Intent: 71,162 (Train), 9,152 (Test), 8,822 (Val). A consistency check on these counts appears after this table. |
| Hardware Specification | Yes | All models are trained utilizing 8 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using the Stable Baselines3 library and models such as BERT-Large and DeBERTa, but it does not specify exact version numbers for these software dependencies (e.g., "Stable Baselines3 vX.Y" or "PyTorch 1.9"). |
| Experiment Setup | Yes | The explorer is trained using DQN, TRPO, and Recurrent PPO for 10^8 steps, with a buffer size of 10^7 and a batch size of 512. In the case of DQN, training is conducted with ϵ = 0.96. Each episode is capped at a maximum of 500 steps for the explorer. A hedged Stable Baselines3 sketch of this configuration follows the table. |
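As a quick sanity check on the Intent split quoted from Table A4, the reported counts can be totaled and converted into proportions. The sketch below uses only the three numbers in the table; Conan's other question categories are not quoted here and are omitted, and the roughly 80/10/10 ratio it reveals is an observation derived from those counts, not a figure stated in the paper.

```python
# Consistency check on the Intent split from Table A4.
# Only the three counts quoted above are assumed.
splits = {"train": 71_162, "test": 9_152, "val": 8_822}

total = sum(splits.values())  # 89,136 Intent questions overall
for name, count in splits.items():
    print(f"{name}: {count} ({count / total:.1%})")
# train: 71162 (79.8%)
# test: 9152 (10.3%)
# val: 8822 (9.9%)
```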
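The explorer setup maps naturally onto Stable Baselines3, which the paper reports using, with Recurrent PPO (and TRPO) provided by the companion sb3-contrib package. The following is a minimal sketch under stated assumptions, not the authors' code: `Conan-v0` is a hypothetical environment ID, the policy names are placeholders, and interpreting the reported ϵ = 0.96 as SB3's initial exploration rate is a guess.

```python
# Hedged sketch of the explorer training configuration reported above,
# assuming a Gymnasium-compatible Conan environment ("Conan-v0" is
# hypothetical; the paper does not name a registered env ID).
import gymnasium as gym
from gymnasium.wrappers import TimeLimit
from stable_baselines3 import DQN
from sb3_contrib import RecurrentPPO  # TRPO is also available in sb3-contrib

# Cap each explorer episode at 500 steps, as reported.
env = TimeLimit(gym.make("Conan-v0"), max_episode_steps=500)

# DQN with the reported replay buffer and batch size. Mapping the
# paper's "ϵ = 0.96" onto SB3's exploration schedule is an assumption.
dqn = DQN(
    "CnnPolicy",                   # placeholder policy choice
    env,
    buffer_size=10_000_000,        # 10^7 replay buffer
    batch_size=512,
    exploration_initial_eps=0.96,  # assumed reading of the reported ϵ
)
dqn.learn(total_timesteps=100_000_000)  # 10^8 environment steps

# Recurrent PPO trained under the same step budget.
rppo = RecurrentPPO("CnnLstmPolicy", env, batch_size=512)
rppo.learn(total_timesteps=100_000_000)
```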