Uni[MASK]: Unified Inference in Sequential Decision Problems

Authors: Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that a single Uni[MASK] model is often capable of carrying out many tasks with performance similar to or better than single-task models. Additionally, after fine-tuning, our Uni[MASK] models consistently outperform comparable single-task models. Our code is publicly available here. We test this framework in a Gridworld navigation task and a continuous control environment. In Figure 7, we report validation loss if the model is trained on one task (or multiple tasks) and evaluated on another task. In addition to Gridworld, we test our method in a partially observable, continuous-state and continuous-action environment, with a larger trajectory horizon (200 timesteps).
Researcher Affiliation | Collaboration | Micah Carroll¹, Orr Paradise¹, Jessy Lin¹, Raluca Georgescu², Mingfei Sun², David Bignell², Stephanie Milani³, Katja Hofmann², Matthew Hausknecht², Anca Dragan¹, and Sam Devlin². ¹UC Berkeley, ²Microsoft Research
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It describes the model architecture and training regimes in prose and diagrams.
Open Source Code | Yes | Our code is publicly available here. Codebase for Uni[MASK]: Unified Inference in Sequential Decision Problems. https://github.com/micahcarroll/uniMASK, 2022. URL https://github.com/micahcarroll/uniMASK.
Open Datasets | Yes | We design a fully observable 4 × 4 Gridworld in which the agent should move to a fixed goal location behind a locked door with the MiniGrid environment framework [8]. We adapt the Mujoco-physics Maze2D environment [16]... D4RL: datasets for deep data-driven reinforcement learning. CoRR, abs/2004.07219, 2020. URL https://arxiv.org/abs/2004.07219.
Dataset Splits | Yes | We generate 1000 trajectories of 200 timesteps, of which 900 are used for training and 100 for validation.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. It only mentions environments and training conditions.
Software Dependencies | No | The paper mentions software environments like "MiniGrid environment framework [8]" and "Mujoco-physics Maze2D environment [16]" but does not specify version numbers for these or any other software dependencies (e.g., libraries, frameworks, programming languages).
Experiment Setup | Yes | We train Uni[MASK] models on training trajectories of sequence length T = 10... We generate 1000 trajectories of 200 timesteps... add noise to the actions with zero-mean and 0.5 variance (which are then clipped to have each dimension between -1 and 1). We train two sets of such models, for context lengths of 5 and 10... We report reward evaluation results for 1000 rollouts in the Maze environment with standard errors across 5 seeds in Table 1. See Appendix D for more model details, and Appendix F for experiments with an alternative instantiation of Uni[MASK] with a feedforward neural network architecture. For implementation details, see Appendix G.
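The data-preparation steps quoted in the Experiment Setup and Dataset Splits rows can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the noise is assumed to be Gaussian (the quoted text gives only its mean and variance), the holdout is assumed to be a random shuffle, and the two-dimensional zero actions are placeholders. The helper names `add_clipped_noise`, `split_trajectories`, and `standard_error` are hypothetical and do not come from the Uni[MASK] codebase.

```python
import math
import random

NOISE_VARIANCE = 0.5                 # quoted setup: zero-mean noise, 0.5 variance
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # quoted setup: clip each dimension to this range


def add_clipped_noise(action, rng):
    """Add zero-mean noise to each action dimension, then clip it.

    Assumption: the noise is Gaussian; the source does not name the distribution.
    """
    std = math.sqrt(NOISE_VARIANCE)  # random.gauss expects a standard deviation
    return [min(ACTION_HIGH, max(ACTION_LOW, a + rng.gauss(0.0, std)))
            for a in action]


def split_trajectories(trajectories, n_val, rng):
    """Shuffle a copy of the trajectories and hold out n_val of them.

    Assumption: the holdout is chosen uniformly at random.
    """
    shuffled = list(trajectories)
    rng.shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]


def standard_error(values):
    """Standard error of the mean, using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(sample_var / n)


rng = random.Random(0)
# Placeholder data: 1000 trajectories of 200 timesteps, with a noisy
# two-dimensional action at each step (the zero base action is illustrative).
trajectories = [
    [add_clipped_noise([0.0, 0.0], rng) for _ in range(200)]
    for _ in range(1000)
]
train, val = split_trajectories(trajectories, n_val=100, rng=rng)
print(len(train), len(val))
```

The `standard_error` helper shows one common way to compute a standard error across a handful of seeds, such as the 5 seeds mentioned for Table 1; the source does not say which estimator the paper used.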