Learning to Play With Intrinsically-Motivated, Self-Aware Agents

Authors: Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, Daniel L. K. Yamins

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization and recognition tasks. Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in realistic physical environments. [Section 3, Experiments] We randomly place the agent in a square 10 by 10 meter room, together with up to two other objects with which the agent can interact.
Researcher Affiliation | Academia | Nick Haber (1,2,3), Damian Mrowca (4), Stephanie Wang (4), Li Fei-Fei (4), and Daniel L. K. Yamins (1,4,5); Departments of Psychology (1), Pediatrics (2), Biomedical Data Science (3), Computer Science (4), and Wu Tsai Neurosciences Institute (5), Stanford, CA 94305; {nhaber, mrowca}@stanford.edu
Pseudocode | No | No structured pseudocode or algorithm blocks are present. The paper describes the agent's architecture and processes in prose and in diagrams such as 'Figure 2: Intrinsically-motivated self-aware agent architecture.'
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper describes a custom simulated environment ('built in Unity 3D') and how data was collected within it ('We collect data with a random policy from sixteen simulation instances'). No concrete access information (link, DOI, citation) for a publicly available dataset is provided.
Dataset Splits | Yes | These data are split into train (16000 examples), validation (8000 examples), and test (8000 examples) sets.
Hardware Specification | No | The paper mentions the use of 'Unity 3D' and the 'PhysX engine' for simulation, but does not specify any hardware details such as GPU or CPU models used for running experiments.
Software Dependencies | No | The paper mentions 'Unity 3D', the 'PhysX engine', and the 'Adam algorithm [25]' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Training is performed using 16 asynchronous simulation instances [30], with different seeds and objects. The scene is reinitialized periodically, with the time of reset randomly chosen between 2^13 and 2^15 steps. Each simulation maintains a data buffer of 250 timesteps to ensure stable training [27]. For model updates, two examples are randomly sampled from each of the 16 simulation buffers to form a batch of size 32. Gradient updates are performed using the Adam algorithm [25], with an initial learning rate of 0.0001. We randomly place the agent in a square 10 by 10 meter room, together with up to two other objects with which the agent can interact, setting the maximum interaction distance δ to 2 meters.
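
The buffer-sampling and optimizer settings in that setup can be sketched directly. The snippet below is a minimal illustration, not the authors' code: the 16-buffer layout, two samples per buffer (batch size 32), and the Adam learning rate of 0.0001 come from the quoted setup, while the tiny world-model, tensor shapes, and function names are illustrative placeholders.

```python
# Minimal sketch of the described update scheme (assumed details noted below).
import random
import torch
import torch.nn as nn

NUM_SIMS = 16      # asynchronous simulation instances (from the paper)
BUFFER_LEN = 250   # timesteps retained per simulation buffer (from the paper)
PER_SIM = 2        # 2 samples x 16 buffers = batch size 32 (from the paper)

# Each buffer holds (input, target) pairs from one simulation instance.
# Here the buffers are pre-filled with random placeholder data; in the paper
# they are filled online by the running simulations.
buffers = [[(torch.randn(64), torch.randn(8)) for _ in range(BUFFER_LEN)]
           for _ in range(NUM_SIMS)]

# Placeholder world-model; the paper's architecture is a convolutional network.
world_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))
optimizer = torch.optim.Adam(world_model.parameters(), lr=1e-4)

def training_step():
    # Draw two examples from every simulation buffer to form a batch of 32.
    pairs = [p for buf in buffers for p in random.sample(buf, PER_SIM)]
    x = torch.stack([p[0] for p in pairs])
    y = torch.stack([p[1] for p in pairs])
    loss = nn.functional.mse_loss(world_model(x), y)  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper the 16 buffers are populated asynchronously by the simulation instances while training proceeds; this sketch only reproduces the per-update sampling pattern and optimizer configuration.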