Learning to Play With Intrinsically-Motivated, Self-Aware Agents

Authors: Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, Daniel L. K. Yamins

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that this policy causes the agent to explore novel and informative interactions with its environment, leading to the generation of a spectrum of complex behaviors, including ego-motion prediction, object attention, and object gathering. Moreover, the world-model that the agent learns supports improved performance on object dynamics prediction, detection, localization and recognition tasks. Taken together, our results are initial steps toward creating flexible autonomous agents that self-supervise in realistic physical environments. [Section 3, Experiments] We randomly place the agent in a square 10 by 10 meter room, together with up to two other objects with which the agent can interact.
Researcher Affiliation | Academia | Nick Haber (1,2,3), Damian Mrowca (4), Stephanie Wang (4), Li Fei-Fei (4), and Daniel L. K. Yamins (1,4,5); Departments of Psychology (1), Pediatrics (2), Biomedical Data Science (3), Computer Science (4), and Wu Tsai Neurosciences Institute (5), Stanford, CA 94305; {nhaber, mrowca}@stanford.edu
Pseudocode | No | No structured pseudocode or algorithm blocks are present. The paper describes the agent's architecture and processes in prose and in diagrams such as 'Figure 2: Intrinsically-motivated self-aware agent architecture.'
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper describes a custom simulated environment ('built in Unity 3D') and how data was collected within it ('We collect data with a random policy from sixteen simulation instances'). No concrete access information (link, DOI, citation) for a publicly available dataset is provided.
Dataset Splits | Yes | These data are split into train (16000 examples), validation (8000 examples), and test (8000 examples) sets.
Hardware Specification | No | The paper mentions the use of 'Unity 3D' and the 'PhysX engine' for simulation, but does not specify any hardware details such as GPU or CPU models used for running experiments.
Software Dependencies | No | The paper mentions 'Unity 3D', the 'PhysX engine', and the 'Adam algorithm [25]' but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Training is performed using 16 asynchronous simulation instances [30], with different seeds and objects. The scene is reinitialized periodically, with the time of reset randomly chosen between 2^13 and 2^15 steps. Each simulation maintains a data buffer of 250 timesteps to ensure stable training [27]. For model updates, two examples are randomly sampled from each of the 16 simulation buffers to form a batch of size 32. Gradient updates are performed using the Adam algorithm [25], with an initial learning rate of 0.0001. We randomly place the agent in a square 10 by 10 meter room, together with up to two other objects with which the agent can interact, setting the maximum interaction distance δ to 2 meters.
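
The buffer-sampling and optimizer settings in that setup can be sketched directly. The snippet below is a minimal illustration, not the authors' code: the 16-buffer layout, two samples per buffer (batch size 32), and the Adam learning rate of 0.0001 come from the quoted setup, while the tiny world-model, tensor shapes, and function names are illustrative placeholders.

```python
# Minimal sketch of the described update scheme (assumed details noted below).
import random
import torch
import torch.nn as nn

NUM_SIMS = 16      # asynchronous simulation instances (from the paper)
BUFFER_LEN = 250   # timesteps retained per simulation buffer (from the paper)
PER_SIM = 2        # 2 samples x 16 buffers = batch size 32 (from the paper)

# Each buffer holds (input, target) pairs from one simulation instance.
# Here the buffers are pre-filled with random placeholder data; in the paper
# they are filled online by the running simulations.
buffers = [[(torch.randn(64), torch.randn(8)) for _ in range(BUFFER_LEN)]
           for _ in range(NUM_SIMS)]

# Placeholder world-model; the paper's architecture is a convolutional network.
world_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))
optimizer = torch.optim.Adam(world_model.parameters(), lr=1e-4)

def training_step():
    # Draw two examples from every simulation buffer to form a batch of 32.
    pairs = [p for buf in buffers for p in random.sample(buf, PER_SIM)]
    x = torch.stack([p[0] for p in pairs])
    y = torch.stack([p[1] for p in pairs])
    loss = nn.functional.mse_loss(world_model(x), y)  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper the 16 buffers are populated asynchronously by the simulation instances while training proceeds; this sketch only reproduces the per-update sampling pattern and optimizer configuration.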