Information-based learning by agents in unbounded state spaces
Authors: Shariq A. Mobin, James A. Arnemann, Fritz Sommer
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we describe simulation experiments with our two models, CRP-PIG and EB-CRP-PIG, and compare them with published approaches. The models are tested in environments defined in the literature and also in an unbounded world. First the agents were tested in a bounded maze environment taken from [12] (Figure 2). Figure 3 depicts the missing information (Eq. 11) in the bounded maze for the various learning strategies over 3000 sampling steps averaged over 200 runs. (A hedged sketch of this missing-information evaluation appears after the table.) |
| Researcher Affiliation | Academia | Shariq A. Mobin, James A. Arnemann, Friedrich T. Sommer Redwood Center for Theoretical Neuroscience University of California, Berkeley Berkeley, CA 94720 shariqmobin@berkeley.edu, arnemann@berkeley.edu, fsommer@berkeley.edu |
| Pseudocode | No | The paper contains mathematical formulations and descriptions of the proposed models and processes, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code, nor does it include links to a code repository for the methodology described. |
| Open Datasets | Yes | First the agents were tested in a bounded maze environment taken from [12] (Figure 2). The state space in the maze consists of the \|S\| = 36 rooms. To directly assess how efficient learning translates to the ability to harvest reward, we consider the 5-state Chain problem [19], shown in Figure 4, a popular benchmark problem. |
| Dataset Splits | No | The paper mentions running simulations over a certain number of steps and averaging over runs (e.g., "3000 sampling steps averaged over 200 runs", "1000 steps, averaged over 500 runs"), but it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for the environments used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing instance types. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers/libraries). |
| Experiment Setup | Yes | With the discount factor, λ, set to 0.95, one can define how actions are chosen by all our PIG agents. We found S=120 to be roughly optimal for our agent and display the results of the experiment in Table 1, taking the results of the competitor algorithms directly from the corresponding papers. Specifically, we assign a reward to each state equal to the Euclidean distance from the starting state. Like for the Chain problem before, we create two agents EB-CRP-PIG-R and LTA-R which each run for 1000 total steps, exploring for S=750 steps (defined previously) and then calculating their best reward policy and executing it for the remaining 250 steps. (A hedged sketch of this explore-then-exploit protocol appears after the table.) |
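
The two sketches below are illustrative only. First, a minimal Python sketch of the evaluation protocol quoted in the Research Type row: an agent explores a tabular world while the "missing information" (here taken as the summed KL divergence between the true and estimated transition distributions) is recorded at every step and averaged over independent runs. The maze itself, the random-exploration policy, and the Dirichlet-count estimator are placeholder assumptions, not the paper's CRP-PIG or EB-CRP-PIG models.

```python
import numpy as np

# Minimal sketch, assuming a generic tabular world with a true transition tensor
# theta_true[a, s, s'].  The random-walk exploration and Dirichlet-count estimator
# below are placeholders, not the paper's CRP-PIG / EB-CRP-PIG agents.

def missing_information(theta_true, counts, prior=1.0):
    """Summed KL divergence between true and estimated transition distributions,
    one term per (action, state) pair -- analogous to the quoted 'missing information'."""
    theta_hat = counts + prior
    theta_hat = theta_hat / theta_hat.sum(axis=-1, keepdims=True)
    kl = theta_true * (np.log(theta_true + 1e-12) - np.log(theta_hat))
    return kl.sum()

def run_once(theta_true, n_steps=3000, rng=None):
    """Explore the world and record missing information after every step."""
    rng = rng if rng is not None else np.random.default_rng()
    n_actions, n_states, _ = theta_true.shape
    counts = np.zeros_like(theta_true)
    state, trace = 0, []
    for _ in range(n_steps):
        action = int(rng.integers(n_actions))            # placeholder: random exploration
        next_state = rng.choice(n_states, p=theta_true[action, state])
        counts[action, state, next_state] += 1           # Dirichlet-style count update
        trace.append(missing_information(theta_true, counts))
        state = next_state
    return np.array(trace)

def average_runs(theta_true, n_runs=200, n_steps=3000, seed=0):
    """Average the missing-information curve over independent runs (200 in the quote)."""
    rng = np.random.default_rng(seed)
    return np.mean([run_once(theta_true, n_steps, rng) for _ in range(n_runs)], axis=0)
```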
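
Second, a hedged sketch of the explore-then-exploit protocol quoted in the Experiment Setup row: explore for S steps, fit a greedy policy by value iteration on the learned model with discount factor 0.95, and execute it for the remaining steps. The `env`/`agent` interfaces and `choose_exploratory_action` are hypothetical names introduced here; in the paper, exploratory actions are chosen by predicted information gain (PIG).

```python
import numpy as np

def value_iteration(theta_hat, rewards, gamma=0.95, n_iters=200):
    """Greedy policy from an estimated transition model theta_hat[a, s, s']
    and per-state rewards, using the quoted discount factor of 0.95."""
    n_actions, n_states, _ = theta_hat.shape
    values = np.zeros(n_states)
    for _ in range(n_iters):
        # Expected discounted return of each (action, state) pair, shape (A, S).
        q = (theta_hat * (rewards[None, None, :] + gamma * values[None, None, :])).sum(axis=-1)
        values = q.max(axis=0)
    return q.argmax(axis=0)   # best action in every state

def explore_then_exploit(env, agent, rewards, total_steps=1000, explore_steps=750, gamma=0.95):
    """Explore for `explore_steps`, then execute the greedy policy for the rest.
    `env` and `agent` are hypothetical interfaces (reset/step, update/estimated_transitions);
    `choose_exploratory_action` stands in for the paper's PIG-based action selection."""
    state = env.reset()
    for _ in range(explore_steps):
        action = agent.choose_exploratory_action(state)
        next_state = env.step(action)
        agent.update(state, action, next_state)
        state = next_state
    policy = value_iteration(agent.estimated_transitions(), rewards, gamma)
    total_reward = 0.0
    for _ in range(total_steps - explore_steps):
        state = env.step(policy[state])
        total_reward += rewards[state]
    return total_reward
```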