Information-based learning by agents in unbounded state spaces

Authors: Shariq A Mobin, James A Arnemann, Fritz Sommer

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we describe simulation experiments with our two models, CRP-PIG and EB-CRP-PIG, and compare them with published approaches. The models are tested in environments defined in the literature and also in an unbounded world. First the agents were tested in a bounded maze environment taken from [12] (Figure 2). Figure 3 depicts the missing information (Eq. 11) in the bounded maze for the various learning strategies over 3000 sampling steps, averaged over 200 runs.
Researcher Affiliation | Academia | Shariq A. Mobin, James A. Arnemann, Friedrich T. Sommer; Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, CA 94720; shariqmobin@berkeley.edu, arnemann@berkeley.edu, fsommer@berkeley.edu
Pseudocode | No | The paper contains mathematical formulations and descriptions of the proposed models and processes, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code, nor does it include links to a code repository for the methodology described.
Open Datasets | Yes | First the agents were tested in a bounded maze environment taken from [12] (Figure 2). The state space in the maze consists of |S| = 36 rooms. To directly assess how efficient learning translates to the ability to harvest reward, we consider the 5-state Chain problem [19], shown in Figure 4, a popular benchmark problem.
Dataset Splits | No | The paper mentions running simulations over a certain number of steps and averaging over runs (e.g., "3000 sampling steps averaged over 200 runs", "1000 steps, averaged over 500 runs"), but it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for the environments used.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing instance types.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers/libraries).
Experiment Setup | Yes | With the discount factor λ set to 0.95, one can define how actions are chosen by all our PIG agents. We found S = 120 to be roughly optimal for our agent and display the results of the experiment in Table 1, taking the results of the competitor algorithms directly from the corresponding papers. Specifically, we assign a reward to each state equal to the Euclidean distance from the starting state. As in the Chain problem before, we create two agents, EB-CRP-PIG-R and LTA-R, which each run for 1000 total steps, exploring for S = 750 steps (defined previously) and then calculating their best reward policy and executing it for the remaining 250 steps.
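
The Research Type row quotes the paper's evaluation protocol: the missing information (Eq. 11) is tracked over 3000 sampling steps and averaged over 200 runs. Below is a minimal sketch of such an evaluation loop, assuming the missing information is the sum over state-action pairs of the KL divergence between the true and estimated transition distributions; the agent and environment interfaces (choose_action, update, estimated_P, true_P) are hypothetical stand-ins, not the paper's code.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (with smoothing)."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def missing_information(true_P, est_P):
    """Sum of KL(true || estimated) over all state-action pairs (assumed form of Eq. 11)."""
    n_states, n_actions, _ = true_P.shape
    return sum(kl_divergence(true_P[s, a], est_P[s, a])
               for s in range(n_states) for a in range(n_actions))

def averaged_missing_information(make_env, make_agent, n_runs=200, n_steps=3000):
    """Run many independent simulations and average the missing-information curves."""
    curves = np.zeros((n_runs, n_steps))
    for run in range(n_runs):
        env, agent = make_env(), make_agent()
        s = env.reset()
        for t in range(n_steps):
            a = agent.choose_action(s)             # e.g., a PIG-based action choice
            s_next, _ = env.step(a)
            agent.update(s, a, s_next)             # update the agent's transition model
            curves[run, t] = missing_information(env.true_P, agent.estimated_P())
            s = s_next
    return curves.mean(axis=0)                     # averaged curve, Figure 3 style
```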
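
The Open Datasets row points to environments that are fully specified in the literature rather than distributed as datasets: the 36-room maze of [12] and the 5-state Chain problem of [19]. The Chain problem is small enough to write down directly; the sketch below uses the commonly cited parameters (slip probability 0.2, reward 2 for returning to the start, reward 10 at the end of the chain), which the excerpt above does not restate, so treat them as assumptions.

```python
import random

class ChainEnv:
    """Minimal 5-state Chain benchmark (parameters assumed from the standard formulation)."""
    N_STATES = 5
    SLIP = 0.2                       # assumed probability that the chosen action is inverted

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.state = 0               # states 0..4

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """action 0 = advance along the chain, action 1 = return to the start."""
        if self.rng.random() < self.SLIP:
            action = 1 - action      # slip: the opposite action is executed
        if action == 0:
            if self.state == self.N_STATES - 1:
                reward = 10.0        # assumed large reward for staying at the far end
            else:
                reward = 0.0
                self.state += 1
        else:
            reward = 2.0             # assumed small reward for returning
            self.state = 0
        return self.state, reward
```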
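
The Experiment Setup row describes an explore-then-exploit protocol: explore for S = 750 steps, compute the best reward policy under a discount factor of 0.95, and execute it for the remaining 250 steps. The sketch below follows that outline on a tabular model, assuming a uniform-random stand-in for the exploration policy (the paper's agents instead choose actions by predicted information gain) and value iteration for "calculating their best reward policy".

```python
import numpy as np

def run_explore_then_exploit(env, n_states, n_actions, total_steps=1000,
                             explore_steps=750, gamma=0.95, seed=0):
    """Explore, fit a tabular model, plan with value iteration, then exploit."""
    rng = np.random.default_rng(seed)
    counts = np.ones((n_states, n_actions, n_states))   # transition counts with a weak prior
    r_sum = np.zeros((n_states, n_actions))
    r_cnt = np.full((n_states, n_actions), 1e-9)

    s = env.reset()
    for _ in range(explore_steps):                       # exploration phase
        a = int(rng.integers(n_actions))                 # stand-in exploration policy
        s_next, r = env.step(a)
        counts[s, a, s_next] += 1
        r_sum[s, a] += r
        r_cnt[s, a] += 1
        s = s_next

    P = counts / counts.sum(axis=2, keepdims=True)       # empirical transition model
    R = r_sum / r_cnt                                    # empirical mean rewards
    V = np.zeros(n_states)
    for _ in range(500):                                 # value iteration with discount gamma
        V = (R + gamma * P @ V).max(axis=1)
    policy = (R + gamma * P @ V).argmax(axis=1)

    exploit_reward = 0.0
    for _ in range(total_steps - explore_steps):         # exploitation phase
        s, r = env.step(int(policy[s]))
        exploit_reward += r
    return exploit_reward

# Example: exploit-phase reward on the Chain environment sketched above.
# print(run_explore_then_exploit(ChainEnv(seed=1), n_states=5, n_actions=2))
```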