Information-based learning by agents in unbounded state spaces
Authors: Shariq A. Mobin, James A. Arnemann, Fritz Sommer
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we describe simulation experiments with our two models, CRP-PIG and EB-CRP-PIG, and compare them with published approaches. The models are tested in environments defined in the literature and also in an unbounded world. First the agents were tested in a bounded maze environment taken from [12] (Figure 2). Figure 3 depicts the missing information (Eq. 11) in the bounded maze for the various learning strategies over 3000 sampling steps averaged over 200 runs. (A hedged sketch of this missing-information evaluation appears after the table.) |
| Researcher Affiliation | Academia | Shariq A. Mobin, James A. Arnemann, Friedrich T. Sommer Redwood Center for Theoretical Neuroscience University of California, Berkeley Berkeley, CA 94720 shariqmobin@berkeley.edu, arnemann@berkeley.edu, fsommer@berkeley.edu |
| Pseudocode | No | The paper contains mathematical formulations and descriptions of the proposed models and processes, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code, nor does it include links to a code repository for the methodology described. |
| Open Datasets | Yes | First the agents were tested in a bounded maze environment taken from [12] (Figure 2). The state space in the maze consists of the \|S\| = 36 rooms. To directly assess how efficient learning translates to the ability to harvest reward, we consider the 5-state Chain problem [19], shown in Figure 4, a popular benchmark problem. |
| Dataset Splits | No | The paper mentions running simulations over a certain number of steps and averaging over runs (e.g., "3000 sampling steps averaged over 200 runs", "1000 steps, averaged over 500 runs"), but it does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for the environments used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing instance types. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers/libraries). |
| Experiment Setup | Yes | With the discount factor, λ, set to 0.95, one can define how actions are chosen by all our PIG agents. We found S=120 to be roughly optimal for our agent and display the results of the experiment in Table 1, taking the results of the competitor algorithms directly from the corresponding papers. Specifically, we assign a reward to each state equal to the Euclidean distance from the starting state. Like for the Chain problem before, we create two agents EB-CRP-PIG-R and LTA-R which each run for 1000 total steps, exploring for S=750 steps (defined previously) and then calculating their best reward policy and executing it for the remaining 250 steps. (A hedged sketch of this explore-then-exploit protocol appears after the table.) |
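
The two sketches below are illustrative only. First, a minimal Python sketch of the evaluation protocol quoted in the Research Type row: an agent explores a tabular world while the "missing information" (here taken as the summed KL divergence between the true and estimated transition distributions) is recorded at every step and averaged over independent runs. The maze itself, the random-exploration policy, and the Dirichlet-count estimator are placeholder assumptions, not the paper's CRP-PIG or EB-CRP-PIG models.

```python
import numpy as np

# Minimal sketch, assuming a generic tabular world with a true transition tensor
# theta_true[a, s, s'].  The random-walk exploration and Dirichlet-count estimator
# below are placeholders, not the paper's CRP-PIG / EB-CRP-PIG agents.

def missing_information(theta_true, counts, prior=1.0):
    """Summed KL divergence between true and estimated transition distributions,
    one term per (action, state) pair -- analogous to the quoted 'missing information'."""
    theta_hat = counts + prior
    theta_hat = theta_hat / theta_hat.sum(axis=-1, keepdims=True)
    kl = theta_true * (np.log(theta_true + 1e-12) - np.log(theta_hat))
    return kl.sum()

def run_once(theta_true, n_steps=3000, rng=None):
    """Explore the world and record missing information after every step."""
    rng = rng if rng is not None else np.random.default_rng()
    n_actions, n_states, _ = theta_true.shape
    counts = np.zeros_like(theta_true)
    state, trace = 0, []
    for _ in range(n_steps):
        action = int(rng.integers(n_actions))            # placeholder: random exploration
        next_state = rng.choice(n_states, p=theta_true[action, state])
        counts[action, state, next_state] += 1           # Dirichlet-style count update
        trace.append(missing_information(theta_true, counts))
        state = next_state
    return np.array(trace)

def average_runs(theta_true, n_runs=200, n_steps=3000, seed=0):
    """Average the missing-information curve over independent runs (200 in the quote)."""
    rng = np.random.default_rng(seed)
    return np.mean([run_once(theta_true, n_steps, rng) for _ in range(n_runs)], axis=0)
```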
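
Second, a hedged sketch of the explore-then-exploit protocol quoted in the Experiment Setup row: explore for S steps, fit a greedy policy by value iteration on the learned model with discount factor 0.95, and execute it for the remaining steps. The `env`/`agent` interfaces and `choose_exploratory_action` are hypothetical names introduced here; in the paper, exploratory actions are chosen by predicted information gain (PIG).

```python
import numpy as np

def value_iteration(theta_hat, rewards, gamma=0.95, n_iters=200):
    """Greedy policy from an estimated transition model theta_hat[a, s, s']
    and per-state rewards, using the quoted discount factor of 0.95."""
    n_actions, n_states, _ = theta_hat.shape
    values = np.zeros(n_states)
    for _ in range(n_iters):
        # Expected discounted return of each (action, state) pair, shape (A, S).
        q = (theta_hat * (rewards[None, None, :] + gamma * values[None, None, :])).sum(axis=-1)
        values = q.max(axis=0)
    return q.argmax(axis=0)   # best action in every state

def explore_then_exploit(env, agent, rewards, total_steps=1000, explore_steps=750, gamma=0.95):
    """Explore for `explore_steps`, then execute the greedy policy for the rest.
    `env` and `agent` are hypothetical interfaces (reset/step, update/estimated_transitions);
    `choose_exploratory_action` stands in for the paper's PIG-based action selection."""
    state = env.reset()
    for _ in range(explore_steps):
        action = agent.choose_exploratory_action(state)
        next_state = env.step(action)
        agent.update(state, action, next_state)
        state = next_state
    policy = value_iteration(agent.estimated_transitions(), rewards, gamma)
    total_reward = 0.0
    for _ in range(total_steps - explore_steps):
        state = env.step(policy[state])
        total_reward += rewards[state]
    return total_reward
```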