Learning Structured Decision Problems with Unawareness
Authors: Craig Innes, Alex Lascarides
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our agent learns optimal behaviour on small and large decision problems, and that allowing an agent to conserve information upon discovering new possibilities results in faster convergence. This paper makes three contributions: ... Third, experiments on decision tasks of varying sizes showing our agent successfully learns optimal behaviour in practice (Section 4). |
| Researcher Affiliation | Academia | 1University of Edinburgh, UK. Correspondence to: Craig Innes <craig.innes@ed.ac.uk>, Alex Lascarides <alex@inf.ed.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 outlines the entire learning process. |
| Open Source Code | No | The paper does not provide concrete access to source code, nor does it explicitly state that the code will be made available. |
| Open Datasets | No | We tested agents on three randomly generated IDs of increasing size: 12, 24, and 36 variables. Our results were similar across all sizes, but the differences between agents were most pronounced on the largest case, so we present those here (Full ID specifications and results for the small and medium cases are included in the technical supplement). The paper mentions randomly generated IDs but does not provide any access information (link, DOI, citation for data, etc.) for these datasets. |
| Dataset Splits | No | The agent acts in 5000 trials, using an ϵ-greedy policy (ϵ = 0.1). We repeat experiments 100 times and average the results. The paper describes the number of trials and repetitions for the experiments but does not specify distinct training, validation, or test dataset splits in the traditional sense, as it's a reinforcement learning setup where data is gathered incrementally. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | In each, our agent begins with minimal awareness of the true ID (X₀ = {O₁}, A₀ = {A₁}, scope₀(R) = {O₁}). The agent acts in 5000 trials, using an ϵ-greedy policy (ϵ = 0.1). We repeat experiments 100 times and average the results. The default agent follows Algorithm 1 as is, with parameters κ = 0.001, τ = 100, ρ = 0.1, γ = 0.99, K = 5.0, µ = 10, β = 0.01 in equations (4), (8), (9), (27), and (29). |
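
The quoted setup lends itself to a short protocol sketch. The Python snippet below is our own minimal illustration, not the authors' code (none is released): it hard-codes the quoted hyperparameters and the ϵ-greedy/averaging protocol (5000 trials, ϵ = 0.1, 100 repeated runs), with a placeholder agent standing in for the paper's Algorithm 1. All names (`PlaceholderAgent`, `run_experiment`, the dictionary keys) and the update rule are assumptions made for illustration.

```python
import random

# Hyperparameters quoted from the paper's experiment setup (Section 4).
# The dictionary keys are descriptive names we assign; the paper only gives
# the Greek-letter symbols and the equations they appear in.
PARAMS = {
    "kappa": 0.001,   # κ, eq. (4)
    "tau": 100,       # τ, eq. (8)
    "rho": 0.1,       # ρ, eq. (9)
    "gamma": 0.99,    # γ, discount factor
    "K": 5.0,         # K, eq. (27)
    "mu": 10,         # µ
    "beta": 0.01,     # β, eq. (29)
}

N_TRIALS = 5000       # trials per run (quoted from the paper)
N_REPEATS = 100       # independent runs averaged (quoted from the paper)
EPSILON = 0.1         # ε-greedy exploration rate (quoted from the paper)


class PlaceholderAgent:
    """Stand-in for the paper's ID-learning agent (Algorithm 1).

    The real agent maintains a learned influence diagram and expands its
    awareness of variables and actions; here we only model the ε-greedy
    action-selection interface needed to illustrate the protocol.
    """

    def __init__(self, actions, params):
        self.actions = list(actions)
        self.params = params
        self.values = {a: 0.0 for a in self.actions}  # crude action values

    def choose_action(self, epsilon):
        # ε-greedy: explore with probability ε, otherwise act greedily.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=self.values.get)

    def update(self, action, reward):
        # Illustrative incremental value update; not the paper's learning rule.
        self.values[action] += self.params["beta"] * (reward - self.values[action])


def run_experiment(env_step, actions):
    """Run N_REPEATS independent runs of N_TRIALS trials and average rewards."""
    per_trial_reward = [0.0] * N_TRIALS
    for _ in range(N_REPEATS):
        agent = PlaceholderAgent(actions, PARAMS)
        for t in range(N_TRIALS):
            a = agent.choose_action(EPSILON)
            r = env_step(a)          # environment returns a scalar reward
            agent.update(a, r)
            per_trial_reward[t] += r
    return [r / N_REPEATS for r in per_trial_reward]


if __name__ == "__main__":
    # Toy two-action environment purely for demonstration.
    rewards = run_experiment(lambda a: random.gauss(1.0 if a == "A1" else 0.0, 0.5),
                             actions=["A1", "A2"])
    print("mean reward over last 100 trials:", sum(rewards[-100:]) / 100)
```

The sketch only reproduces the outer experimental protocol (trial count, exploration rate, repetition and averaging); the paper's actual contribution, learning the structure of the influence diagram under unawareness, would replace `PlaceholderAgent` entirely.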