Learning Structured Decision Problems with Unawareness

Authors: Craig Innes, Alex Lascarides

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that our agent learns optimal behaviour on small and large decision problems, and that allowing an agent to conserve information upon discovering new possibilities results in faster convergence. This paper makes three contributions: ... Third, experiments on decision tasks of varying sizes showing our agent successfully learns optimal behaviour in practice (Section 4).
Researcher Affiliation | Academia | University of Edinburgh, UK. Correspondence to: Craig Innes <craig.innes@ed.ac.uk>, Alex Lascarides <alex@inf.ed.ac.uk>.
Pseudocode | Yes | Algorithm 1 outlines the entire learning process.
Open Source Code | No | The paper does not provide concrete access to source code, nor does it explicitly state that the code will be made available.
Open Datasets | No | We tested agents on three randomly generated IDs of increasing size: 12, 24, and 36 variables. Our results were similar across all sizes, but the differences between agents were most pronounced on the largest case, so we present those here (Full ID specifications and results for the small and medium cases are included in the technical supplement). The paper mentions randomly generated IDs (influence diagrams) but does not provide any access information (link, DOI, or data citation) for these datasets (one hypothetical way such an ID could be generated is sketched after the table).
Dataset Splits | No | The agent acts in 5000 trials, using an ϵ-greedy policy (ϵ = 0.1). We repeat experiments 100 times and average the results. The paper gives the number of trials and repetitions but specifies no distinct training, validation, or test splits; this is a reinforcement learning setup in which data is gathered incrementally rather than drawn from a fixed dataset (a sketch of this trial loop follows the table).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | In each, our agent begins with minimal awareness of the true ID (X_0 = {O_1}, A_0 = {A_1}, scope_0(R) = {O_1}). The agent acts in 5000 trials, using an ϵ-greedy policy (ϵ = 0.1). We repeat experiments 100 times and average the results. The default agent follows Algorithm 1 as is, with parameters κ = 0.001, τ = 100, ρ = 0.1, γ = 0.99, K = 5.0, µ = 10, β = 0.01 in equations (4), (8), (9), (27), and (29) (these values are collected in the configuration sketch after the table).
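
The Open Datasets row notes that the evaluation IDs were randomly generated rather than released, and the paper's quoted text does not describe the generation procedure. The snippet below is therefore a purely hypothetical sketch of one way a random influence diagram over a given number of chance and action variables could be constructed; the function name `random_influence_diagram` and its structure are illustrative assumptions, not the authors' code or data.

```python
import random

def random_influence_diagram(n_chance, n_actions, max_parents=2, seed=0):
    """Hypothetical generator for a random influence diagram (ID).

    Chance variables O1..On get a random DAG structure by only allowing
    parents with a smaller index; action variables A1..Am have no parents;
    the reward's scope is a random subset of the variables.
    """
    rng = random.Random(seed)
    chance = [f"O{i}" for i in range(1, n_chance + 1)]
    actions = [f"A{j}" for j in range(1, n_actions + 1)]

    parents = {}
    for i, var in enumerate(chance):
        candidates = chance[:i] + actions            # earlier chance vars and any action
        k = min(len(candidates), rng.randint(0, max_parents))
        parents[var] = rng.sample(candidates, k)     # acyclic by construction

    reward_scope = rng.sample(chance + actions, min(3, n_chance))
    return {"chance": chance, "actions": actions,
            "parents": parents, "reward_scope": reward_scope}

# e.g. a 12-variable problem, comparable in size to the paper's smallest ID
print(random_influence_diagram(n_chance=10, n_actions=2))
```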
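The Dataset Splits row describes how experience is gathered: 5000 ϵ-greedy trials per run, averaged over 100 repeats. The loop below is a minimal sketch of that protocol only; `make_agent`, `make_environment`, `agent.random_policy`, `agent.optimal_policy`, `agent.observe`, and `env.step` are hypothetical placeholders and do not reproduce the paper's Algorithm 1.

```python
import random

def run_experiment(make_agent, make_environment, n_trials=5000,
                   n_repeats=100, epsilon=0.1, seed=0):
    """Average reward per trial over repeated runs of an ϵ-greedy agent.

    Only the experimental protocol quoted in the report is mirrored here;
    the agent and environment objects are hypothetical placeholders.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_trials
    for _ in range(n_repeats):
        agent, env = make_agent(), make_environment()
        for t in range(n_trials):
            # ϵ-greedy: explore with probability ϵ, otherwise act greedily
            if rng.random() < epsilon:
                action = agent.random_policy(rng)
            else:
                action = agent.optimal_policy()
            outcome, reward = env.step(action)
            agent.observe(action, outcome, reward)   # update beliefs / awareness
            totals[t] += reward
    return [s / n_repeats for s in totals]           # averaged learning curve
```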
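The Experiment Setup row fixes the default agent's parameters, but they are scattered through the quoted sentence. The dictionary below simply collects the reported values in one place for comparison against a reimplementation; the ASCII key names and the inclusion of the trial counts are our grouping, not part of the paper.

```python
# Default agent parameters reported in the Experiment Setup row
# (symbols as in the paper's equations (4), (8), (9), (27) and (29)).
DEFAULT_PARAMS = {
    "kappa": 0.001,   # κ
    "tau": 100,       # τ
    "rho": 0.1,       # ρ
    "gamma": 0.99,    # γ
    "K": 5.0,
    "mu": 10,         # µ
    "beta": 0.01,     # β
    "epsilon": 0.1,   # ϵ for the ϵ-greedy policy
    "n_trials": 5000,
    "n_repeats": 100,
}
```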