Learning Structured Decision Problems with Unawareness

Authors: Craig Innes, Alex Lascarides

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that our agent learns optimal behaviour on small and large decision problems, and that allowing an agent to conserve information upon discovering new possibilities results in faster convergence. This paper makes three contributions: ... Third, experiments on decision tasks of varying sizes showing our agent successfully learns optimal behaviour in practice (Section 4).
Researcher Affiliation | Academia | University of Edinburgh, UK. Correspondence to: Craig Innes <craig.innes@ed.ac.uk>, Alex Lascarides <alex@inf.ed.ac.uk>.
Pseudocode | Yes | Algorithm 1 outlines the entire learning process.
Open Source Code | No | The paper does not provide concrete access to source code, nor does it explicitly state that the code will be made available.
Open Datasets | No | We tested agents on three randomly generated IDs of increasing size: 12, 24, and 36 variables. Our results were similar across all sizes, but the differences between agents were most pronounced on the largest case, so we present those here (Full ID specifications and results for the small and medium cases are included in the technical supplement). The paper mentions randomly generated IDs (influence diagrams) but does not provide any access information (link, DOI, or data citation) for these datasets (one hypothetical way such an ID could be generated is sketched after the table).
Dataset Splits | No | The agent acts in 5000 trials, using an ϵ-greedy policy (ϵ = 0.1). We repeat experiments 100 times and average the results. The paper gives the number of trials and repetitions but specifies no distinct training, validation, or test splits; this is a reinforcement learning setup in which data is gathered incrementally rather than drawn from a fixed dataset (a sketch of this trial loop follows the table).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | In each, our agent begins with minimal awareness of the true ID (X_0 = {O_1}, A_0 = {A_1}, scope_0(R) = {O_1}). The agent acts in 5000 trials, using an ϵ-greedy policy (ϵ = 0.1). We repeat experiments 100 times and average the results. The default agent follows Algorithm 1 as is, with parameters κ = 0.001, τ = 100, ρ = 0.1, γ = 0.99, K = 5.0, µ = 10, β = 0.01 in equations (4), (8), (9), (27), and (29) (these values are collected in the configuration sketch after the table).
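
The Open Datasets row notes that the evaluation IDs were randomly generated rather than released, and the paper's quoted text does not describe the generation procedure. The snippet below is therefore a purely hypothetical sketch of one way a random influence diagram over a given number of chance and action variables could be constructed; the function name `random_influence_diagram` and its structure are illustrative assumptions, not the authors' code or data.

```python
import random

def random_influence_diagram(n_chance, n_actions, max_parents=2, seed=0):
    """Hypothetical generator for a random influence diagram (ID).

    Chance variables O1..On get a random DAG structure by only allowing
    parents with a smaller index; action variables A1..Am have no parents;
    the reward's scope is a random subset of the variables.
    """
    rng = random.Random(seed)
    chance = [f"O{i}" for i in range(1, n_chance + 1)]
    actions = [f"A{j}" for j in range(1, n_actions + 1)]

    parents = {}
    for i, var in enumerate(chance):
        candidates = chance[:i] + actions            # earlier chance vars and any action
        k = min(len(candidates), rng.randint(0, max_parents))
        parents[var] = rng.sample(candidates, k)     # acyclic by construction

    reward_scope = rng.sample(chance + actions, min(3, n_chance))
    return {"chance": chance, "actions": actions,
            "parents": parents, "reward_scope": reward_scope}

# e.g. a 12-variable problem, comparable in size to the paper's smallest ID
print(random_influence_diagram(n_chance=10, n_actions=2))
```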
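The Dataset Splits row describes how experience is gathered: 5000 ϵ-greedy trials per run, averaged over 100 repeats. The loop below is a minimal sketch of that protocol only; `make_agent`, `make_environment`, `agent.random_policy`, `agent.optimal_policy`, `agent.observe`, and `env.step` are hypothetical placeholders and do not reproduce the paper's Algorithm 1.

```python
import random

def run_experiment(make_agent, make_environment, n_trials=5000,
                   n_repeats=100, epsilon=0.1, seed=0):
    """Average reward per trial over repeated runs of an ϵ-greedy agent.

    Only the experimental protocol quoted in the report is mirrored here;
    the agent and environment objects are hypothetical placeholders.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_trials
    for _ in range(n_repeats):
        agent, env = make_agent(), make_environment()
        for t in range(n_trials):
            # ϵ-greedy: explore with probability ϵ, otherwise act greedily
            if rng.random() < epsilon:
                action = agent.random_policy(rng)
            else:
                action = agent.optimal_policy()
            outcome, reward = env.step(action)
            agent.observe(action, outcome, reward)   # update beliefs / awareness
            totals[t] += reward
    return [s / n_repeats for s in totals]           # averaged learning curve
```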
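The Experiment Setup row fixes the default agent's parameters, but they are scattered through the quoted sentence. The dictionary below simply collects the reported values in one place for comparison against a reimplementation; the ASCII key names and the inclusion of the trial counts are our grouping, not part of the paper.

```python
# Default agent parameters reported in the Experiment Setup row
# (symbols as in the paper's equations (4), (8), (9), (27) and (29)).
DEFAULT_PARAMS = {
    "kappa": 0.001,   # κ
    "tau": 100,       # τ
    "rho": 0.1,       # ρ
    "gamma": 0.99,    # γ
    "K": 5.0,
    "mu": 10,         # µ
    "beta": 0.01,     # β
    "epsilon": 0.1,   # ϵ for the ϵ-greedy policy
    "n_trials": 5000,
    "n_repeats": 100,
}
```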