Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Integrated Common Sense Learning and Planning in POMDPs

Authors: Brendan Juba

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We formulate a new variant of the problem of planning in an unknown environment, for which we can provide algorithms with reasonable theoretical guarantees in spite of large state spaces and time horizons, partial observability, and complex dynamics. In this variant, an agent is given a collection of example traces produced by a reference policy, which may, for example, capture the agent's past behavior. We describe an efficient algorithm that uses such common sense knowledge reflected in the example traces to construct decision tree policies for goal-oriented factored POMDPs. More precisely, our algorithm (provably) succeeds at finding a policy for a given input goal when (1) there is a CNF that is almost always observed satisfied on the traces of the POMDP, capturing a sufficient approximation of its dynamics, and (2) for a decision tree policy of bounded complexity, there exist small-space resolution proofs that the goal is achieved on each branch using the aforementioned CNF capturing the common sense rules.
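Condition (1) in the abstract, that a CNF be almost always observed satisfied on the example traces, can be illustrated with a minimal sketch. All names and the data layout (literals as signed integers, a trace as a set of observed literals) are assumptions for illustration, not from the paper:

```python
from typing import FrozenSet, List, Set

# A clause is a set of literals: a positive int means the variable is true,
# a negative int means it is false. A CNF is a list of clauses.
Clause = FrozenSet[int]

def clause_satisfied(clause: Clause, assignment: Set[int]) -> bool:
    """A clause holds if at least one of its literals is observed in the assignment."""
    return any(lit in assignment for lit in clause)

def cnf_validity(cnf: List[Clause], traces: List[Set[int]]) -> float:
    """Fraction of traces on which every clause of the CNF is observed satisfied."""
    sat = sum(all(clause_satisfied(c, t) for c in cnf) for t in traces)
    return sat / len(traces)

# Toy example: CNF (x1 OR NOT x2) AND (x3), checked against three traces.
cnf = [frozenset({1, -2}), frozenset({3})]
traces = [{1, 3}, {-1, -2, 3}, {2, -3}]
print(cnf_validity(cnf, traces))  # 2 of 3 traces satisfy the CNF
```

A CNF is "almost always observed satisfied" in this sense when `cnf_validity` is close to 1 over the reference policy's traces.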
Researcher Affiliation | Academia | Brendan Juba, EMAIL, Washington University, 1 Brookings Dr., St. Louis, MO 63130, USA
Pseudocode | Yes | Algorithm 1 Find-DT(). PAC-refute(ϕ, R) decides if ϕ ∧ ψ is at most ϵ-valid given some testable ψ over weighted partial examples in R (taken out of m) (cf. Theorem 8).
Input: DNF goal G, CNF KB Φ, Strahler number s, policy branch Π taking t−1 actions, horizon bound T, sample of traces with stepwise weights R
Output: A Strahler-s decision tree or ⊥
if Π takes more than T actions then
    return ⊥
end if
R_t ← {(∧_{i=1}^{t−1}(o(i), a(i)) ∧ o(t), Π_{i=1}^{t−1} w_i) : ((o(i), a(i), w_i)_{i=1}^{T}, o(T+1)) ∈ R, Π_{i=1}^{t−1} w_i ≥ 4µ}
if PAC-Refute(¬G(t) ∧ Φ ∧ "Π taken", R_t) then
    return Π
end if
if s > 1 then
    for all ℓ s.t. ℓ(t) and ¬ℓ(t) not in Π do
        Π_ℓ ← Π with a final test for ℓ(t)
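The control flow of the excerpted pseudocode can be sketched as a recursive Python skeleton. This is a rough illustration only: the PAC-Refute subroutine is stubbed as a caller-supplied predicate, the trace filtering is omitted, the Strahler-number bookkeeping is simplified to a plain depth budget, and every name here is an assumption, not the paper's implementation:

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical encoding: a branch is a list of (literal, step) tests; the return
# value is the certified branch, or None standing in for the pseudocode's ⊥.
Branch = List[Tuple[int, int]]

def find_dt(goal_refuted: Callable[[Branch, int], bool],
            literals: List[int],
            branch: Branch,
            s: int, t: int, T: int) -> Optional[Branch]:
    """Sketch of Find-DT's recursion: extend the branch with literal tests until
    the PAC-Refute stub certifies the goal, within depth budget s and horizon T."""
    if t > T:                        # Π takes more than T actions: fail this branch
        return None
    if goal_refuted(branch, t):      # stands in for PAC-Refute(¬G(t) ∧ Φ ∧ "Π taken", R_t)
        return branch
    if s > 1:
        for lit in literals:
            # skip literals already tested on this branch, in either polarity
            if any(l in (lit, -lit) for l, _ in branch):
                continue
            ext = find_dt(goal_refuted, literals,
                          branch + [(lit, t)], s - 1, t + 1, T)
            if ext is not None:
                return ext
    return None
```

For example, with a stub predicate that declares the goal refuted once literal 2 has been tested, `find_dt(lambda b, t: any(l == 2 for l, _ in b), [1, 2], [], 2, 1, 3)` returns the single-test branch `[(2, 1)]`.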
Open Source Code | No | The paper does not provide concrete access to source code. It discusses algorithms and theoretical findings but does not mention any repository links, explicit code release statements, or code in supplementary materials for its methodology.
Open Datasets | No | The paper uses an example based on the 'Gripper problem from the first International Planning Competition (IPC 1998)' to illustrate concepts, not as an experimental dataset, and provides no concrete access information (link, DOI, repository, or citation) for any publicly available or open dataset used in empirical evaluation.
Dataset Splits | No | The paper reports no empirical experiments with datasets, so no training/validation/test splits are described.
Hardware Specification | No | The paper focuses on theoretical algorithms and their complexity analysis. It does not describe any specific hardware (such as GPU models, CPU types, or cloud resources) used for running experiments.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs, not on implementation details. It does not mention any specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers) needed for experimental replication.
Experiment Setup | No | The paper is theoretical and describes algorithms and proofs. It does not include details about an experimental setup, such as hyperparameters, training configurations, or system-level settings, as no empirical experiments are presented.