V-MIN: Efficient Reinforcement Learning through Demonstrations and Relaxed Reward Demands

Authors: David Martínez, Guillem Alenyà, Carme Torras

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The performance of V-MIN has been validated through experimentation, including domains from the international planning competition. From Sec. 5, Experimental Results: "This section analyzes V-MIN performance: (1) comparing V-MIN with REX-D (Martínez et al. 2014) and REX (Lang, Toussaint, and Kersting 2012); (2) using different values of Vmin and an increasing Vmin; (3) adaptation to changes in the tasks."
Researcher Affiliation | Academia | David Martínez, Guillem Alenyà, and Carme Torras. Institut de Robòtica i Informàtica Industrial (CSIC-UPC), C/ Llorens i Artigas 4-6, 08028 Barcelona, Spain. {dmartinez,galenya,torras}@iri.upc.edu
Pseudocode | Yes | Algorithm 1: V-MIN (a minimal Python sketch of this loop is given after the table)
    Input: reward function R, confidence threshold ζ
    Updates: set of experiences E
    Observe state s_0
    loop
        Update transition model T according to E
        Create Tvmin(T, R, ζ)
        Plan an action a_t using Tvmin
        if a_t == TeacherRequest() then
            a_t = Request demonstration
        else
            Execute a_t
        end if
        Observe new state s_{t+1}
        Add {(s_t, a_t, s_{t+1})} to E
    end loop
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets | Yes | "In this experiment we compared V-MIN with REX-D (Martínez et al. 2014) and the R-MAX variant of REX in two problems of the International Probabilistic Planning Competition (2008)."
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions "Means over 250 runs" or "Means over 100 runs" but no explicit splits.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiment. It mentions using Pasula, Zettlemoyer, and Kaelbling's (2007) learner and the Gourmand planner (Kolobov, Mausam, and Weld 2012), but gives no version numbers for their implementations.
Experiment Setup | Yes | Episodes were limited to 100 actions before being considered a failure. The exploration threshold was set to ζ = 3, which yielded good results. Based on the performance analysis in Sec. 4, using a discount factor γ = 0.9, an accuracy parameter ε = 0.1, and an exploration threshold ζ = |S|/(ε²(1−γ)⁴) = 3, the upper bound for the R-MAX sample complexity is proportional to |S||A|ζ/(ε(1−γ)²) = 50·43·3/(0.1·(1−0.9)²) = 2.2·10⁶. (A worked evaluation of this bound is given after the table.)
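
To make the quoted pseudocode concrete, here is a minimal Python sketch of the same interaction loop. The env, learner, planner, and teacher objects are hypothetical interfaces invented for illustration; this is not the authors' implementation (the paper pairs Pasula, Zettlemoyer, and Kaelbling's (2007) rule learner with the Gourmand planner, neither of which is reproduced here). The max_steps=100 default mirrors the 100-action episode limit quoted in the Experiment Setup row.

    # Hypothetical sketch of the V-MIN loop (Algorithm 1); every object
    # interface below is a placeholder, not the authors' code.

    TEACHER_REQUEST = "teacher_request"  # extra action V-MIN adds to the MDP

    def v_min_loop(env, learner, planner, teacher, reward_fn, zeta,
                   max_steps=100):
        """Run the V-MIN interaction loop for at most max_steps actions."""
        experiences = []       # E: observed (s, a, s') transitions
        state = env.observe()  # observe s_0
        for _ in range(max_steps):
            model = learner.fit(experiences)           # update T according to E
            model_vmin = model.relax(reward_fn, zeta)  # create Tvmin(T, R, ζ)
            action = planner.plan(model_vmin, state)   # plan a_t using Tvmin
            if action == TEACHER_REQUEST:
                action = teacher.demonstrate(state)    # teacher executes a demo
            else:
                env.execute(action)                    # execute a_t ourselves
            next_state = env.observe()                 # observe s_{t+1}
            experiences.append((state, action, next_state))  # add to E
            state = next_state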
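
For concreteness, the quoted bound |S||A|ζ/(ε(1−γ)²) can also be evaluated mechanically. Below is a minimal Python sketch; γ = 0.9, ε = 0.1, and ζ = 3 are the values stated in the setup, while |S| = 50 and |A| = 43 are assumptions read off the (possibly extraction-garbled) arithmetic in the quote, so the result should only be trusted to its order of magnitude.

    def rmax_sample_bound(num_states, num_actions, zeta, epsilon, gamma):
        """Upper bound proportional to |S||A|ζ / (ε(1 - γ)^2), as quoted above."""
        return num_states * num_actions * zeta / (epsilon * (1 - gamma) ** 2)

    # γ, ε, and ζ are quoted in the setup; |S| = 50 and |A| = 43 are assumed.
    bound = rmax_sample_bound(num_states=50, num_actions=43, zeta=3,
                              epsilon=0.1, gamma=0.9)
    print(f"{bound:.2e}")  # millions of samples, on the order of 10^6

With these inputs the bound comes out in the millions of samples, consistent with the 10⁶ magnitude quoted above.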