Branching Reinforcement Learning

Authors: Yihan Du, Wei Chen

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct experiments for branching RL. We set K = 5000, δ = 0.005, H = 6, m = 2, N ∈ {10, 15}, S = {s_⊥, s_1, . . . , s_5}. A is the collection of all m-cardinality subsets of A_univ = {a_1, . . . , a_N}, and thus |A| = C(N, m) ∈ {45, 105}. The reward function r(s, a) = 1 for any (s, a) ∈ S × A. The trigger probability q(s, a) = 1/m for any (s, a) ∈ S × {a_{N-1}, a_N}, and q(s, a) = 1/(2m) for any (s, a) ∈ S × (A_univ \ {a_{N-1}, a_N}). We set s_1 as the initial state for each episode. Under all actions a ∈ A_univ, the transition probability p(s' | s_1, a) = 0.5 for any s' ∈ {s_2, s_3}, and p(s' | s, a) = 0.5 for any (s, s') ∈ {s_2, s_3} × {s_4, s_5} or (s, s') ∈ {s_4, s_5} × {s_2, s_3}. We perform 50 independent runs, and report the average regrets and running times (in legends) across runs.
Researcher Affiliation | Collaboration | IIIS, Tsinghua University, Beijing, China; Microsoft Research. Correspondence to: Yihan Du <duyh18@mails.tsinghua.edu.cn>, Wei Chen <weic@microsoft.com>.
Pseudocode | Yes | Algorithm 1: BranchVI; Algorithm 2: BranchRFE
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | No | The paper describes a constructed problem instance with specific parameters (H, m, N, S, reward function, trigger probability, transition probability) for its experiments, but it does not use a publicly available or open dataset. No concrete access information for a dataset is provided.
Dataset Splits | No | The paper defines the parameters of its constructed problem instance for the experiments, but it does not specify training, test, or validation dataset splits. The problem is a simulation within a defined environment, not a split of an existing dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers for reproducing the experiments.
Experiment Setup | Yes | We set K = 5000, δ = 0.005, H = 6, m = 2, N ∈ {10, 15}, S = {s_⊥, s_1, . . . , s_5}. A is the collection of all m-cardinality subsets of A_univ = {a_1, . . . , a_N}, and thus |A| = C(N, m) ∈ {45, 105}. The reward function r(s, a) = 1 for any (s, a) ∈ S × A. The trigger probability q(s, a) = 1/m for any (s, a) ∈ S × {a_{N-1}, a_N}, and q(s, a) = 1/(2m) for any (s, a) ∈ S × (A_univ \ {a_{N-1}, a_N}). We set s_1 as the initial state for each episode. Under all actions a ∈ A_univ, the transition probability p(s' | s_1, a) = 0.5 for any s' ∈ {s_2, s_3}, and p(s' | s, a) = 0.5 for any (s, s') ∈ {s_2, s_3} × {s_4, s_5} or (s, s') ∈ {s_4, s_5} × {s_2, s_3}. We perform 50 independent runs, and report the average regrets and running times (in legends) across runs.
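
Since no code is released, the Experiment Setup row above is the only specification of the environment. Below is a minimal Python sketch of that constructed instance, assuming the reconstructed parameters (δ = 0.005, trigger probabilities 1/m and 1/(2m), and an absorbing state for branches that fail to trigger). The class name BranchingInstance, the method names, and the absorbing-state label "s_bot" are illustrative choices of ours, not the authors' code.

```python
# Sketch of the simulated branching-RL instance described in the Experiment Setup row.
# Names (BranchingInstance, step helpers, "s_bot") are illustrative; the paper's code is not public.
from itertools import combinations
import random


class BranchingInstance:
    def __init__(self, N=10, m=2, H=6, K=5000, delta=0.005, seed=0):
        self.N, self.m, self.H, self.K, self.delta = N, m, H, K, delta
        self.rng = random.Random(seed)
        # "s_bot" plays the role of the absorbing state (our assumption); s1 is the initial state.
        self.states = ["s_bot"] + [f"s{i}" for i in range(1, 6)]
        self.base_actions = [f"a{i}" for i in range(1, N + 1)]          # A_univ = {a1, ..., aN}
        # Super-actions: all m-cardinality subsets of A_univ, so |A| = C(N, m).
        self.actions = list(combinations(self.base_actions, m))

    def reward(self, s, super_action):
        # r(s, a) = 1 for every (state, super-action) pair.
        return 1.0

    def trigger_prob(self, s, base_action):
        # q(s, a) = 1/m for the last two base actions, 1/(2m) for all others.
        if base_action in (f"a{self.N - 1}", f"a{self.N}"):
            return 1.0 / self.m
        return 1.0 / (2 * self.m)

    def transition(self, s, base_action):
        # Transitions are action-independent: from s1 go to {s2, s3} w.p. 0.5 each,
        # and move between {s2, s3} and {s4, s5} w.p. 0.5 each; otherwise absorb.
        if s == "s1":
            return self.rng.choice(["s2", "s3"])
        if s in ("s2", "s3"):
            return self.rng.choice(["s4", "s5"])
        if s in ("s4", "s5"):
            return self.rng.choice(["s2", "s3"])
        return "s_bot"


env = BranchingInstance(N=10, m=2)
print(len(env.actions))   # 45 for N = 10; BranchingInstance(N=15, m=2) gives 105
```

A regret experiment would then run K = 5000 episodes of horizon H = 6 from s1, expanding each branch only when its trigger Bernoulli fires; the BranchVI and BranchRFE learners from the paper's pseudocode are not reproduced in this sketch.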