Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multi-task Representation Learning for Pure Exploration in Linear Bandits
Authors: Yihan Du, Longbo Huang, Wen Sun
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments In this section, we present experiments to evaluate the empirical performance of our algorithms. In our experiments, we set δ = 0.005, d = 5, k = 2 and M ∈ [50, 230], where k divides M. In RepBAI-LB, X is the canonical basis of R^d. In RepBPI-CLB, we set ε = 0.1, |S| = 5 and |A| = 5. D is the uniform distribution on S. For any s ∈ S, {ϕ(s, a)}_{a∈A} is the canonical basis of R^d. In both problems, B = [I_k; 0], where I_k denotes the k × k identity matrix. w_1, . . . , w_M are divided into k groups, with M/k identical members in each group. The members in the i-th group (i ∈ [k]), i.e., w_{(M/k)(i−1)+1}, . . . , w_{(M/k)i}, have 1 in the i-th coordinate and 0 in all other coordinates. For any m ∈ [M], θ_m = Bw_m. We vary M and perform 50 independent runs to report the average sample complexity across runs. |
| Researcher Affiliation | Academia | Yihan Du 1 Longbo Huang 1 Wen Sun 2 1IIIS, Tsinghua University 2Cornell University. |
| Pseudocode | Yes | Algorithm 1 DouExpDes (Double Experimental Design); Algorithm 2 FeatRecover(T, {x_i}_{i∈[p]}); Algorithm 3 EliLowRep(t, X, {X̂_m}_{m∈[M]}, δ, ROUND, ζ, B̂); Algorithm 4 C-DouExpDes (Contextual Double Experimental Design); Algorithm 5 C-FeatRecover(T, {a_i}_{i∈[p]}); Algorithm 6 EstLowRep(N, γ, B̂) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code. |
| Open Datasets | No | In our experiments, we set δ = 0.005, d = 5, k = 2 and M ∈ [50, 230], where k divides M. In RepBAI-LB, X is the canonical basis of R^d. In RepBPI-CLB, we set ε = 0.1, |S| = 5 and |A| = 5. D is the uniform distribution on S. For any s ∈ S, {ϕ(s, a)}_{a∈A} is the canonical basis of R^d. The data used for the experiments is synthetically generated according to these specifications; no publicly available dataset is linked or cited. |
| Dataset Splits | No | The paper describes how the synthetic data and experiment parameters are generated, but it does not specify training, validation, or test splits; such splits do not apply to its pure-exploration bandit setting, which measures sample complexity rather than predictive accuracy. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch versions, or specific library versions). |
| Experiment Setup | Yes | In our experiments, we set δ = 0.005, d = 5, k = 2 and M ∈ [50, 230], where k divides M. In RepBAI-LB, X is the canonical basis of R^d. In RepBPI-CLB, we set ε = 0.1, |S| = 5 and |A| = 5. D is the uniform distribution on S. For any s ∈ S, {ϕ(s, a)}_{a∈A} is the canonical basis of R^d. In both problems, B = [I_k; 0], where I_k denotes the k × k identity matrix. w_1, . . . , w_M are divided into k groups, with M/k identical members in each group. The members in the i-th group (i ∈ [k]), i.e., w_{(M/k)(i−1)+1}, . . . , w_{(M/k)i}, have 1 in the i-th coordinate and 0 in all other coordinates. For any m ∈ [M], θ_m = Bw_m. We vary M and perform 50 independent runs to report the average sample complexity across runs. |
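The synthetic instance described in the Experiment Setup row can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not code from the paper: the function name `build_instance` and the array layout are our own choices, and only the numeric specification (d = 5, k = 2, B = [I_k; 0], grouped canonical weight vectors, θ_m = Bw_m) comes from the source.

```python
import numpy as np

def build_instance(d=5, k=2, M=50):
    """Reconstruct the paper's synthetic multi-task linear bandit instance."""
    assert M % k == 0, "k must divide M"
    # Shared feature extractor B = [I_k; 0], a d x k matrix.
    B = np.vstack([np.eye(k), np.zeros((d - k, k))])
    # M weight vectors in R^k, split into k groups of M/k identical members;
    # every member of group i is the i-th canonical basis vector e_i.
    W = np.repeat(np.eye(k), M // k, axis=0)  # shape (M, k)
    # Task parameters theta_m = B w_m, stacked as rows of an (M, d) matrix.
    Theta = W @ B.T
    # Arm set X: the canonical basis of R^d.
    X = np.eye(d)
    return B, W, Theta, X

B, W, Theta, X = build_instance()
```

By construction all M task parameters lie in the k-dimensional column span of B, which is the low-rank structure the algorithms are designed to exploit.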