Multi-task Representation Learning for Pure Exploration in Linear Bandits
Authors: Yihan Du, Longbo Huang, Wen Sun
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments In this section, we present experiments to evaluate the empirical performance of our algorithms. In our experiments, we set δ = 0.005, d = 5, k = 2 and M ∈ [50, 230], where k divides M. In RepBAI-LB, X is the canonical basis of ℝ^d. In RepBPI-CLB, we set ε = 0.1, |S| = 5 and |A| = 5. D is the uniform distribution on S. For any s ∈ S, {φ(s, a)}_{a∈A} is the canonical basis of ℝ^d. In both problems, B = [I_k; 0], where I_k denotes the k × k identity matrix. w_1, …, w_M are divided into k groups, with M/k identical members in each group. The members of the i-th group (i ∈ [k]), i.e., w_{(M/k)(i−1)+1}, …, w_{(M/k)i}, have 1 in the i-th coordinate and 0 in all other coordinates. For any m ∈ [M], θ_m = B w_m. We vary M and perform 50 independent runs to report the average sample complexity across runs. |
| Researcher Affiliation | Academia | Yihan Du¹, Longbo Huang¹, Wen Sun² — ¹IIIS, Tsinghua University; ²Cornell University. |
| Pseudocode | Yes | Algorithm 1: DouExpDes (Double Experimental Design); Algorithm 2: FeatRecover(T, {x_i}_{i∈[p]}); Algorithm 3: EliLowRep(t, X, {X̂_m}_{m∈[M]}, δ, ROUND, ζ, B̂); Algorithm 4: C-DouExpDes (Contextual Double Experimental Design); Algorithm 5: C-FeatRecover(T, {a_i}_{i∈[p]}); Algorithm 6: EstLowRep(N, γ, B̂) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code. |
| Open Datasets | No | In our experiments, we set δ = 0.005, d = 5, k = 2 and M ∈ [50, 230], where k divides M. In RepBAI-LB, X is the canonical basis of ℝ^d. In RepBPI-CLB, we set ε = 0.1, |S| = 5 and |A| = 5. D is the uniform distribution on S. For any s ∈ S, {φ(s, a)}_{a∈A} is the canonical basis of ℝ^d. The data used for experiments is synthetically generated according to these specifications and is not a publicly available dataset with a link or citation. |
| Dataset Splits | No | The paper describes the synthetic generation of data and parameters for experiments, but it does not specify explicit training, validation, and test dataset splits in the traditional sense, as it focuses on sample complexity in bandit settings. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch versions, or specific library versions). |
| Experiment Setup | Yes | In our experiments, we set δ = 0.005, d = 5, k = 2 and M ∈ [50, 230], where k divides M. In RepBAI-LB, X is the canonical basis of ℝ^d. In RepBPI-CLB, we set ε = 0.1, |S| = 5 and |A| = 5. D is the uniform distribution on S. For any s ∈ S, {φ(s, a)}_{a∈A} is the canonical basis of ℝ^d. In both problems, B = [I_k; 0], where I_k denotes the k × k identity matrix. w_1, …, w_M are divided into k groups, with M/k identical members in each group. The members of the i-th group (i ∈ [k]), i.e., w_{(M/k)(i−1)+1}, …, w_{(M/k)i}, have 1 in the i-th coordinate and 0 in all other coordinates. For any m ∈ [M], θ_m = B w_m. We vary M and perform 50 independent runs to report the average sample complexity across runs. |
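Since the paper releases no code, the synthetic instance described in the experiment setup can be sketched as follows. This is our own minimal reconstruction (the function name `make_instance` is hypothetical): B = [I_k; 0], the w_m split into k groups of M/k identical one-hot vectors, and θ_m = B w_m.

```python
import numpy as np

def make_instance(d=5, k=2, M=50):
    """Build B = [I_k; 0] in R^{d x k} and per-task parameters theta_m = B w_m."""
    assert M % k == 0, "k must divide M"
    # Feature matrix B: top k x k block is the identity, remaining rows zero.
    B = np.zeros((d, k))
    B[:k, :k] = np.eye(k)
    # w_1, ..., w_M in k groups of M/k identical members: members of the
    # i-th group have 1 in the i-th coordinate and 0 elsewhere.
    W = np.zeros((M, k))
    group_size = M // k
    for i in range(k):
        W[i * group_size:(i + 1) * group_size, i] = 1.0
    # Per-task parameters theta_m = B w_m, stacked as rows of Theta.
    Theta = W @ B.T  # shape (M, d)
    return B, W, Theta

B, W, Theta = make_instance()
```

With d = 5, k = 2, M = 50 this yields 25 tasks with θ_m = e_1 and 25 with θ_m = e_2, so all θ_m lie in the 2-dimensional subspace spanned by the columns of B, matching the low-rank structure the algorithms exploit.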