Hypermodels for Exploration
Authors: Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the use of hypermodels to represent epistemic uncertainty and guide exploration. We show that alternative hypermodels can enjoy dramatic efficiency gains, enabling behavior that would otherwise require hundreds or thousands of ensemble elements, and even succeed in situations where ensemble methods fail to learn regardless of size. Our simulation results show that a diagonal linear hypermodel requires about 50 to 100 times less computation than an ensemble hypermodel to achieve our target level of performance (see the hypermodel sketch below the table). In our simulations, we found that training without data perturbation gives lower regret for both agents. Figure 4 plots regret realized by TS and variance-IDS using the aforementioned hypermodel, trained with perturbed SGD. |
| Researcher Affiliation | Industry | DeepMind |
| Pseudocode | No | The paper describes algorithms mathematically and in text but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating its own data for bandit problems (e.g., 'We generate data using a neural network...'). It does not provide access information (link, citation, or repository) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not explicitly provide details about train/validation/test splits, percentages, or sample counts needed to reproduce the experiments. It mentions a 'time horizon to 10,000 periods' for bandits but not formal dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software components or libraries used in the experiments. |
| Experiment Setup | Yes | Hypermodel parameters are updated according to $\nu \leftarrow \nu - \alpha \nabla_\nu \mathcal{L}(\nu, D, Z) / \lvert D \rvert$, where $\alpha$, $\sigma_w^2$, and $\sigma_p^2$ are algorithm hyperparameters. In our experiments, we take the step size $\alpha$ to be constant over iterations. We fix the data batch size to 1024 for both agents. (A hedged sketch of this update appears below the table.) |
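
The diagonal linear hypermodel referenced in the Research Type row maps a random index to base-model parameters. The sketch below is a minimal illustration, assuming the linear form $\theta = a + Bz$ with $z \sim N(0, I)$ and $B$ restricted to a diagonal matrix; the names and dimensions (`param_dim`, `sample_params`) are illustrative, not taken from the paper.

```python
import numpy as np

# Minimal sketch of a diagonal linear hypermodel: theta = a + diag(b) z,
# with index z ~ N(0, I). In the diagonal variant the index has the same
# dimension as the base-model parameter vector. Sizes are illustrative.

rng = np.random.default_rng(0)
param_dim = 50                      # base-model parameter count (assumption)

a = np.zeros(param_dim)             # learned offset
b = 0.1 * np.ones(param_dim)        # learned diagonal scale

def sample_params(n):
    """Map n random indices to n plausible base-model parameter vectors."""
    z = rng.standard_normal((n, param_dim))
    return a + b * z                # elementwise: diag(b) @ z per sample

# For Thompson sampling, one index drawn per period induces one sampled
# model, which the agent then acts greedily with respect to.
theta = sample_params(1)[0]
```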
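The Experiment Setup row gives the update $\nu \leftarrow \nu - \alpha \nabla_\nu \mathcal{L}(\nu, D, Z) / \lvert D \rvert$. The following sketch shows one such perturbed-SGD step for the diagonal hypermodel above with a linear base model $f_\theta(x) = \theta \cdot x$. The perturbed squared-error loss with a prior regularizer is an illustrative choice consistent with the hyperparameters $\alpha$, $\sigma_w^2$, $\sigma_p^2$, and the batch size of 1024; the paper's exact loss form may differ.

```python
import numpy as np

# Hedged sketch of one step of
#     nu <- nu - alpha * grad_nu L(nu, D, Z) / |D|
# for hypermodel parameters nu = (a, b) and base model f_theta(x) = theta . x.
# Assumed loss per batch: sum_i (y_i + sigma_w * w_i - theta . x_i)^2 / (2 sigma_w^2)
# plus a quadratic prior regularizer, all divided by the batch size.

rng = np.random.default_rng(1)
param_dim = 50
alpha, sigma_w, sigma_p = 1e-3, 0.1, 1.0   # constant step size and noise/prior scales
batch_size = 1024                          # data batch size fixed in the paper

a = np.zeros(param_dim)                    # hypermodel offset
b = np.ones(param_dim)                     # hypermodel diagonal scale
a0, b0 = a.copy(), b.copy()                # prior values for regularization

def sgd_step(a, b, X, y, w):
    """One perturbed-SGD step on a batch (X, y) with per-example perturbations w."""
    z = rng.standard_normal(param_dim)     # fresh index sample each step (assumption)
    theta = a + b * z                      # sampled base-model parameters
    resid = (y + sigma_w * w) - X @ theta  # perturbed residuals
    # Gradient of the loss w.r.t. theta, then chain rule through theta = a + b*z
    g_theta = -X.T @ resid / (sigma_w**2 * len(y))
    g_a = g_theta + (a - a0) / (sigma_p**2 * len(y))
    g_b = g_theta * z + (b - b0) / (sigma_p**2 * len(y))
    return a - alpha * g_a, b - alpha * g_b

# Toy usage on synthetic data
X = rng.standard_normal((batch_size, param_dim))
y = X @ rng.standard_normal(param_dim)
w = rng.standard_normal(batch_size)        # per-example noise perturbation
a, b = sgd_step(a, b, X, y, w)
```

Without the perturbations (`w = 0`), this reduces to plain SGD on the hypermodel loss, which corresponds to the "training without data perturbation" setting the Research Type row reports as giving lower regret for both agents.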