Learning with Abandonment
Authors: Sven Schmit, Ramesh Johari
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 3.3, titled Simulations, the paper presents 'Cumulative regret plots' and states 'We observe that KL-UCB indeed performs better than the standard UCB algorithm.' It also states 'Code to replicate the simulations is available at https://github.com/schmit/learning-abandonment.' |
| Researcher Affiliation | Academia | Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA; Management Science & Engineering, Stanford University, Stanford, CA, USA. |
| Pseudocode | No | The paper describes algorithms and strategies in text but does not include any formal pseudocode blocks or clearly labeled algorithm sections. |
| Open Source Code | Yes | Code to replicate the simulations is available at https://github.com/schmit/learning-abandonment. |
| Open Datasets | No | The paper uses a simulated setting where 'the threshold distribution (unknown to the learning algorithm) is uniform on [0, 1]'. There is no traditional dataset provided with a URL, DOI, or repository, as the data is generated within the simulation based on this specified distribution. |
| Dataset Splits | No | The paper describes a simulation setup ('n = 2000 time steps', '50 repetitions') but does not specify traditional training, validation, or test dataset splits, as it operates within a sequential learning simulation environment rather than on a static dataset. |
| Hardware Specification | No | The paper does not specify any hardware used for running the simulations (e.g., CPU, GPU models, or cloud computing instances). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the discretized policies, we set K ∼ 2(n / log n)^(1/4) = 10. The explore-exploit strategy first observes 20 + 2√n = 110 samples to estimate F, before committing to a fixed strategy. |
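
The two quantities in the Experiment Setup row can be checked with a few lines of arithmetic, using n = 2000 from the Dataset Splits row. This is a minimal sketch, not the authors' code: the function names `discretization_size` and `exploration_samples` are hypothetical, and the base of the logarithm is an assumption because the quoted text does not state it. Both bases are evaluated so the difference is visible.

```python
import math


def discretization_size(n, log=math.log10):
    """Evaluate 2 * (n / log n)^(1/4) for a given log function.

    The base of the logarithm is an assumption; the quoted text does not state it.
    """
    return 2 * (n / log(n)) ** 0.25


def exploration_samples(n):
    """Evaluate 20 + 2 * sqrt(n), the explore-exploit sample budget quoted above."""
    return 20 + 2 * math.sqrt(n)


n = 2000  # number of time steps reported for the simulations

k_base10 = discretization_size(n, math.log10)  # assumption: base-10 logarithm
k_natural = discretization_size(n, math.log)   # assumption: natural logarithm
n_explore = exploration_samples(n)

print(f"K with log10: {k_base10:.2f}")
print(f"K with ln:    {k_natural:.2f}")
print(f"exploration samples: {n_explore:.2f}")
```

Under these assumptions, the base-10 variant rounds to the reported K = 10, while the natural-log variant rounds to 8. The exploration budget evaluates to just under 110, so the reported value appears to be rounded up.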