Learning to Crawl
Authors: Utkarsh Upadhyay, Robert Busa-Fekete, Wojciech Kotlowski, David Pal, Balazs Szorenyi
Pages: 6046-6053
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our simulation study shows that our online policy scales well and achieves close to optimal performance for a wide range of parameters. ... In Section 6, we test our algorithm using real data to justify our theoretical findings and we conclude with future research directions in Section 7. ... For the experimental setup, we make use of the MSMACRO dataset (Kolobov et al. 2019b). |
| Researcher Affiliation | Collaboration | Utkarsh Upadhyay, Reasonal, Berlin, Germany, utkarsh@reason.al; Robert Busa-Fekete, Google Research, NY, USA, busarobi@google.com; Wojciech Kotłowski, Poznan University of Technology, Poland, wkotlowski@cs.put.poznan.pl; Dávid Pál, Yahoo! Research, NY, USA, davidko.pal@gmail.com; Balázs Szörényi, Yahoo! Research, NY, USA, szorenyi.balazs@gmail.com |
| Pseudocode | No | The paper refers to algorithms from other works (e.g., "Algorithm 2 in (Azar et al. 2018)" and "Algorithm 3"), but it does not provide its own pseudocode or algorithm blocks for the explore-and-commit algorithm or any other proposed method within the paper. |
| Open Source Code | No | The paper does not provide any statement about making its source code available, nor does it include a link to a code repository for the methodology described. |
| Open Datasets | Yes | For the experimental setup, we make use of the MSMACRO dataset (Kolobov et al. 2019b). The dataset was collected over a period of 14 weeks by the production crawler for Bing. |
| Dataset Splits | No | The paper describes using the MSMACRO dataset and simulating change times but does not specify training, validation, or test splits, nor does it mention cross-validation or predefined splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) used in the experiments. |
| Experiment Setup | Yes | We set ξmin = 10^-9 and ξmax = 25. The experiments simulate the change times ((x_{i,n})_{n=1}^∞)_{i∈[m]} for webpages 50 times with different random seeds... We run a grid search for different values of τ (starting from the minimum time required to sample each webpage at least once)... determine the parameters ξ and the regret suffered during the exploration phase. We calculate ρ using Algorithm 2 of Azar et al. (2018)... from τ till time horizon T = 10^4. |
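
The Experiment Setup row mentions simulating the webpages' change times 50 times with different random seeds up to a horizon of T = 10^4. The sketch below illustrates one way such a simulation could look. It assumes each page changes according to a homogeneous Poisson process with its own rate, which the table above does not state explicitly. The function names `simulate_change_times` and `simulate_runs` and the example rates are hypothetical and are not taken from the authors' code.

```python
import random


def simulate_change_times(rates, horizon, seed):
    """Simulate change times for each page up to `horizon`.

    Assumption (not stated in the table): page i changes according to a
    homogeneous Poisson process with rate rates[i], so the gaps between
    consecutive changes are exponentially distributed.
    """
    rng = random.Random(seed)  # one generator per run for reproducibility
    all_times = []
    for xi in rates:
        t, times = 0.0, []
        while True:
            t += rng.expovariate(xi)  # exponential inter-change gap
            if t > horizon:
                break
            times.append(t)
        all_times.append(times)
    return all_times


def simulate_runs(rates, horizon=10**4, n_runs=50):
    """Repeat the simulation once per seed (seeds 0 .. n_runs - 1)."""
    return [simulate_change_times(rates, horizon, seed) for seed in range(n_runs)]


if __name__ == "__main__":
    # Hypothetical change rates for three pages; a short horizon keeps this fast.
    runs = simulate_runs([0.1, 1.0, 5.0], horizon=100.0, n_runs=5)
    for seed, run in enumerate(runs):
        print(seed, [len(page_times) for page_times in run])
```

Seeding a separate generator per run keeps each of the repetitions reproducible on its own, so a single run can be re-examined without replaying the others.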