Game Solving with Online Fine-Tuning
Authors: Ti-Rong Wu, Hung Guei, Ting Han Wei, Chung-Chin Shih, Jui-Te Chin, I-Chen Wu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that using online fine-tuning can solve a series of challenging 7x7 Killall-Go problems, using only 23.54% of the computation time required by the baseline without online fine-tuning. |
| Researcher Affiliation | Academia | 1Institute of Information Science, Academia Sinica, Taiwan 2Department of Computing Science, University of Alberta, Canada 3Department of Computer Science, National Yang Ming Chiao Tung University, Taiwan 4Research Center for Information Technology Innovation, Academia Sinica, Taiwan |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code is available at https://rlg.iis.sinica.edu.tw/papers/neurips2023-online-fine-tuning-solver. |
| Open Datasets | No | The paper describes using '16 challenging 7x7 Killall-Go three-move openings' selected based on expert recommendations and PCN policy head output, but does not provide a link, DOI, or a formal citation for public access to this specific set of openings as a dataset. |
| Dataset Splits | No | The paper does not provide specific training, validation, and test dataset splits (e.g., percentages, sample counts, or references to predefined splits) to reproduce data partitioning. |
| Hardware Specification | Yes | All experiments are conducted on three machines, each equipped with two Intel Xeon E5-2678 v3 CPUs, 192 GB of RAM, and four GTX 1080Ti GPUs. |
| Software Dependencies | No | The paper mentions using the Alpha Zero training framework, the Gumbel Alpha Zero algorithm, and PCN, but does not provide version numbers for these or for supporting software such as programming languages or libraries. |
| Experiment Setup | Yes | During optimization, the learning rate is fixed at 0.02, and the batch size is set to 1,024. The PCN is optimized for 500 steps for every 2,000 self-play games. |
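The optimization schedule reported above (fixed learning rate 0.02, batch size 1,024, and 500 optimization steps per 2,000 self-play games) can be sketched as a small configuration snippet. This is an illustrative reconstruction only; the constant names and the helper function are hypothetical and do not come from the paper's released code.

```python
# Hypothetical sketch of the reported fine-tuning schedule.
# All names here are illustrative, not taken from the authors' code.

LEARNING_RATE = 0.02      # fixed during optimization
BATCH_SIZE = 1024         # training batch size
STEPS_PER_ROUND = 500     # optimization steps per round
GAMES_PER_ROUND = 2000    # self-play games per round

def optimization_steps(total_self_play_games: int) -> int:
    """Total optimizer steps implied by the schedule:
    500 steps for every completed block of 2,000 self-play games."""
    completed_rounds = total_self_play_games // GAMES_PER_ROUND
    return completed_rounds * STEPS_PER_ROUND

# e.g. 10,000 self-play games -> 5 rounds -> 2,500 optimization steps
```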