CASR: Generating Complex Sequences with Autoregressive Self-Boost Refinement
Authors: Hongwei Han, Mengyu Zhou, Shi Han, Xiu Li, Dongmei Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating CASR on Sudoku, WebQSP, MTOP and KVRET through controlled experiments and empirical studies, we find that CASR produces high-quality outputs. CASR also improves Accuracy on Sudoku (70.93% → 97.28%) and achieves state-of-the-art performance on KVRET with Micro F1 score (67.88% → 70.00%). |
| Researcher Affiliation | Collaboration | Hongwei Han¹, Mengyu Zhou², Shi Han², Xiu Li¹, Dongmei Zhang²; ¹Tsinghua Shenzhen International Graduate School, Tsinghua University; ²Microsoft Research |
| Pseudocode | Yes | Algorithm 1: CASR Inference Process. Algorithm 2: CASR Training Process (Section 3.1). |
| Open Source Code | Yes | The code of the CASR framework is open sourced in the repository at https://github.com/RalphHan/CASR. |
| Open Datasets | Yes | WebQSP (Yih et al., 2016) is a classic dataset for KBQA (Knowledge Base Question Answering). MTOP (Li et al., 2021) is a benchmark for comprehensive multilingual task-oriented semantic parsing. KVRET (Eric et al., 2017) is a benchmark for table conversation. Sudoku (PARK) is an open dataset on Kaggle. |
| Dataset Splits | Yes | Table 2: Number of samples in the train / dev / test splits of WebQSP, MTOP, KVRET, and Sudoku. WebQSP: 2673 / 309 / 1639; MTOP: 15667 / 2235 / 4386; KVRET: 6291 / 777 / 808; Sudoku: 800K / 100K / 100K. |
| Hardware Specification | Yes | We train on 4 Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using T5-base and T5-large backbones, and DeepSpeed, but does not provide specific version numbers for these or for other software such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For the three tasks, we set the batch-size to 128, learning-rate to 2e-5, max-input-length to 1024, max-generation-length to 128, beam-size to 4, and evaluate every 2K steps for checkpoint selection. For Sudoku, we train a 12-layer encoder-decoder transformer from scratch, with d-model=512, ffn-dim=2048, num-heads=8. We set max castep T = 5 and max epoch E = 10K steps. We set the batch-size to 1024, learning-rate to 2e-5, beam-size to 2, and evaluate every 2K steps for checkpoint selection. |
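
As a minimal sketch (not taken from the CASR repository), the quoted hyperparameters could be collected into configuration objects like the ones below. The `SeqTaskConfig`/`SudokuConfig` names and field names are hypothetical illustrations; only the numeric values come from the experiment setup quoted above.

```python
from dataclasses import dataclass

# Hypothetical configuration sketch mirroring the hyperparameters quoted in the
# Experiment Setup row. Names are illustrative, not from the CASR codebase.

MAX_CASTEP_T = 5  # "max castep T = 5" as quoted from the paper


@dataclass
class SeqTaskConfig:
    """Fine-tuning settings reported for WebQSP, MTOP, and KVRET (T5 backbones)."""
    batch_size: int = 128
    learning_rate: float = 2e-5
    max_input_length: int = 1024
    max_generation_length: int = 128
    beam_size: int = 4
    eval_every_steps: int = 2000  # checkpoint selection on the dev split


@dataclass
class SudokuConfig:
    """Settings reported for the 12-layer encoder-decoder transformer trained from scratch."""
    num_layers: int = 12
    d_model: int = 512
    ffn_dim: int = 2048
    num_heads: int = 8
    batch_size: int = 1024
    learning_rate: float = 2e-5
    beam_size: int = 2
    eval_every_steps: int = 2000  # checkpoint selection on the dev split
```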