A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification
Authors: Xiang Hu, Xinyu Kong, Kewei Tu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our approach could achieve good prediction accuracy in downstream tasks. Meanwhile, the predicted span labels are consistent with human rationales to a certain degree. (Section 4, Experiments) In this section, we compare our interpretable symbolic-neural model with models based on dense sentence representations to verify that our model works as well as conventional models. All systems are trained on raw texts and sentence-level labels only. Dataset: We report the results on the development set of the following datasets: SST-2, CoLA (Wang et al., 2019), ATIS (Hakkani-Tur et al., 2016), SNIPS (Coucke et al., 2018), StanfordLU (Eric et al., 2017). Please note that SST-2, CoLA, and SNIPS are single-label tasks and ATIS and StanfordLU are multi-label tasks. |
| Researcher Affiliation | Collaboration | Xiang Hu¹, Xinyu Kong¹, Kewei Tu²; ¹Ant Group, ²ShanghaiTech University |
| Pseudocode | Yes | Algorithm 1 Definition of Yield function |
| Open Source Code | Yes | Codes available at https://github.com/ant-research/StructuredLM_RTDT |
| Open Datasets | Yes | Dataset: We report the results on the development set of the following datasets: SST-2, CoLA (Wang et al., 2019), ATIS (Hakkani-Tur et al., 2016), SNIPS (Coucke et al., 2018), StanfordLU (Eric et al., 2017). (A hedged dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper states 'We report the results on the development set of the following datasets' but does not provide explicit train/validation/test splits (percentages or exact counts) for reproduction, nor does it cite a predefined standard split for the evaluated datasets beyond referring to the 'development set'. |
| Hardware Specification | Yes | We train all the systems across the seven datasets for 20 epochs with a learning rate of 5×10⁻⁵ for the encoder, 1×10⁻² for the unsupervised parser, and batch size 64 on 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions using BERT, Fast-R2D2, and refers to a Huggingface tutorial, but it does not specify version numbers for general software dependencies like Python, PyTorch, TensorFlow, or CUDA libraries. |
| Experiment Setup | Yes | Hyperparameters: Our BERT follows the setting in Devlin et al. (2019), using 12-layer Transformers with 768-dimensional embeddings, 3,072-dimensional hidden layer representations, and 12 attention heads. The setting of Fast-R2D2 follows Hu et al. (2022). Specifically, the tree encoder uses 4-layer Transformers with other hyperparameters the same as BERT, and the top-down encoder uses 2-layer ones. The top-down parser uses a 4-layer bidirectional LSTM with 128-dimensional embeddings and 256-dimensional hidden layers. We train all the systems across the seven datasets for 20 epochs with a learning rate of 5×10⁻⁵ for the encoder, 1×10⁻² for the unsupervised parser, and batch size 64 on 8 A100 GPUs. (A hedged configuration sketch collecting these values follows the table.) |
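
The evaluation relies on the development sets of the datasets quoted above. As a convenience for reproduction, here is a minimal sketch of loading the two GLUE development sets (SST-2 and CoLA) with the Hugging Face `datasets` library; the use of that library and the `validation` split name are assumptions on our part, and ATIS, SNIPS, and StanfordLU are distributed separately and not covered by this sketch.

```python
# Minimal sketch (assumption): load the GLUE development sets used for
# evaluation with the Hugging Face `datasets` library. ATIS, SNIPS, and
# StanfordLU are distributed outside GLUE and are not covered here.
from datasets import load_dataset

sst2_dev = load_dataset("glue", "sst2", split="validation")  # single-label sentiment
cola_dev = load_dataset("glue", "cola", split="validation")  # single-label acceptability

print(len(sst2_dev), sst2_dev[0]["sentence"], sst2_dev[0]["label"])
print(len(cola_dev), cola_dev[0]["sentence"], cola_dev[0]["label"])
```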
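The 'Hardware Specification' and 'Experiment Setup' rows together specify the model and optimization hyperparameters. The sketch below gathers those reported values into one place; only the numbers come from the paper, while the field names, the `dataclass` layout, and the `BertConfig` usage are illustrative assumptions rather than the authors' actual configuration code.

```python
# Hedged sketch: the reported hyperparameters collected into a single config.
# Only the numeric values are taken from the paper; names and structure are
# illustrative assumptions.
from dataclasses import dataclass
from transformers import BertConfig

# Encoder follows Devlin et al. (2019): a BERT-base-sized Transformer.
bert_config = BertConfig(
    num_hidden_layers=12,
    hidden_size=768,
    intermediate_size=3072,
    num_attention_heads=12,
)

@dataclass
class TrainingConfig:
    # Fast-R2D2 components (Hu et al., 2022)
    tree_encoder_layers: int = 4        # per-layer sizes same as BERT
    top_down_encoder_layers: int = 2
    parser_lstm_layers: int = 4         # bidirectional LSTM
    parser_embedding_dim: int = 128
    parser_hidden_dim: int = 256
    # Optimization, shared across the seven datasets
    epochs: int = 20
    encoder_lr: float = 5e-5
    parser_lr: float = 1e-2
    batch_size: int = 64
    num_gpus: int = 8                   # A100 GPUs in the paper

config = TrainingConfig()
print(bert_config.hidden_size, config.encoder_lr, config.batch_size)
```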