Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification
Authors: Mozhi Zhang, Yoshinari Fujinuma, Jordan Boyd-Graber
AAAI 2020, pp. 9547–9554
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages. |
| Researcher Affiliation | Academia | Mozhi Zhang (CS and UMIACS, University of Maryland, College Park, MD, USA; mozhi@cs.umd.edu); Yoshinari Fujinuma (Computer Science, University of Colorado, Boulder, CO, USA; fujinumay@gmail.com); Jordan Boyd-Graber (CS, iSchool, LSC, and UMIACS, University of Maryland, College Park, MD, USA; jbg@umiacs.umd.edu). Now at Google Research, Zürich. |
| Pseudocode | No | The paper describes the model architecture and training process in text and with a diagram (Figure 1), but it does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements about making its source code available or links to a code repository. |
| Open Datasets | Yes | Our first dataset is Reuters multilingual corpus (RCV2), a collection of news stories labeled with four topics (Lewis et al. 2004)... We build a second CLDC dataset with famine-related documents sampled from Tigrinya (TI) and Amharic (AM) LORELEI language packs (Strassel and Tracey 2016). |
| Dataset Splits | No | For each language, we sample 1,500 training documents and 200 test documents with balanced labels. No explicit mention of a separate validation set. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "Adam (Kingma and Ba 2015) with default settings" as the optimizer, but does not specify versions for other software dependencies or libraries like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We use three ReLU layers with 100 hidden units and 0.1 dropout for the CLWE-based DAN models and the DAN classifier of the CACO models. The BI-LSTM embedder uses ten dimensional character embeddings and forty hidden states with no dropout. The outputs of the embedder are forty dimensional word embeddings. We set λd to 1, λe to 0.001, and λp to 1 in the multi-task objective (Equation 11). ... All models are trained with Adam (Kingma and Ba 2015) with default settings. We run the optimizer for a hundred epochs with mini-batches of sixteen documents. |
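
To make the quoted hyperparameters concrete, here is a minimal sketch of the described setup: a character-level BiLSTM word embedder (10-dimensional character embeddings, 40 hidden states, 40-dimensional word outputs) feeding a deep averaging network classifier with three ReLU layers of 100 units and 0.1 dropout, trained with Adam at default settings for 100 epochs on mini-batches of 16 documents. The paper does not name a framework, so PyTorch is assumed purely for illustration; the class names, the character-vocabulary size, the 20-per-direction split of the 40 BiLSTM hidden states, the four-label output, and the placeholder names for the loss terms weighted by λd, λe, and λp are all assumptions, not details from the paper.

```python
import torch
import torch.nn as nn


class CharBiLSTMEmbedder(nn.Module):
    """Builds a word embedding from the word's character sequence."""

    def __init__(self, n_chars, char_dim=10, hidden_dim=40, word_dim=40):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Bidirectional LSTM over characters; the 40 hidden states are split
        # as 20 per direction here (an assumption about the paper's setup).
        self.lstm = nn.LSTM(char_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(hidden_dim, word_dim)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len) padded character indices.
        _, (h_n, _) = self.lstm(self.char_emb(char_ids))
        # Concatenate the final forward and backward hidden states per word.
        h = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.proj(h)  # (n_words, word_dim)


class DANClassifier(nn.Module):
    """Deep averaging network: mean word embedding -> three ReLU layers -> labels."""

    def __init__(self, word_dim=40, hidden=100, n_labels=4, dropout=0.1):
        super().__init__()
        layers, in_dim = [], word_dim
        for _ in range(3):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, n_labels))
        self.net = nn.Sequential(*layers)

    def forward(self, word_vecs):
        # word_vecs: (n_words, word_dim) for a single document.
        return self.net(word_vecs.mean(dim=0))


# Multi-task weights quoted above (Equation 11 in the paper); the loss-term
# names in the comment below are placeholders, not the paper's notation.
lambda_d, lambda_e, lambda_p = 1.0, 1e-3, 1.0
# total_loss = lambda_d * task_loss + lambda_e * embedding_loss + lambda_p * paraphrase_loss

embedder = CharBiLSTMEmbedder(n_chars=128)  # character-vocabulary size is assumed
classifier = DANClassifier(n_labels=4)      # four RCV2 topic labels
optimizer = torch.optim.Adam(               # Adam with default settings, as reported
    list(embedder.parameters()) + list(classifier.parameters()))
# Training reportedly runs for 100 epochs with mini-batches of 16 documents.
```

The sketch covers only the architecture and optimizer configuration reported in the table; the auxiliary dictionary and paraphrase objectives and the data pipeline are left out, since the paper gives no further implementation detail for them.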