Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data
Authors: Tengfei Ma, Tetsuya Nasukawa
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results using real world data demonstrate the utility and efficacy of the proposed models. ... In Section 5 we describe our experiments. |
| Researcher Affiliation | Industry | Tengfei Ma IBM T. J. Watson Research Center Tengfei.Ma1@ibm.com Tetsuya Nasukawa IBM Research-Tokyo NASUKAWA@jp.ibm.com |
| Pseudocode | No | The paper describes generative processes for its models but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | The first corpus comes from a bilingual law dataset [1]. We selected it for our experiments because it has an associated law dictionary that can be directly used as test data. ... The other dataset is a collection of car complaints from MLIT [2] and NHTSA [3]. ... [1] http://www.phontron.com/jaen-law/index-ja.html [2] http://www.mlit.go.jp/jidosha/carinf/rcl/defects.html [3] http://www-odi.nhtsa.dot.gov/downloads/index.cfm |
| Dataset Splits | No | The paper mentions "test data" and "burn-in steps" for inference but does not explicitly provide details on train/validation/test dataset splits, percentages, or specific counts. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The Japanese texts are processed by our own NLP tool to obtain the segmentation and the English texts are tokenized and lemmatized by NLTK. (No specific version numbers for NLTK or other key software components are provided; a hedged preprocessing sketch follows the table.) |
| Experiment Setup | Yes | For all the models, we set the hyperparameters as follows: α = α_φ = 0.5, β = 0.01, and the topic number is 50. We run 1500 iterations for inference while the first 1000 iterations are discarded as burn-in steps. (A hedged configuration sketch follows the table.) |
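
For the software-dependencies row, the paper only names NLTK for tokenizing and lemmatizing the English texts, with no versions or exact calls. The sketch below is one plausible reading of that step, not the authors' pipeline: the downloaded resources, the lowercasing, and the `isalpha` filter are assumptions.

```python
# Hedged sketch of the English-side preprocessing described in the paper
# (tokenization + lemmatization with NLTK). Resource names, lowercasing,
# and the alphabetic-token filter are assumptions, not the authors' code.
import nltk
from nltk.stem import WordNetLemmatizer

# One-time model downloads (assumed; the paper does not name them).
nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()

def preprocess_english(text: str) -> list[str]:
    """Tokenize and lemmatize an English document into a list of word tokens."""
    tokens = nltk.word_tokenize(text.lower())
    return [lemmatizer.lemmatize(tok) for tok in tokens if tok.isalpha()]

print(preprocess_english("The brakes were failing on rainy days."))
# e.g. ['the', 'brake', 'were', 'failing', 'on', 'rainy', 'day']
```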
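
For the experiment-setup row, the reported values (α = α_φ = 0.5, β = 0.01, 50 topics, 1500 Gibbs iterations with the first 1000 discarded as burn-in) can be gathered into a small configuration object. The loop below is only a skeleton of how such a burn-in split is typically used; the `sampler` object and its `sweep`/`snapshot` methods are hypothetical, since the paper releases no code.

```python
# Hedged sketch of the reported inference settings, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class GibbsConfig:
    num_topics: int = 50        # "the topic number is 50"
    alpha: float = 0.5          # α, as reported
    alpha_phi: float = 0.5      # α_φ, reported equal to α
    beta: float = 0.01          # β, as reported
    num_iterations: int = 1500  # total Gibbs sweeps
    burn_in: int = 1000         # first 1000 sweeps discarded

def run_inference(sampler, cfg: GibbsConfig) -> list:
    """Run Gibbs sweeps; collect samples only after the burn-in period.

    `sampler` is a hypothetical object exposing sweep() (one pass over all
    tokens) and snapshot() (current count statistics); the paper does not
    specify such an interface.
    """
    collected = []
    for it in range(cfg.num_iterations):
        sampler.sweep()
        if it >= cfg.burn_in:
            collected.append(sampler.snapshot())
    return collected
```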