Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data

Authors: Tengfei Ma, Tetsuya Nasukawa

IJCAI 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results using real world data demonstrate the utility and efficacy of the proposed models. ... In Section 5 we describe our experiments." |
| Researcher Affiliation | Industry | Tengfei Ma, IBM T. J. Watson Research Center (Tengfei.Ma1@ibm.com); Tetsuya Nasukawa, IBM Research-Tokyo (NASUKAWA@jp.ibm.com) |
| Pseudocode | No | The paper describes generative processes for its models but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper contains no statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | "The first corpus comes from a bilingual law dataset [1]. We selected it for our experiments because it has an associated law dictionary that can be directly used as test data. ... The other dataset is a collection of car complaints from MLIT [2] and NHTSA [3]." [1] http://www.phontron.com/jaen-law/index-ja.html [2] http://www.mlit.go.jp/jidosha/carinf/rcl/defects.html [3] http://www-odi.nhtsa.dot.gov/downloads/index.cfm |
| Dataset Splits | No | The paper mentions "test data" and burn-in steps for inference but does not explicitly describe train/validation/test splits, percentages, or counts. |
| Hardware Specification | No | The paper does not specify any hardware (e.g., CPU or GPU model, memory) used to run the experiments. |
| Software Dependencies | No | "The Japanese texts are processed by our own NLP tool to obtain the segmentation and the English texts are tokenized and lemmatized by NLTK." No version numbers are given for NLTK or other key software components. (A preprocessing sketch follows the table.) |
| Experiment Setup | Yes | "For all the models, we set the hyperparameters as follows: α = α_φ = 0.5, β = 0.01, and the topic number is 50. We run 1500 iterations for inference while the first 1000 iterations are discarded as burn-in steps." (An inference sketch follows the table.) |
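The Software Dependencies row quotes the paper's only preprocessing detail: English texts are tokenized and lemmatized with NLTK (no version given), while Japanese segmentation uses an in-house tool that cannot be reproduced from the text. Below is a minimal sketch of such an NLTK pipeline; the `preprocess_english` helper and its settings (lowercasing, dropping non-alphabetic tokens) are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of the English-side preprocessing the paper attributes to
# NLTK (tokenization + lemmatization). Exact settings are assumptions: the
# paper names NLTK but gives no version or configuration.
import nltk
from nltk.stem import WordNetLemmatizer

# One-time downloads of the required NLTK resources.
nltk.download("punkt", quiet=True)      # tokenizer models
nltk.download("punkt_tab", quiet=True)  # needed on newer NLTK versions
nltk.download("wordnet", quiet=True)    # lemmatizer data

lemmatizer = WordNetLemmatizer()

def preprocess_english(text: str) -> list[str]:
    """Tokenize and lemmatize an English document (hypothetical helper)."""
    tokens = nltk.word_tokenize(text.lower())
    return [lemmatizer.lemmatize(tok) for tok in tokens if tok.isalpha()]

print(preprocess_english("The cars were recalled after braking defects."))
# e.g. ['the', 'car', 'were', 'recalled', 'after', 'braking', 'defect']
```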
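The Experiment Setup row reports hyperparameters that map directly onto a collapsed Gibbs sampler for an LDA-style topic model: Dirichlet priors α = α_φ = 0.5 and β = 0.01, K = 50 topics, and 1500 sampling sweeps with the first 1000 discarded as burn-in. The sketch below shows how those values would be wired into a standard monolingual LDA sampler; it is a generic illustration under that assumption, not the paper's inverted bilingual model, and `gibbs_lda` and its interface are hypothetical.

```python
# Generic collapsed Gibbs sampler for an LDA-style topic model, wired up with
# the hyperparameters reported in the paper (alpha = 0.5, beta = 0.01, K = 50,
# 1500 sweeps, first 1000 discarded as burn-in). This is a standard
# monolingual LDA sketch, NOT the paper's inverted bilingual model.
import numpy as np

ALPHA, BETA = 0.5, 0.01   # document-topic / topic-word Dirichlet priors
K = 50                    # number of topics
N_ITER, BURN_IN = 1500, 1000

def gibbs_lda(docs, vocab_size, rng=np.random.default_rng(0)):
    """docs: list of token-id lists. Returns averaged post-burn-in doc-topic counts."""
    D = len(docs)
    ndk = np.zeros((D, K))           # doc-topic counts
    nkw = np.zeros((K, vocab_size))  # topic-word counts
    nk = np.zeros(K)                 # tokens per topic
    z = [rng.integers(K, size=len(d)) for d in docs]  # random init

    # Seed the count tables from the initial assignments.
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    ndk_sum = np.zeros_like(ndk)     # post-burn-in accumulator
    for it in range(N_ITER):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]          # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional p(z_i = k | rest).
                p = (ndk[d] + ALPHA) * (nkw[:, w] + BETA) / (nk + vocab_size * BETA)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k          # record and add back the new assignment
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        if it >= BURN_IN:            # only accumulate samples after burn-in
            ndk_sum += ndk
    return ndk_sum / (N_ITER - BURN_IN)
```

Averaging the document-topic counts only after the burn-in cutoff mirrors the paper's stated practice of discarding the first 1000 of 1500 iterations.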