Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data

Authors: Tengfei Ma, Tetsuya Nasukawa

IJCAI 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results using real world data demonstrate the utility and efficacy of the proposed models. ... In Section 5 we describe our experiments." |
| Researcher Affiliation | Industry | Tengfei Ma, IBM T. J. Watson Research Center (Tengfei.Ma1@ibm.com); Tetsuya Nasukawa, IBM Research-Tokyo (NASUKAWA@jp.ibm.com) |
| Pseudocode | No | The paper describes generative processes for its models but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper contains no statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | "The first corpus comes from a bilingual law dataset [1]. We selected it for our experiments because it has an associated law dictionary that can be directly used as test data. ... The other dataset is a collection of car complaints from MLIT [2] and NHTSA [3]." [1] http://www.phontron.com/jaen-law/index-ja.html [2] http://www.mlit.go.jp/jidosha/carinf/rcl/defects.html [3] http://www-odi.nhtsa.dot.gov/downloads/index.cfm |
| Dataset Splits | No | The paper mentions "test data" and burn-in steps for inference but does not explicitly describe train/validation/test splits, percentages, or counts. |
| Hardware Specification | No | The paper does not specify any hardware (e.g., CPU or GPU model, memory) used to run the experiments. |
| Software Dependencies | No | "The Japanese texts are processed by our own NLP tool to obtain the segmentation and the English texts are tokenized and lemmatized by NLTK." No version numbers are given for NLTK or other key software components. (A preprocessing sketch follows the table.) |
| Experiment Setup | Yes | "For all the models, we set the hyperparameters as follows: α = α_φ = 0.5, β = 0.01, and the topic number is 50. We run 1500 iterations for inference while the first 1000 iterations are discarded as burn-in steps." (An inference sketch follows the table.) |
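The Software Dependencies row quotes the paper's only preprocessing detail: English texts are tokenized and lemmatized with NLTK (no version given), while Japanese segmentation uses an in-house tool that cannot be reproduced from the text. Below is a minimal sketch of such an NLTK pipeline; the `preprocess_english` helper and its settings (lowercasing, dropping non-alphabetic tokens) are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of the English-side preprocessing the paper attributes to
# NLTK (tokenization + lemmatization). Exact settings are assumptions: the
# paper names NLTK but gives no version or configuration.
import nltk
from nltk.stem import WordNetLemmatizer

# One-time downloads of the required NLTK resources.
nltk.download("punkt", quiet=True)      # tokenizer models
nltk.download("punkt_tab", quiet=True)  # needed on newer NLTK versions
nltk.download("wordnet", quiet=True)    # lemmatizer data

lemmatizer = WordNetLemmatizer()

def preprocess_english(text: str) -> list[str]:
    """Tokenize and lemmatize an English document (hypothetical helper)."""
    tokens = nltk.word_tokenize(text.lower())
    return [lemmatizer.lemmatize(tok) for tok in tokens if tok.isalpha()]

print(preprocess_english("The cars were recalled after braking defects."))
# e.g. ['the', 'car', 'were', 'recalled', 'after', 'braking', 'defect']
```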
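The Experiment Setup row reports hyperparameters that map directly onto a collapsed Gibbs sampler for an LDA-style topic model: Dirichlet priors α = α_φ = 0.5 and β = 0.01, K = 50 topics, and 1500 sampling sweeps with the first 1000 discarded as burn-in. The sketch below shows how those values would be wired into a standard monolingual LDA sampler; it is a generic illustration under that assumption, not the paper's inverted bilingual model, and `gibbs_lda` and its interface are hypothetical.

```python
# Generic collapsed Gibbs sampler for an LDA-style topic model, wired up with
# the hyperparameters reported in the paper (alpha = 0.5, beta = 0.01, K = 50,
# 1500 sweeps, first 1000 discarded as burn-in). This is a standard
# monolingual LDA sketch, NOT the paper's inverted bilingual model.
import numpy as np

ALPHA, BETA = 0.5, 0.01   # document-topic / topic-word Dirichlet priors
K = 50                    # number of topics
N_ITER, BURN_IN = 1500, 1000

def gibbs_lda(docs, vocab_size, rng=np.random.default_rng(0)):
    """docs: list of token-id lists. Returns averaged post-burn-in doc-topic counts."""
    D = len(docs)
    ndk = np.zeros((D, K))           # doc-topic counts
    nkw = np.zeros((K, vocab_size))  # topic-word counts
    nk = np.zeros(K)                 # tokens per topic
    z = [rng.integers(K, size=len(d)) for d in docs]  # random init

    # Seed the count tables from the initial assignments.
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    ndk_sum = np.zeros_like(ndk)     # post-burn-in accumulator
    for it in range(N_ITER):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]          # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional p(z_i = k | rest).
                p = (ndk[d] + ALPHA) * (nkw[:, w] + BETA) / (nk + vocab_size * BETA)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k          # record and add back the new assignment
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        if it >= BURN_IN:            # only accumulate samples after burn-in
            ndk_sum += ndk
    return ndk_sum / (N_ITER - BURN_IN)
```

Averaging the document-topic counts only after the burn-in cutoff mirrors the paper's stated practice of discarding the first 1000 of 1500 iterations.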