Understanding Pictograph with Facial Features: End-to-End Sentence-Level Lip Reading of Chinese

Authors: Xiaobing Zhang, Haigang Gong, Xili Dai, Fan Yang, Nianbo Liu, Ming Liu (pp. 9211-9218)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we implement visual-only Chinese lip reading of unconstrained sentences in a two-step end-to-end architecture (LipCH-Net)... We collect 6-month daily news broadcasts from the China Central Television (CCTV) website, and semi-automatically label them into a 20.95 GB dataset with 20,495 natural Chinese sentences. When trained on the CCTV dataset, the LipCH-Net model outperforms all state-of-the-art lip reading frameworks.
Researcher Affiliation | Collaboration | Xiaobing Zhang (1,2), Haigang Gong (1), Xili Dai (1), Fan Yang (1), Nianbo Liu (1), Ming Liu (1); 1: University of Electronic Science and Technology of China (UESTC); 2: CETC Big Data Research Institute Co., Ltd., Guizhou, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states, 'The data will be released as a resource for evaluation,' but this refers to the dataset; no concrete access to source code for the described methodology is provided.
Open Datasets | No | The paper states, 'The data will be released as a resource for evaluation,' implying future availability, but provides no concrete access information (link, DOI, specific repository, or citation to an already-public dataset) for its collected dataset.
Dataset Splits | Yes | The training, validation, and test data are divided according to the proportion of 7:1:2.
Hardware Specification | Yes | The network is trained using stochastic gradient descent on 4 Nvidia GTX 1080 GPUs with 8 GB memory and an Intel Xeon E5-2620 processor with 32 GB memory.
Software Dependencies | No | The paper mentions, 'Our implementation is based on the Tensorflow library,' but does not provide specific version numbers for TensorFlow or any other key software components.
Experiment Setup | Yes | Experimental results indicate that an initial learning rate of 0.1 in the ConvNet and 0.001 in the LSTMs can make the Picture-to-Pinyin model fully converge. Parameters are initialized in the range -0.02 to +0.02 and the initial learning rate is 0.001. The initial learning rate of 0.001 was decreased by a factor of three if the training error did not decrease for 10,000 iterations.
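The 7:1:2 split reported in the Dataset Splits row can be sketched as follows. The authors do not publish their splitting code, so the function name `split_dataset`, the fixed seed, and the shuffling step are illustrative assumptions; only the 7:1:2 proportion and the 20,495-sentence count come from the paper.

```python
import random

def split_dataset(samples, seed=0):
    # Illustrative sketch of the 7:1:2 train/val/test split described in
    # the paper; the authors' actual shuffling/grouping is not published.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10   # 70% for training
    n_val = n * 1 // 10     # 10% for validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remaining ~20% for testing
    return train, val, test

# Applied to the 20,495 sentences reported for the CCTV dataset:
train, val, test = split_dataset(range(20495))
print(len(train), len(val), len(test))  # 14346 2049 4100
```

Integer arithmetic keeps the partition exact and deterministic; the last slice absorbs any rounding remainder so every sentence lands in exactly one subset.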
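The learning-rate rule in the Experiment Setup row amounts to a plateau schedule. The paper states the rule only in prose, so the class name `PlateauLRSchedule`, the reading of "shrinking" as division by a factor of 3, and the treatment of the 10,000-iteration window as a patience counter are all assumptions made for illustration.

```python
class PlateauLRSchedule:
    """Sketch of the schedule described in the paper: start at 1e-3 and
    shrink the learning rate (assumed: divide by 3) whenever the training
    error has not improved for `patience` iterations (10,000 in the paper)."""

    def __init__(self, lr=1e-3, factor=3.0, patience=10_000):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, train_error):
        if train_error < self.best:      # error improved: reset the counter
            self.best = train_error
            self.stale = 0
        else:                            # no improvement this iteration
            self.stale += 1
            if self.stale >= self.patience:
                self.lr /= self.factor   # shrink the learning rate
                self.stale = 0
        return self.lr

# Tiny demo with patience=3 instead of 10,000, for illustration only:
sched = PlateauLRSchedule(lr=1e-3, factor=3.0, patience=3)
for err in [1.0, 0.9, 0.9, 0.9, 0.9]:
    lr = sched.step(err)
```

After three consecutive iterations without improvement, the demo's learning rate drops from 1e-3 to 1e-3/3, mirroring the paper's decay trigger at a toy scale.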