A Deep Generative Model for Code Switched Text

Authors: Bidisha Samanta, Sharmila Reddy Nangi, Hussain Jagirdar, Niloy Ganguly, Soumen Chakrabarti

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that using synthetic code-switched text with natural monolingual data results in a significant (33.06%) drop in perplexity. ... 4 Experimental Setup ... 5 Results and Analysis
Researcher Affiliation | Academia | (1) Indian Institute of Technology, Kharagpur; (2) Indian Institute of Technology, Bombay. {bidisha, sharmilanangi, hussainjagirdar.hj}@iitkgp.ac.in, niloy@cse.iitkgp.ac.in, soumen@cse.iitb.ac.in
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | https://github.com/bidishasamantakgp/VACS
Open Datasets | Yes | To train the generative models, we use a subset of the (real) Hindi-English tweets collected by [Patro et al., 2017] and automatically language-tagged by [Rijhwani et al., 2017] with reasonable accuracy.
Dataset Splits | Yes | From this set we sample 6K tweets where code-switching is present, which we collect into folds rCS-train and rCS-valid. ... We sample 7K instances from the original real code-switched pool for validation and 7K for testing.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper mentions using 'Adam optimiser' but does not specify version numbers for any software libraries, programming languages, or other dependencies.
Experiment Setup | No | The paper mentions the use of the 'Adam optimiser and KL cost annealing technique [Bowman et al., 2015b]' but does not provide specific hyperparameter values such as learning rates, batch sizes, or number of epochs (a hedged training-loop sketch follows the table).
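
Since the paper names only the Adam optimiser and KL cost annealing [Bowman et al., 2015b] without reporting concrete hyperparameters, the snippet below is a minimal sketch of what such a setup could look like for a text VAE. The learning rate, batch size, annealing horizon, and the assumption that the model returns a reconstruction loss and a KL divergence are illustrative guesses, not the authors' settings.

```python
# Minimal sketch of VAE training with Adam and linear KL cost annealing
# (Bowman et al., 2015b). All hyperparameters are illustrative assumptions,
# not values reported in the paper.
import torch


def kl_weight(step, anneal_steps=10_000):
    """Linearly anneal the KL term's weight from 0 to 1 over `anneal_steps` updates."""
    return min(1.0, step / anneal_steps)


def train(model, data_loader, epochs=10, lr=1e-3, device="cpu"):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # paper: "Adam optimiser"
    step = 0
    for _ in range(epochs):
        for batch in data_loader:
            batch = batch.to(device)
            # Assumption: the model returns the reconstruction loss and the
            # KL divergence between the approximate posterior and the prior.
            recon_loss, kl_div = model(batch)
            loss = recon_loss + kl_weight(step) * kl_div  # annealed ELBO objective
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
```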