A Deep Generative Model for Code Switched Text
Authors: Bidisha Samanta, Sharmila Reddy, Hussain Jagirdar, Niloy Ganguly, Soumen Chakrabarti
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that using synthetic code-switched text with natural monolingual data results in significant (33.06%) drop in perplexity. ... 4 Experimental Setup ... 5 Results and Analysis |
| Researcher Affiliation | Academia | 1Indian Institute of Technology, Kharagpur; 2Indian Institute of Technology, Bombay; {bidisha, sharmilanangi, hussainjagirdar.hj}@iitkgp.ac.in, niloy@cse.iitkgp.ac.in, soumen@cse.iitb.ac.in |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | https://github.com/bidishasamantakgp/VACS |
| Open Datasets | Yes | To train the generative models, we use a subset of the (real) Hindi-English tweets collected by [Patro et al., 2017] and automatically language-tagged by [Rijhwani et al., 2017] with reasonable accuracy. |
| Dataset Splits | Yes | From this set we sample 6K tweets where code-switching is present, which we collect into folds rCS-train, rCS-valid. ... We sample 7K instances from the original real code-switched pool for validation and 7K for testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions using 'Adam optimiser' but does not specify version numbers for any software libraries, programming languages, or other dependencies. |
| Experiment Setup | No | The paper mentions the use of 'Adam optimiser and KL cost annealing technique [Bowman et al., 2015b]' but does not provide specific hyperparameter values such as learning rates, batch sizes, or number of epochs for the experimental setup. |
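
The Experiment Setup row notes that the paper trains with the Adam optimiser and KL cost annealing [Bowman et al., 2015b] but reports no hyperparameters. The sketch below is a minimal illustration of what such a setup typically looks like for a VAE-style model; the toy model, latent size, learning rate, batch size, and annealing schedule are all illustrative assumptions, not values taken from the paper or its repository.

```python
# Hypothetical sketch of Adam + linear KL cost annealing for a VAE-style model.
# All hyperparameters (vocab, latent size, lr, batch size, anneal_steps) are
# assumptions for illustration; the paper does not report them.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Toy encoder/decoder over bag-of-words vectors (stand-in for the real model)."""
    def __init__(self, vocab=1000, hidden=64, latent=16):
        super().__init__()
        self.enc = nn.Linear(vocab, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Linear(latent, vocab)

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.dec(z), mu, logvar

def kl_weight(step, anneal_steps=2000):
    """Linear KL cost annealing: weight ramps from 0 to 1 over anneal_steps."""
    return min(1.0, step / anneal_steps)

model = TinyVAE()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
recon_loss = nn.BCEWithLogitsLoss(reduction="sum")

for step in range(100):                       # toy loop on random data
    x = torch.rand(32, 1000).round()          # fake batch; batch size assumed
    logits, mu, logvar = model(x)
    rec = recon_loss(logits, x)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = rec + kl_weight(step) * kl         # annealed KL term
    optim.zero_grad()
    loss.backward()
    optim.step()
```

Bowman et al. [2015b] describe both linear and sigmoid annealing schedules; the linear ramp above is only the simplest variant, chosen here because the paper does not state which schedule or hyperparameters were used.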