See my publications on Google Scholar, Semantic Scholar and ACL Anthology.
2021
-
CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing
Jayanthi, Sai Muralidhar,
Nerella, Kavya,
Chandu, Khyathi Raghavi,
and Black, Alan W
In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Jun
2021
The NLP community has recently witnessed steep progress in a variety of tasks across the realms of monolingual and multilingual language processing. These successes, in conjunction with the proliferating mixed language interactions on social media, have boosted interest in modeling code-mixed texts. In this work, we present CodemixedNLP, an open-source library with the goals of bringing together the advances in code-mixed NLP and opening it up to a wider machine learning community. The library consists of tools to develop and benchmark versatile model architectures that are tailored for mixed texts, methods to expand training sets, techniques to quantify mixing styles, and fine-tuned state-of-the-art models for 7 tasks in Hinglish. We believe this work has the potential to foster a distributed yet collaborative and sustainable ecosystem in an otherwise dispersed space of code-mixing research. The toolkit is designed to be simple, easily extensible, and resourceful to both researchers and practitioners. Demo: <http://k-ikkees.pc.cs.cmu.edu:5000> and Library: <https://github.com/murali1996/CodemixedNLP>
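The "quantify mixing styles" idea can be illustrated with one standard metric, the Code-Mixing Index (CMI), computed over per-token language tags. This is a generic sketch of the metric, not the toolkit's actual API; the tag names are illustrative.

```python
from collections import Counter

def code_mixing_index(lang_tags):
    """Code-Mixing Index over per-token language tags.

    CMI = 100 * (N - max_lang_count) / N, where N counts only tokens
    tagged with a language (language-independent tags like 'other' are
    excluded). Returns 0.0 for a monolingual or empty utterance.
    """
    counts = Counter(t for t in lang_tags if t not in ("other", "univ"))
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 100.0 * (n - max(counts.values())) / n

# "I am jaa raha hoon ghar": 2 English tokens, 4 Hindi tokens
print(code_mixing_index(["en", "en", "hi", "hi", "hi", "hi"]))  # ~33.33
```

Higher values indicate heavier mixing; a fully monolingual sentence scores 0.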
-
A Study of Morphological Robustness of Neural Machine Translation
Jayanthi, Sai Muralidhar,
and Pratapa, Adithya
In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Aug
2021
In this work, we analyze the robustness of neural machine translation systems to grammatical perturbations in the source. In particular, we focus on perturbations related to morphological inflection. While this has recently been studied for English→French (MORPHEUS) (Tan et al., 2020), it is unclear how this extends to Any→English translation systems. We propose MORPHEUS-MULTILINGUAL, which utilizes UniMorph dictionaries to identify morphological perturbations to the source that adversely affect the translation models. Along with an analysis of state-of-the-art pretrained MT systems, we train and analyze systems for 11 language pairs using the multilingual TED corpus (Qi et al., 2018). We also compare these to actual errors made by non-native speakers, using Grammatical Error Correction datasets. Finally, we present a qualitative and quantitative analysis of the robustness of Any→English translation systems.
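The candidate-generation step can be sketched with a toy UniMorph-style table: for each token, propose sentences where it is swapped for a different inflection of the same lemma (the adversarial search would then keep whichever candidate most degrades translation quality). The dictionary and function names here are illustrative, not the paper's implementation.

```python
# Toy UniMorph-style table: lemma -> {morphological tag: inflected form}.
# Real UniMorph dictionaries map lemmas to many tagged inflections.
TOY_UNIMORPH = {
    "walk": {"V;PRS;3;SG": "walks", "V;PST": "walked", "V;V.PTCP;PRS": "walking"},
    "dog":  {"N;SG": "dog", "N;PL": "dogs"},
}
FORM_TO_LEMMA = {form: lemma
                 for lemma, forms in TOY_UNIMORPH.items()
                 for form in forms.values()}

def inflection_perturbations(sentence):
    """Yield candidate sentences with one token replaced by a
    different inflection of the same lemma."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        lemma = FORM_TO_LEMMA.get(tok)
        if lemma is None:
            continue
        for form in TOY_UNIMORPH[lemma].values():
            if form != tok:
                yield " ".join(tokens[:i] + [form] + tokens[i + 1:])

for cand in inflection_perturbations("the dog walks"):
    print(cand)
```

Each candidate changes exactly one inflection, so any drop in translation quality is attributable to that single morphological change.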
-
SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification
Jayanthi, Sai Muralidhar,
and Gupta, Akshat
In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Apr
2021
In this paper we present our submission for the EACL 2021 Shared Task on Offensive Language Identification in Dravidian Languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models that leverages task-adaptive pre-training of multilingual BERT models with a masked language modeling objective. Our system was ranked 1st for Kannada, 2nd for Malayalam, and 3rd for Tamil.
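The masked language modeling objective used for task-adaptive pre-training follows the standard BERT recipe: select ~15% of positions, then replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged. A minimal sketch with toy token IDs (the constants and function are illustrative, not the submission's code):

```python
import random

MASK_ID, VOCAB_SIZE = 103, 30000  # toy values for illustration

def mlm_mask(token_ids, mask_prob=0.15, rng=random):
    """Return (inputs, labels) for masked language modeling.

    ~mask_prob of positions are selected; of those, 80% become [MASK],
    10% a random token, 10% stay unchanged. Labels hold the original
    id at selected positions and -100 elsewhere (the index ignored by
    the cross-entropy loss).
    """
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tid in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tid
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token
    return inputs, labels
```

Running this objective over the unlabeled task corpus adapts the multilingual encoder to the target domain before fine-tuning on the labeled offensive-language data.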
2020
-
NeuSpell: A Neural Spelling Correction Toolkit
Jayanthi, Sai Muralidhar,
Pruthi, Danish,
and Neubig, Graham
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Oct
2020
We introduce NeuSpell, an open-source toolkit for spelling correction in English. Our toolkit comprises ten different models, and benchmarks them on naturally occurring misspellings from multiple sources. We find that many systems do not adequately leverage the context around the misspelt token. To remedy this, (i) we train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings; and (ii) use richer representations of the context. By training on our synthetic examples, correction rates improve by 9% (absolute) compared to the case when models are trained on randomly sampled character perturbations. Using richer contextual representations boosts the correction rate by another 3%. Our toolkit enables practitioners to use our proposed and existing spelling correction systems, both via a simple unified command line and via a web interface. Among many potential applications, we demonstrate the utility of our spell-checkers in combating adversarial misspellings. The toolkit can be accessed at neuspell.github.io.
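The synthetic training-data idea can be sketched roughly as follows: harvest isolated misspellings into a word-to-variants lookup, then inject them into clean running text so models must rely on surrounding context to correct them. The lookup and function below are a toy stand-in, not NeuSpell's actual API or data.

```python
import random

# Toy lookup standing in for a corpus of harvested misspellings:
# correct word -> observed corruptions.
MISSPELLINGS = {
    "receive": ["recieve", "receve"],
    "forward": ["foward"],
    "their": ["thier"],
}

def corrupt(sentence, noise_prob=0.5, rng=random):
    """Replace known words with a harvested misspelling with prob noise_prob,
    yielding (noisy, clean) training pairs with errors in context."""
    out = []
    for word in sentence.split():
        variants = MISSPELLINGS.get(word.lower())
        if variants and rng.random() < noise_prob:
            out.append(rng.choice(variants))
        else:
            out.append(word)
    return " ".join(out)

rng = random.Random(0)
print(corrupt("i look forward to receive their reply", rng=rng))
```

Because the injected errors mirror real misspelling patterns rather than random character swaps, the resulting training pairs are closer to the naturally occurring errors a deployed checker sees.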
-
Constrained Fact Verification for FEVER
Pratapa, Adithya,
Jayanthi, Sai Muralidhar,
and Nerella, Kavya
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Nov
2020
Fact-verification systems are well explored in the NLP literature with growing attention owing to shared tasks like FEVER. Though the task requires reasoning on extracted evidence to verify a claim’s factuality, there is little work on understanding the reasoning process. In this work, we propose a new methodology for fact-verification, specifically FEVER, that enforces a closed-world reliance on extracted evidence. We present an extensive evaluation of state-of-the-art verification models under these constraints.
2017
-
Divide-and-warp temporal alignment of speech signals between speakers: Validation using articulatory data
Jayanthi, Sai Muralidhar,
MĂ©nard, Lucie,
and Laporte, Catherine
In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2017