Strategies for Language Identification in Code-Mixed Low Resource Languages

Published in 15th International Conference on Natural Language Processing, Student Paper Competition (accepted), 2018

Recommended citation: S. Mandal and S. Sanand. Strategies for Language Identification in Code-Mixed Low Resource Languages. 15th International Conference on Natural Language Processing, Student Paper Competition (accepted). https://arxiv.org/pdf/1810.07156.pdf

Abstract
In recent years, substantial work has been done on language tagging of code-mixed data, but most of it relies on large amounts of data to build models. In this article, we present three strategies for building a word-level language tagger for code-mixed data using very limited resources. Each strategy achieved a higher accuracy than our baseline model, and the best-performing system reached an accuracy of around 91%. Combining all three, the ensemble system achieved an accuracy of around 92.6%.

Download paper here: https://arxiv.org/pdf/1810.07156.pdf

@article{mandal2018strategies,
title={Strategies for Language Identification in Code-Mixed Low Resource Languages},
author={Mandal, Soumil and Sanand, Sankalp},
journal={arXiv preprint arXiv:1810.07156},
year={2018}
}