The new Language Identification API: lang-4.0
We are happy to announce the rollout of a new Language Identification API, lang-4.0. It will be available as a private beta for a short time and will then be opened to all MeaningCloud users.
Language Identification (LID) is a key task in Natural Language Processing. It is commonly used for pre-classification or document selection. Traditional LID methods, such as those based on Markov chains, offer good results for long texts. However, their precision tends to drop on short texts, which provide less statistical evidence to work with.
We have observed that the share of relatively short texts analyzed in MeaningCloud is growing. For this reason, we decided to improve on our n-gram-based lang-2.0 API.
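To make the n-gram idea concrete, here is a minimal sketch of a character-trigram language identifier in Python. It is a toy illustration only and is not how lang-2.0 is implemented: the training sentences, the add-one smoothing, and the function names (char_ngrams, train_profile, log_score, identify) are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profile(samples, n=3):
    """Count how often each n-gram occurs in the sample texts of one language."""
    counts = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample, n))
    return counts


def log_score(text, profile, vocab_size, n=3):
    """Sum of add-one smoothed log-probabilities of the text's n-grams."""
    total = sum(profile.values())
    return sum(
        math.log((profile[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text, n)
    )


def identify(text, profiles, n=3):
    """Return the language whose profile gives the text the highest score."""
    # Shared vocabulary: every n-gram seen in any language profile.
    vocab_size = len(set().union(*profiles.values()))
    return max(
        profiles,
        key=lambda lang: log_score(text, profiles[lang], vocab_size, n),
    )


# Hypothetical, tiny training data; a real system needs far larger corpora.
profiles = {
    "en": train_profile([
        "the quick brown fox jumps over the lazy dog",
        "this is a short sentence written in the english language",
        "we are going to the park because the weather is nice",
    ]),
    "es": train_profile([
        "el rápido zorro marrón salta sobre el perro perezoso",
        "esta es una frase corta escrita en el idioma español",
        "vamos al parque porque el tiempo es muy bueno",
    ]),
}

english_guess = identify("the weather is nice and the dog is in the park", profiles)
spanish_guess = identify("el perro salta en el parque porque es bueno", profiles)
```

Each n-gram in the input adds one term to the score, so a longer text accumulates more evidence for the right language. A very short input contributes only a handful of n-grams, which is one way to see why purely statistical approaches like this become less reliable on short texts.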
The new lang-4.0 API is based on a deep neural network capable of detecting more than 180 different languages. It offers high precision for both long and short texts without sacrificing performance.