National Corpus of the Crimean Tatar Language

According to UNESCO’s classification, the Crimean Tatar language is critically endangered, with a high likelihood of extinction within the next generation. Current research indicates that only 20–25% of Crimean Tatars living in Ukraine are fluent in their native language. To change this situation fundamentally, systemic solutions such as a linguistic corpus are being introduced.

The corpus is a comprehensive research tool that treats language as a database. It forms the foundation for integrating Crimean Tatar into operating systems, online translators, and spell-checking programs. It is a practical instrument for linguists, students, and developers working on systems and projects that use the Crimean Tatar language.

The corpus enables users to analyze a vast amount of language material with just a few clicks. For instance, the service can process a text array of 20,000 pages within seconds, whereas it would take one person 50 working days just to read this volume of material. Data analysis based on the Corpus will therefore be significantly more accurate and representative than similar studies of book (library) collections.

A team of about 30 participants from various parts of Ukraine and the world worked on the project for a year. During that time, more than 900 sources, including fiction, scientific literature, and periodicals, were analyzed. Compiling the Corpus text base was complicated by the inaccessibility of Crimean library collections, the existence of four writing systems, and a shortage of experts, as most Crimean Tatar philologists live in the occupied territory.

The presentation of the Corpus of the Crimean Tatar language featured a panel discussion on current strategies for promoting the Crimean Tatar language in Ukraine.

“Switzerland, being a multilingual country, understands the importance of supporting diversity, indigenous peoples, and national minorities through projects like these. Language is more than a mere communication tool; it’s a unifying force that preserves national identity,” remarked Felix Baumann, Swiss Ambassador to Ukraine and Moldova.

“Digital transformation is relevant everywhere: from education to countering corruption, from public participation to receiving services. The exercise of indigenous peoples’ rights is no exception. Therefore, the EGAP Program of East Europe Foundation supported this technical solution, an innovation rooted in tradition and a prerequisite for promoting the Crimean Tatar language on the Internet,” said Viktor Liakh, President of East Europe Foundation.

“The Crimean Tatar language should be presented on various digital platforms. This would enable both native speakers and language learners to translate any word using online translators and use the language on social networks and other applications. The Corpus creation is an important step to digitalize and promote the Crimean Tatar language,” Tamila Tasheva, Permanent Representative of the President of Ukraine in the Autonomous Republic of Crimea, noted.

The panel discussion also included contributions from Volodymyr Tarchynskiy, Director of the Department for Temporarily Occupied Territories and Information Sovereignty of the Ministry of Reintegration of the Temporarily Occupied Territories of Ukraine; Alim Aliyev, Deputy Director General of the Ukrainian Institute; Mustafa Ametov, head of the public association “Institute for the Development of the Crimean Tatar Language”; Suleiman Mamutov, a member of the UN Permanent Forum on Indigenous Issues; and Abibulla Seit-Celil, an assistant at the Turkology Department of Taras Shevchenko National University of Kyiv. The discussion was moderated by Sevgil Musayeva, editor-in-chief of Ukrainska Pravda.

The event also had a charitable purpose: raising funds for the family of the fallen hero Seyran Kadyrov.

The National Corpus of the Crimean Tatar Language project was implemented by the QIRI’M Young NGO with the support of the EGAP Program (implemented by East Europe Foundation and funded by Switzerland), the Representative Office of the President of Ukraine in the Autonomous Republic of Crimea, the Ministry of Reintegration of the Temporarily Occupied Territories of Ukraine, and Taras Shevchenko National University of Kyiv.