Tilde, a Latvian language technology company, is working to close the gap in digital representation among European languages. Most AI models are trained predominantly on English and a handful of other major languages, leaving many European languages underserved in the digital domain. Because language plays a significant role in cultural identity and communication, Tilde aims to create equitable language technologies for smaller and regional languages. The effort matters because more than 250 million Europeans speak these languages yet find AI applications limited by language barriers.
AI development has centered on models such as ChatGPT and Claude, which focus heavily on English and other major languages. National language models, which are crucial for linguistic diversity and inclusivity, have received far less attention. Against this backdrop, Tilde’s initiative aims to foster digital equity for languages such as Latvian and Basque, which have minimal digital representation. Backed by the European Union, it is also a step toward addressing the data sovereignty and privacy concerns that arise because most AI models are hosted outside the EU.
How is Tilde Advancing Multilingual AI?
Tilde is developing TildeLM, an open-source large language model focused on Baltic and Eastern European languages. With a team of researchers, linguists, and translators, Tilde hopes to fill the gap that mainstream AI models have left for smaller languages. EU research centers and universities have previously collaborated with Tilde’s team to strengthen technology research in the Baltic region.
What are the Main Features of TildeLM?
TildeLM is designed to support languages and dialects that large AI models have traditionally underserved. With over 30 billion parameters, it aims to bring smaller languages into digital systems through tailored language services such as text correction and translation. Tilde does not intend to commercialize the model itself. Instead, it plans to integrate the model into digital and business products and to offer localization services, strengthening Europe’s sovereign control over its language technologies.
TildeLM will also be available as smaller, locally deployable models, which supports data protection compliance and reduces reliance on large cloud infrastructure, putting it within reach of organizations with limited resources. Compression techniques make the model efficient enough to run on standard hardware, broadening its applicability while preserving digital sovereignty. Tilde’s efforts aim not only to advance language technology but also to empower organizations to build custom AI solutions that honor Europe’s diverse linguistic heritage.
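To give a rough sense of why compression matters for local deployment, the short Python sketch below estimates how much memory the weights of a 30-billion-parameter model would occupy at different numeric precisions. The article does not say which compression technique Tilde uses, so the choice of quantization here is an assumption, and the function name and bit widths are illustrative only. The estimate covers the weights alone and ignores activations and other runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter count quoted in the article; the bit widths are assumed for illustration.
PARAMS = 30e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label} weights: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions, storing weights at 4 bits instead of 16 cuts the footprint to a quarter, which is the kind of reduction that can move a model from data-center hardware toward a single workstation.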
Through its recent selection in the EU-backed Large AI Grand Challenge, Tilde has gained substantial support and resources for development, including 2 million GPU hours on the EuroHPC LUMI supercomputer. This is an important milestone for the project and expands its capacity to serve linguistic needs across Europe.
“Our business is not to commercialize the model itself, but to integrate it into products and offer localization services. Some clients want models deployed within their own infrastructure — or at least within their national borders — because of data regulations,” said Toms Bergmanis.
The statement underscores the importance of localization and sovereignty in AI and reflects a tailored approach to language technology challenges.
TildeLM’s approach parallels other European initiatives. SiloGen, part of Silo AI, which is now owned by AMD (NASDAQ:AMD), is forging a consortium to promote LLMs across all EU languages, reflecting a broader pan-European push for linguistic diversity on digital platforms. In addition, ETH Zurich’s multilingual LLM, trained on 15 trillion tokens and slated for release in 2025, further underlines Europe’s commitment to inclusive AI.
These collaborations and developments show regional recognition that AI model development needs a multifaceted approach. By fostering linguistic diversity, the initiatives help remove barriers to technology access for languages that would otherwise remain overshadowed by the major languages that dominate large models.
Tilde’s progress highlights the importance of linguistic equity as digital demands grow. As the European AI landscape evolves, models like TildeLM are crucial for ensuring that technology reflects and supports Europe’s cultural and linguistic diversity. Such initiatives make for a more inclusive AI environment in which smaller languages are not left behind.