New AI Model Translates 200 Languages, Making Technology Accessible to More People
Language is our lifeline to the world. But because high-quality translation tools don’t exist for hundreds of languages, billions of people today can’t access digital content or participate fully in conversations and communities online in their preferred or native languages. This is particularly an issue for hundreds of millions of people who speak the many languages of Africa and Asia.
To help people connect better today and be part of the metaverse of tomorrow, our AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world’s languages. Today, we’re announcing an important breakthrough in NLLB: We’ve built a single AI model called NLLB-200, which translates 200 different languages with results far more accurate than what previous technology could accomplish.
Compared with previous AI research, NLLB-200 scored an average of 44% higher on translation quality. For some African and Indian languages, NLLB-200’s translations were more than 70% more accurate.
To best evaluate and improve NLLB-200, we built FLORES-200, a dataset that enables researchers to assess this AI model’s performance in 40,000 different language directions. FLORES-200 allows us to measure NLLB-200’s performance in each language to confirm that the translations are high quality.
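As a rough check on the 40,000 figure, here is a minimal sketch of the arithmetic. It assumes that a "language direction" means an ordered source-to-target pair, and that exactly 200 languages are covered. Both are assumptions made for illustration, not details taken from the dataset itself, and the function names and three-letter language codes are likewise illustrative.

```python
from itertools import permutations


def count_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs where source != target."""
    # Each language can be the source for every other language as target.
    return num_languages * (num_languages - 1)


def enumerate_directions(languages):
    """List every (source, target) pair explicitly; handy for small checks."""
    return list(permutations(languages, 2))


# Assumption: exactly 200 languages, every ordered pair counted once.
print(count_directions(200))  # 39800

# Small cross-check with three illustrative language codes.
print(enumerate_directions(["eng", "swa", "orm"]))
```

Under these assumptions, 200 languages yield 39,800 directions, which is consistent with the rounded figure of 40,000.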
And to help other researchers improve their translation tools and build on our work, we’re opening NLLB-200 models and the FLORES-200 dataset to developers, in addition to our model training code and code for re-creating the training dataset.
We’re also awarding up to $200,000 in grants to researchers and nonprofit organizations for impactful uses of NLLB-200 in initiatives focused on sustainability, food security, gender-based violence, education or other areas in support of the UN Sustainable Development Goals. Nonprofits interested in using NLLB-200 to translate two or more African languages, as well as researchers working in linguistics, machine translation and language technology, are invited to apply.
These research advancements will support more than 25 billion translations served every day in Feed on Facebook, Instagram and our other technologies. You can explore a demo of NLLB-200 and take a deeper dive into how we developed this model.
Expanded Translation and Greater Inclusion
A handful of languages, including English, Mandarin, Spanish and Arabic, dominate the web. Native speakers of these very widely spoken languages may take for granted how meaningful it is to read something in their own mother tongue. NLLB will help more people read things in their preferred language, rather than always requiring an intermediary language that often gets the sentiment or content wrong.
This work can also help advance other technologies, like building assistants that work well in languages such as Javanese and Uzbek, or creating systems to take Bollywood movies and add accurate subtitles in Swahili or Oromo.
As the metaverse begins to take shape, the ability to build technologies that work well in a wider range of languages will help to democratize access to immersive experiences in virtual worlds.
Learn more about our work to build NLLB-200, which will help make the metaverse accessible to more people around the world.