NLLB-200: Completely Open-Source

Post by **Rina7RS** » Sat Feb 08, 2025 8:25 am

From the outset, NLLB's components have been released as open source. Besides the MT model itself, Meta has made several other resources freely available:

- improvements to its sentence encoder, LASER (Language-Agnostic SEntence Representations);
- the FLORES (Facebook Low-Resource) benchmark, used to evaluate translation quality;
- the professionally translated datasets used to train the model.

This matters because it gives researchers and developers full access to the project's resources.

The FLORES-101 dataset, the precursor of the current FLORES-200, was released as open source in June 2021 to provide a benchmark for evaluating MT of low-resource languages. It was quickly adopted, including at the 2021 Conference on Machine Translation (WMT21). FLORES-200 extends its language coverage from about a hundred languages to two hundred, and is intended to fill the same role going forward.

By open-sourcing the No Language Left Behind project, Meta treats the development of MT and AI technology as a collective responsibility. Researchers can build on its results instead of duplicating work, which allows them to contribute to the technology more meaningfully.

Toward An Ethical Approach to MT Development
According to the NLLB research paper, "NLLB could motivate more low-resource language writers or content creators to share localized knowledge or various aspects of their culture with both cultural insiders and outsiders through social media platforms or websites like Wikipedia."