G42 launch of JAIS 70B champions Arabic NLP

The latest JAIS large language model (LLM), JAIS 70B, was released today by Inception, a G42 company that develops advanced AI models and applications and provides them as a service. The 70-billion-parameter model is built for developers of Arabic natural-language processing (NLP) solutions and promises to accelerate the integration of generative AI services across industries, with applications in areas such as customer service, content creation, and data analysis.

JAIS 70B delivers Arabic-English bilingual capabilities at an unprecedented size and scale for the open-source community. With 70 billion parameters, it is better equipped to handle complicated and nuanced tasks and to process complex datasets. JAIS 70B was developed using continuous training, a process of fine-tuning a pre-trained model, on 370 billion tokens, of which 330 billion were Arabic tokens, the largest Arabic dataset ever used to train an open-source foundational model.

This suite of JAIS models accommodates a wide range of use cases and aims to accelerate innovation, development, and research across downstream applications for the Arabic-speaking and bilingual community.

Dr. Andrew Jackson, CEO of Inception, said: “AI is now a proven value-adding force, and large language models have been at the forefront of the AI adoption spike. JAIS was created to preserve Arabic heritage, culture, and language, and to democratize access to AI. Releasing JAIS 70B and this new family of models reinforces our commitment to delivering the highest quality AI foundation model for Arabic-speaking nations. The training and adaptation techniques we are delivering successfully for Arabic models are extensible to other under-served languages and we are excited to be bringing this expertise to other countries.”

JAIS 70B retains, and in specific cases exceeds, the high-quality English-language processing capabilities of Llama2, while far outperforming that base model on Arabic output. To process Arabic text more efficiently, the JAIS development team trained an expanded tokenizer based on the Llama2 tokenizer, doubling the model’s base vocabulary. According to Neha Sengupta, Principal Applied Scientist at Inception, the model “splits Arabic words less aggressively and makes training and inferencing cheaper” than the standard Llama2 model.
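How aggressively a tokenizer splits words is commonly summarized as its fertility, the average number of tokens produced per word: the lower the fertility, the fewer tokens a model has to process for the same text. The sketch below illustrates the measure only. The two toy tokenizers and the sample sentence are hypothetical stand-ins chosen for illustration, not the Llama2 or JAIS tokenizers, and the numbers they produce say nothing about the real models.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


# Hypothetical toy tokenizers, for illustration only.
def char_tokenizer(word: str) -> List[str]:
    """Splits every word into single characters (very aggressive)."""
    return list(word)


def chunk_tokenizer(word: str, size: int = 4) -> List[str]:
    """Splits every word into chunks of up to `size` characters."""
    return [word[i:i + size] for i in range(0, len(word), size)]


# Illustrative sample sentence (an assumption, not taken from the article).
sample = "اللغة العربية جميلة"

aggressive = fertility(char_tokenizer, sample)
gentler = fertility(chunk_tokenizer, sample)
print(f"character-level fertility: {aggressive:.2f}")
print(f"chunk-level fertility:     {gentler:.2f}")
```

A tokenizer with a larger vocabulary can represent more whole words or long word pieces as single tokens, which is the usual reason an expanded vocabulary lowers fertility for a language.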

Neha Sengupta, Principal Applied Scientist at Inception, said: “For models up to 30 billion parameters, we successfully trained JAIS from scratch, consistently outperforming adapted models in the community. However, for models with 70 billion parameters and above, the computational complexity and environmental impact of training from scratch were significant. We made a choice to build JAIS 70B on the Llama2 model, allowing us to leverage the extensive knowledge base of an existing English model and develop a more efficient and sustainable solution.”

Users can download the JAIS models and access the technical paper and benchmarking results by visiting the dedicated page on Hugging Face: https://huggingface.co/inceptionai/jais-adapted-70b.
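For developers who want to try the release, the following is a minimal sketch of fetching only the tokenizer from the Hugging Face page above and counting the tokens in a piece of text. It assumes the third-party `transformers` library is installed and that the repository loads through the standard `AutoTokenizer` interface; the helper name `count_tokens` is hypothetical. Consult the model page for the officially recommended loading code and licence terms.

```python
# Repository id taken from the Hugging Face URL in the article.
REPO_ID = "inceptionai/jais-adapted-70b"


def count_tokens(text: str, repo_id: str = REPO_ID) -> int:
    """Download the tokenizer for `repo_id` and count tokens in `text`.

    Assumption: the repository is loadable with the standard AutoTokenizer
    class. Only the tokenizer files are fetched, not the model weights.
    """
    # Imported lazily so this file can be read without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return len(tokenizer.encode(text, add_special_tokens=False))


# Example call (requires network access to Hugging Face):
# print(count_tokens("اللغة العربية جميلة"))
```

Loading the full 70 billion parameter weights is a much heavier operation than loading the tokenizer and requires substantial GPU memory, so starting with the tokenizer is a cheap way to inspect how the model segments Arabic text.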

 
