THE NEW GENERATION OF OPEN LANGUAGE MODELS
In a bold and promising move for the Artificial Intelligence community, the Technology Innovation Institute in Abu Dhabi has launched Falcon, a new family of next-generation language models that promises to open new frontiers in conversational AI and natural language processing.
WHAT IS FALCON?
Falcon is a language model suite consisting of two base models: Falcon-40B and Falcon-7B. Falcon-40B, with 40 billion parameters, currently tops the Open LLM Leaderboard charts, while the smaller but still powerful Falcon-7B is the best model in its weight class.
The launch of Falcon is particularly significant because it is the first model of this size and capability that is “truly open.” This means that Falcon can be used for a variety of commercial and research applications, from chatbots to text generators, without the licensing restrictions that often accompany other large-scale language models.
TECHNICAL INNOVATIONS
Falcon is notable not only for its size and openness, but also for several technical innovations. One of these is multi-query attention, a modification of multi-head attention in which a single key and value projection is shared across all heads. This change has little impact on pretraining, but it greatly improves inference scalability, lowering memory costs and enabling new optimizations such as the ability to maintain state.
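To make the idea concrete, here is a minimal PyTorch sketch of multi-query attention. It illustrates the mechanism only and is not Falcon’s actual implementation; the module and dimension names are chosen for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """Toy multi-query attention: many query heads, one shared key/value head."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)        # one query projection per head
        self.k_proj = nn.Linear(d_model, self.head_dim)  # single key head, shared by all query heads
        self.v_proj = nn.Linear(d_model, self.head_dim)  # single value head, shared by all query heads
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, d = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)  # (b, h, t, hd)
        k = self.k_proj(x).unsqueeze(1)  # (b, 1, t, hd) -- broadcast across heads
        v = self.v_proj(x).unsqueeze(1)  # (b, 1, t, hd)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

x = torch.randn(2, 16, 512)
mqa = MultiQueryAttention(d_model=512, n_heads=8)
print(mqa(x).shape)  # torch.Size([2, 16, 512])
```

Because only one key/value head exists, the key-value cache kept during autoregressive generation is a fraction of the size of standard multi-head attention, which is where the inference-time memory savings come from.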
Falcon models are also distinguished by the quality and quantity of their training data. Falcon-7B and Falcon-40B have been trained on 1.5 trillion and 1 trillion tokens respectively, with the majority of their training data coming from RefinedWeb, a massive CommonCrawl-based web dataset. This strategy represents a shift away from collecting dispersed curated sources toward scaling and improving the quality of web data.
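For readers who want to inspect the data themselves, the sketch below streams RefinedWeb from the Hugging Face Hub. It assumes the public release is available as `tiiuae/falcon-refinedweb` with a `content` text column; adjust the names if they differ.

```python
from datasets import load_dataset

# Stream the dataset so nothing has to be downloaded up front.
refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

# Peek at a few documents to get a feel for the cleaned web text.
for i, sample in enumerate(refinedweb):
    print(sample["content"][:200])
    if i == 2:
        break
```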
FALCON AND EFFICIENT MACHINE LEARNING
In addition to the base versions of Falcon-7B and Falcon-40B, the Technology Innovation Institute has also made available instruct versions of the models, Falcon-7B-Instruct and Falcon-40B-Instruct. These experimental variants have been fine-tuned on instructions and conversational data, making them well suited to assistant-style tasks.
For practitioners interested in fine-tuning these models for specific tasks, Hugging Face has recently released its PEFT (Parameter-Efficient Fine-Tuning) library, which lets you efficiently adapt these pre-trained language models to various applications without fine-tuning all of the model’s parameters. This significantly reduces computational and storage costs, making it practical to tailor these models to specific tasks even on modest hardware.
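As a rough illustration, the sketch below attaches a LoRA adapter to Falcon-7B with the PEFT library. The hyperparameters and the `query_key_value` target module name are assumptions based on Falcon’s fused attention projection, not an official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA inserts small trainable matrices into selected layers; the base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # assumed name of Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights are trainable
```

From here the wrapped model can be trained with a standard Trainer loop; only the adapter weights need to be stored per task, which is where the storage savings come from.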
THE DAWN OF THE FALCON LANGUAGE MODEL: A NEW PIONEER IN ARTIFICIAL INTELLIGENCE
The world of Artificial Intelligence has witnessed an exciting milestone: the release of the new Falcon language model. Developed by the Technology Innovation Institute in Abu Dhabi, Falcon has burst onto the AI scene with an unprecedented level of capability and accessibility.
Falcon, which is offered under the Apache 2.0 license, has been hailed as the first “truly open” model, rivaling many existing closed source models. This milestone is exciting news for AI professionals, enthusiasts, and the industry as it opens the doors to a host of exciting new applications.
The Falcon family consists of two base models: the Falcon-40B and its “little brother”, the Falcon-7B. Falcon-40B currently tops the Open LLM Leaderboard charts, while the Falcon-7B model is the best in its weight class. Despite its impressive capacity, Falcon-40B requires around 90GB of GPU memory, much less than its competitor LLaMA-65B, which it outperforms. Falcon-7B, on the other hand, needs only around 15GB, making it accessible even on consumer hardware.
The Technology Innovation Institute has also made available instruct versions of the models, Falcon-7B-Instruct and Falcon-40B-Instruct. These experimental variants have been fine-tuned on instructions and conversational data, making them especially well suited to popular assistant-style tasks. These models are the ideal choice for anyone who wants to quickly experiment with Falcon’s capabilities.
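A quick way to try one of the instruct models is with the transformers text-generation pipeline, sketched below. Exact loading flags may vary with your transformers version; early Falcon checkpoints required `trust_remote_code=True`, while recent releases support the architecture natively.

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
generator = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # may be unnecessary on recent transformers versions
    device_map="auto",
)

output = generator(
    "Explain what multi-query attention is in two sentences.",
    max_new_tokens=100,
    do_sample=True,
    top_k=10,
)
print(output[0]["generated_text"])
```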
LET’S TAKE A LOOK AT THE TECHNICAL SIDE
The Falcon-7B and Falcon-40B models have been trained on 1.5 trillion and 1 trillion tokens respectively, consistent with modern models that optimize for inference. The quality of the Falcon models is due to their training data, mostly based (>80%) on RefinedWeb, a massive new CommonCrawl-based web dataset. Unlike other models that collect dispersed curated sources, the Technology Innovation Institute has focused on scaling and improving the quality of web data, using large-scale deduplication and filtering.
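The RefinedWeb pipeline itself is not reproduced here, but the toy sketch below illustrates the simplest form of the idea, exact deduplication of normalized documents. The real pipeline operates at a vastly larger scale and also relies on fuzzy matching and quality filtering, none of which is shown.

```python
import hashlib

def normalize(text: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def deduplicate(docs):
    """Keep only the first occurrence of each (normalized) document."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["Hello   World", "hello world", "Something else entirely"]
print(deduplicate(docs))  # ['Hello   World', 'Something else entirely']
```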
Performance:
- Falcon-40B tops the Open LLM Leaderboard, suggesting it outperforms other openly available models. However, without benchmark results that directly compare it to closed models such as GPT-3, it is hard to say definitively which is superior across all tasks.
- Falcon-40B and Falcon-7B have been trained on large amounts of data (1.5 trillion and 1 trillion tokens respectively), which is likely to contribute to their high performance. The training data for these models is predominantly based (>80%) on RefinedWeb, a large web dataset based on CommonCrawl. Other data sources, such as conversational data from Reddit, have also been included, albeit in smaller amounts.
Memory and Compute Requirements:
- Falcon-40B requires about 90GB of GPU memory, which is less than LLaMA-65B, another large language model that Falcon outperforms. Falcon-7B needs only about 15GB, making it more accessible for inference and fine-tuning on consumer hardware (see the quantized-loading sketch below).
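As a sketch of how to fit these models on smaller GPUs, the example below loads Falcon-7B with 8-bit quantization via bitsandbytes. The exact memory savings depend on your hardware and library versions, and newer transformers releases prefer a `BitsAndBytesConfig` over the `load_in_8bit` shortcut used here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 8-bit quantization trades a little accuracy for a much smaller memory
# footprint, which helps on consumer GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```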
Licensing:
- Both GPT-3 and Falcon-40B can be used for commercial purposes. However, the licensing terms may differ, so it’s important to review these carefully. Falcon-40B is released under the Apache 2.0 license.
Fine-Tuning and Adaptation:
- GPT-3 and Falcon-40B can both be fine-tuned for specific tasks. However, recent developments in Parameter-Efficient Fine-Tuning (PEFT) methods, which adapt pre-trained language models to various downstream applications without fine-tuning all of the model’s parameters, can make this process far more efficient. At the time of writing, PEFT support for Falcon-40B is not explicitly documented.
CONCLUSION
In the ceaseless quest for innovation in the realm of Large Language Models (LLMs), the Falcon models – Falcon-40B and Falcon-7B – have made a groundbreaking entry. Developed by the Technology Innovation Institute in Abu Dhabi, these models have set new benchmarks, topping the charts of the Open LLM Leaderboard with remarkable scores.
What truly sets the Falcon family apart is that it is the first “truly open” model of its kind, rivaling the capabilities of many current closed-source models. This monumental leap forward opens the door to a plethora of exciting use cases, making advanced language modeling more accessible to practitioners, enthusiasts, and the industry at large.
Notably, the Falcon models have been trained on an impressive volume of data, predominantly (>80%) based on RefinedWeb, a massive web dataset founded on CommonCrawl. This shift towards scaling and improving the quality of web data, rather than gathering scattered curated sources, sets a new standard in the field of LLMs.
Moreover, these models employ multi-query attention, an innovative feature that significantly improves the scalability of inference and reduces memory costs, a critical factor in enabling novel optimizations such as statefulness.
Falcon’s ability to offer remarkable capabilities while requiring less GPU memory compared to its competitors reflects a clear commitment towards efficiency and accessibility. The Falcon-40B and Falcon-7B models necessitate ~90GB and ~15GB of GPU memory respectively, which is notably less than other state-of-the-art models such as LLaMA-65B.
Moreover, TII has introduced ‘instruct’ versions of the Falcon models – Falcon-7B-Instruct and Falcon-40B-Instruct, which have been finetuned on instructions and conversational data. This makes them especially suited for popular assistant-style tasks and offers a ready-to-use solution for those looking to quickly experiment with these models.
In conclusion, the advent of the Falcon models marks a significant milestone in the evolution of Large Language Models, reshaping the landscape with their open-source approach, impressive capabilities, and novel training methods. The Falcon models stand as a testament to the rapid advancements in AI and Machine Learning, promising a future where the power of language models is even more accessible and impactful.
RESOURCES AND BIBLIOGRAPHY
- Hugging Face’s Falcon: An open-source Large Language Model.
- Hugging Face’s Parameter-Efficient Fine-Tuning (PEFT).
- The GitHub repository for 🤗 PEFT.
- The OpenAI GPT-3 Model Card.
- The Hugging Face Falcon-40B Model Card.
About the author: Gino Volpi is the CEO and co-founder of BELLA Twin, a leading innovator in the insurance technology sector. With over 29 years of experience in software engineering and a strong background in artificial intelligence, Gino is not only a visionary in his field but also an active angel investor. He has successfully launched and exited multiple startups, notably enhancing AI applications in insurance. Gino holds an MBA from Universidad Técnica Federico Santa Maria and actively shares his insurtech expertise on IG @insurtechmaker. His leadership and contributions are pivotal in driving forward the adoption of AI technologies in the insurance industry.