What are foundation models for AI?


A foundation model is a type of machine learning (ML) model that is pretrained to perform a range of tasks. 

Until recently, artificial intelligence (AI) systems were specialized tools, meaning that an ML model would be trained for a specific application or single use case. The term foundation model (also known as a base model) entered our lexicon when experts began noticing two trends within the field of machine learning:

  1. A small number of deep learning architectures were being used to achieve results for a wide variety of tasks.
  2. New capabilities could emerge from an AI model that were not originally intended in its training. 

Foundation models have been programmed to function with a general contextual understanding of patterns, structures, and representations. This foundational comprehension of how to communicate and identify patterns creates a baseline of knowledge that can be further modified, or fine-tuned, to perform domain-specific tasks for just about any industry.

 

Two defining characteristics that enable foundation models to function are transfer learning and scale. Transfer learning refers to the ability of a model to apply information about one situation to another and build upon its internal “knowledge.”
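The idea behind transfer learning can be sketched in a few lines: a pretrained component is kept frozen, and only a small task-specific head is trained on new data. The feature extractor, texts, and labels below are invented for illustration; this is a toy perceptron, not a production fine-tuning workflow.

```python
# Toy transfer-learning sketch: a frozen "pretrained" feature extractor
# plus a small trainable head. All functions and data here are invented
# for illustration.

def pretrained_features(text):
    """Stand-in for a frozen pretrained model: maps text to two simple
    features (word count and average word length)."""
    words = text.split()
    return [len(words), sum(len(w) for w in words) / len(words)]

def train_head(examples, labels, epochs=50, lr=0.1):
    """Train only a linear head (a perceptron); the extractor stays fixed."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            f = pretrained_features(x)
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
            err = y - pred  # perceptron update only when wrong
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

def predict(text, w, b):
    f = pretrained_features(text)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Label short texts 0 and long texts 1, then fine-tune the head only.
texts = ["hi", "ok then", "this is a much longer sentence overall",
         "another fairly long example sentence here"]
labels = [0, 0, 1, 1]
w, b = train_head(texts, labels)
```

Real transfer learning works the same way at a much larger scale: the pretrained layers encode general knowledge, and only a comparatively small set of parameters is updated for the new task.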

Scale refers to hardware, specifically graphics processing units (GPUs), which allow the model to perform many computations simultaneously, also known as parallel processing. GPUs are critical for training and deploying deep learning models, including foundation models, because they can quickly process data and perform complex statistical calculations.

Deep learning and foundation models
Many foundation models, especially those used in natural language processing (NLP), computer vision, and audio processing, are pretrained using deep learning techniques. Deep learning underpins many (but not all) foundation models and has been a driving force behind many advancements in the field. Deep learning, also known as deep neural learning or deep neural networking, teaches computers to learn through observation, imitating the way humans gain knowledge.

Transformers and foundation models
While not all foundation models use transformers, the transformer architecture has proven to be a popular way to build foundation models that involve text, such as ChatGPT, BERT, and DALL-E 2. Transformers enhance the capability of ML models by allowing them to capture contextual relationships and dependencies between elements in a sequence of data. Transformers are a type of artificial neural network (ANN) widely used in NLP models; however, they are typically not used in ML models built solely for computer vision or speech processing.
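The mechanism at the heart of a transformer can be illustrated with scaled dot-product attention, which computes, for each position in a sequence, a weighted mix of every other position. This is a minimal pure-Python sketch with invented 2-dimensional vectors, not a real transformer layer (which adds learned projections, multiple heads, and more).

```python
import math

def softmax(xs):
    """Numerically stable softmax: turns scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted mix of all
    values, with weights derived from query-key similarity."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # how much this position attends to each other one
        outputs.append([sum(wt * v[i] for wt, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Three token vectors (invented for illustration); self-attention uses the
# same sequence as queries, keys, and values.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(seq, seq, seq)
```

Because every position attends to every other position, the model can relate words that are far apart in a sentence, which is what makes transformers good at capturing context.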

After a foundation model has been trained, it can draw on the knowledge gained from these huge pools of data to help solve problems, a skill that can provide valuable insights and contributions to organizations in many ways. Some of the general tasks a foundation model can perform include:

Natural language processing (NLP)
Recognizing context, grammar, and linguistic structures, a foundation model trained in NLP can generate and extract information from the data it is trained with. Further fine-tuning an NLP model by training it to associate text with sentiment (positive, negative, or neutral) could prove useful for companies looking to analyze written messages such as customer feedback, online reviews, or social media posts. NLP is a broader field that encompasses the development and application of large language models (LLMs).
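As a deliberately crude stand-in for such a fine-tuned sentiment model, the sketch below scores text against tiny hand-written word lists. A real fine-tuned model learns these associations from labeled examples rather than relying on fixed lists; the words here are invented for illustration.

```python
# Crude stand-in for a fine-tuned sentiment classifier: count hits
# against tiny hand-written word lists. A real model learns these
# associations from labeled training data instead.
POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "terrible", "hate", "confusing"}

def sentiment(review):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A company could run a classifier like this over incoming reviews to triage negative feedback automatically; a genuinely fine-tuned model would also handle negation, sarcasm, and context that simple word counting misses.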

Computer vision
When the model can recognize basic shapes and features, it can begin to identify patterns. Further fine-tuning a computer vision model can lead to automated content moderation, facial recognition, and image classification. Models can also generate new images based on learned patterns. 
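The pattern-matching step can be illustrated with a convolution filter, the basic operation behind most computer vision models. The sketch below applies a fixed vertical-edge filter to a tiny made-up image; in a trained model, the filter weights are learned from data rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide a kernel over a 2D image (no padding) and return the response
    map; large values mark where the image matches the kernel's pattern."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A 4x4 image with a vertical edge between columns 1 and 2, and a fixed
# vertical-edge filter (a trained model learns its filters from data).
image = [[0, 0, 1, 1]] * 4
edge_filter = [[-1, 1],
               [-1, 1]]
response = convolve2d(image, edge_filter)  # peaks where the edge sits
```

Stacking many such learned filters, with nonlinearities between them, is how vision models progress from detecting edges to recognizing shapes and eventually whole objects.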

Audio/speech processing
When a model can recognize phonetic elements, it can derive meaning from our voices which can lead to more efficient and inclusive communication. Virtual assistants, multilingual support, voice commands, and features like transcription promote accessibility and productivity. 

With additional fine-tuning, organizations can design further specialized machine learning systems to address industry-specific needs, such as fraud detection for financial institutions, gene sequencing for healthcare, and chatbots for customer service.


Foundation models provide accessibility and a level of sophistication within the realm of AI that many organizations do not have the resources to attain on their own. By adopting and building upon foundation models, companies can overcome common hurdles such as:

Limited access to quality data: Foundation models are built on data sets larger than most organizations could ever collect on their own.

Model performance/accuracy: Foundation models provide a baseline level of accuracy that might take an organization months or even years of effort to achieve on its own.

Time to value: Training a machine learning model from scratch takes a long time and requires many resources. Foundation models provide a baseline of pretraining that organizations can then fine-tune to achieve a bespoke result.

Limited talent: Foundation models provide a way for organizations to make use of AI/ML without having to invest heavily in data science resources. 

Expense management: Using a foundation model reduces the need for the expensive hardware required for initial training. While there is still a cost associated with serving and fine-tuning the finalized model, it is only a fraction of what it would cost to train the foundation model itself.

 

While there are many exciting applications for foundation models, there are also a number of potential challenges to be mindful of.

Cost
Foundation models require significant resources to develop, train, and deploy. The initial training phase requires vast amounts of generic data, can consume tens of thousands of GPUs, and often requires a team of machine learning engineers and data scientists.

Interpretability
“Black box” refers to when an AI program performs a task within its neural network without showing its work. This creates a scenario where no one, including the data scientists and engineers who created the algorithm, can explain exactly how the model arrived at a specific output. The lack of interpretability in black box models can have harmful consequences when they are used for high-stakes decision making, especially in industries like healthcare, criminal justice, or finance. This black box effect can occur with any neural network-based model, not just foundation models.

Privacy and security 
Foundation models require access to a lot of information, and sometimes that includes customer information or proprietary business data. This is something to be especially cautious about if the model is deployed or accessed by third-party providers.

Accuracy and bias 
If a deep learning model is trained on data that is statistically biased, or doesn’t provide an accurate representation of the population, the output can be flawed. Unfortunately, existing human bias is often transferred to artificial intelligence, creating risk for discriminatory algorithms and biased outputs. As organizations continue to leverage AI for improved productivity and performance, it’s critical that strategies are put in place to minimize bias. This begins with inclusive design processes and more thoughtful consideration of representative diversity within the collected data.

When it comes to foundation models, our focus is to provide the underlying workload infrastructure, including the environment to enable training, prompt tuning, fine-tuning, and serving of these models.

A leader among hybrid and multicloud container development platforms, Red Hat® OpenShift® enables collaboration between data scientists and software developers. It accelerates the rollout of intelligent applications across hybrid cloud environments, from the datacenter to the network edge to multiple clouds.

The proven foundation of Red Hat OpenShift AI enables customers to more reliably scale to train foundation models using OpenShift’s native GPU acceleration features on-premises or via a cloud service. Organizations can access the resources to rapidly develop, train, test, and deploy containerized machine learning models without having to design and deploy Kubernetes infrastructure. 

Red Hat Ansible® Lightspeed with IBM watsonx Code Assistant is a generative AI service that helps developers create Ansible content more efficiently. It reads plain English entered by a user and interacts with IBM watsonx foundation models to generate code recommendations for automation tasks, which are then used to create Ansible Playbooks. Deploy Ansible Lightspeed on Red Hat OpenShift to make hard tasks in Kubernetes easier through intelligent automation and orchestration.
