Key terms in AI: The glossary for Artificial Intelligence

The world of AI is still relatively young, and the technical vocabulary of the field can seem a bit daunting. However, with a few key AI terms, you can understand and utilize the basic vocabulary. In our AI glossary, we provide an overview of the most important terms, tools, and methods.

In this glossary, we do not proceed alphabetically but start with the fundamental terms and build on them step by step.

Artificial Intelligence (AI)

Artificial intelligence refers to systems or machines that exhibit human-like abilities such as learning, understanding, reasoning, and problem-solving.

AI now has a wide range of applications. For example, it can automate tasks that humans used to perform. Typically, AI has learned to behave in a human-like manner based on human-made training data. This behavior is based on mathematical probabilities. Therefore, AI does not have a human-like understanding of itself or its tasks.

Learn more about the basics of artificial intelligence

Augmented Intelligence

Augmented Intelligence — also known as enhanced intelligence — is a subfield of artificial intelligence.

Unlike fully autonomous AI systems, augmented intelligence keeps humans as the decisive factor in the process: while an augmented intelligence platform can suggest content and procedures, humans ultimately select and decide.

Thus, augmented intelligence is not a replacement for human activities but rather a support that leaves decision-making power with humans.

An example is an AI-based knowledge database from which employees can query company knowledge in natural language. Alexa and Siri are also prominent examples of augmented intelligence.

Artificial General Intelligence (AGI)

Artificial general intelligence is a level of artificial intelligence that can understand, learn, and apply knowledge across a broad spectrum of tasks, similar to a human.

Unlike narrow AI, which is designed for specific tasks, AGI can adapt to new problems and situations without being pre-programmed for them. This type of AI can reason, solve problems, and think abstractly in various areas and would essentially have the ability to perform any intellectual task that a human can.

Machine Learning (ML)

Machine learning is the subfield of computer science that deals with the automatic generation of models. The goal is to develop machine learners that can learn from examples to perform tasks. There are many different types of machine learners and models, such as deep learning.

Machine learning is a crucial cornerstone for the development of artificial intelligence, as it has solved and continues to solve problems for which no human can formulate an algorithm.

Machine learning can generalize and process previously unknown inputs. Machine learning can be improved with more training data.
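
As a rough sketch of this idea, the following toy "learner" fits a straight line to example pairs and then generalizes to an input it has never seen. The function names and data are purely illustrative; real machine learning libraries do far more.

```python
# A minimal "machine learner" sketch: least-squares fit of a line
# y = a*x + b from example pairs, then prediction on unseen input.

def fit_line(points):
    """Learn slope a and intercept b from (x, y) example pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Training data generated by the rule y = 2x + 1 (unknown to the learner)
examples = [(0, 1), (1, 3), (2, 5), (3, 7)]
a, b = fit_line(examples)

# Generalization: predict for an input that was never in the training data
print(round(a * 10 + b))  # -> 21
```

With more (and more varied) training pairs, the fitted parameters become more reliable, which mirrors the point above that more training data can improve a machine learner.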

Common approaches in machine learning include supervised learning, unsupervised learning, and reinforcement learning.

Algorithm

An algorithm is a precise, step-by-step set of instructions for solving a problem or performing a task.

In programming, an algorithm is implemented using a programming language to process data, make decisions, and execute automated processes. Typically, algorithms are devised by humans.
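
A classic example of a human-devised algorithm is Euclid's method for the greatest common divisor: every step and the stopping condition are fully specified in advance.

```python
# Euclid's algorithm: a precise, step-by-step set of instructions.

def gcd(a, b):
    """Greatest common divisor via repeated remainders."""
    while b != 0:
        a, b = b, a % b   # replace (a, b) with (b, a mod b)
    return a

print(gcd(48, 18))  # -> 6
```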

Deep Learning

Deep learning is a subfield of machine learning that involves neural networks with many layers. These networks learn from large amounts of data, recognize complex patterns, and make decisions.

Deep learning enables advanced AI applications such as image and speech recognition.
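
To make "networks with many layers" concrete, here is a toy forward pass through a two-layer network. The weights are fixed by hand purely for illustration; in deep learning they would be learned from large amounts of data.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, act):
    """One fully connected layer: weighted sums plus bias, then activation."""
    return [act(sum(w * i for w, i in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [1.0, 2.0]
hidden = dense(x, [[0.5, -0.3], [0.8, 0.1]], [0.0, 0.1], relu)   # layer 1
out = dense(hidden, [[1.0, -1.0]], [0.0], sigmoid)               # layer 2
print(round(out[0], 2))  # -> 0.25
```

Stacking many such layers, with weights tuned by training, is what lets deep networks recognize complex patterns in images or speech.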

AI Model

A model is always a simplified representation of reality.

In the context of machine learning and AI, a model is a computer-generated program.

A machine learner (or simply learner) is a human-created algorithm that automatically generates, from example inputs and expected outputs, a program that behaves according to these examples.

A model learns patterns and relationships within the data and can make predictions and decisions or recognize trends when new data is available.

The important difference from an algorithm is that a model is not directly created by humans but is indirectly defined by the structure of the machine learner and the available training data.
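
This distinction can be sketched in a few lines: the learner below is a human-written algorithm, while the model it returns is defined only indirectly, by the training examples. The spam-filter framing is a made-up toy example.

```python
from collections import Counter

def train_majority_model(examples):
    """Learner: a human-written algorithm that builds a model from data."""
    by_input = {}
    for x, label in examples:
        by_input.setdefault(x, []).append(label)

    # Lookup table derived from the examples, not written by hand
    table = {x: Counter(labels).most_common(1)[0][0]
             for x, labels in by_input.items()}

    def model(x):
        """Model: its behavior comes from the training data."""
        return table.get(x, "unknown")

    return model

spam_filter = train_majority_model([
    ("win money now", "spam"), ("meeting at 10", "ham"),
    ("win money now", "spam"), ("win money now", "ham"),
])
print(spam_filter("win money now"))  # -> spam
```

Changing the training data changes the model's behavior without touching a single line of the learner, which is exactly the difference described above.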

In our overview of AI models, we compare the most well-known AI models and providers.

Finetuning

Finetuning refers to the adjustment of a pre-trained AI model to a specific task or dataset. We can think of it as targeted further training, where we teach the model a particular topic in depth. This allows the model to perform even better on that task without having to be trained from scratch. Finetuning is particularly common with large language models (LLMs).

Example: Suppose the AI model is a person, specifically a marketing employee. The employee undergoes comprehensive data protection training. As a result, they can independently and continuously optimize resources for data protection in the future without needing a data protection officer for every detail. They remain a marketing employee but with intensive training in data protection. Following and learning new developments and practices in data protection is now easier for them.
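
Numerically, the idea can be sketched with a toy linear model: start from "pre-trained" weights and take a few small gradient steps on new task-specific examples instead of training from scratch. All names, numbers, and hyperparameters here are illustrative.

```python
# Finetuning sketch on a toy linear model y = w*x + b.

def finetune(w, b, examples, lr=0.05, epochs=500):
    for _ in range(epochs):
        for x, y in examples:
            err = (w * x + b) - y        # prediction error on this example
            w -= lr * err * x            # gradient step for the slope
            b -= lr * err                # gradient step for the intercept
    return w, b

# "Pre-trained" model: learned y = 2x + 1 on a large generic dataset
w_pre, b_pre = 2.0, 1.0

# Small task-specific dataset following y = 2x + 3 (shifted intercept)
task_data = [(0, 3), (1, 5), (2, 7)]

w_ft, b_ft = finetune(w_pre, b_pre, task_data)
print(round(w_ft, 1), round(b_ft, 1))  # -> 2.0 3.0
```

The finetuned model keeps most of what it already knew (the slope) and adapts only what the new data requires (the intercept), which is the point of the marketing-employee analogy above.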

Agents

AI agents are AI-based IT systems that can analyze data, make decisions, and perform tasks based on the information available to them.

For example, an agent can autonomously search for and order a birthday gift for someone’s sister based on the instruction “Order a birthday gift for my sister.”

Robotics

Robotics deals with the development and deployment of robots to perform tasks. It combines engineering and computer science, enabling robots to assist humans, increase efficiency, or explore hard-to-reach places.

The control of robots can be done using trained models and/or predefined algorithms.

Generative Pre-trained Transformer (GPT)

GPT (Generative Pre-trained Transformer) is a family of advanced AI models designed to understand and generate human-like text. It learns from a large amount of text data and can respond to prompts, answer questions, and create content that mimics human writing style. GPT can be used for various tasks, including translation, content creation, and conversation.

The individual terms mean: generative (the model produces new content), pre-trained (it is first trained on large amounts of general text data), and transformer (the underlying neural network architecture).

Enterprise GPT vs. Enterprise Search

Enterprise search focuses on quickly retrieving relevant and secure internal data from multiple company sources using AI-enhanced indexing and natural language processing. Enterprise GPT provides advanced conversational AI capabilities with enterprise-grade security, enabling employees to interact with AI for data analysis, content generation, and productivity tasks in a chat-based format. While Enterprise Search specializes in structured and unstructured data retrieval across platforms, Enterprise GPT emphasizes interactive AI assistance with strong customization and security for business workflows.

Generative AI (GenAI)

Generative AI (GenAI) is an advanced form of artificial intelligence that not only analyzes existing data but is also capable of creating new content. It is trained to learn patterns and relationships from large datasets.

GenAI can then autonomously generate things like texts, images, music, or even videos.

Often, it translates from one modality (e.g., a text with a description in the prompt) to another (e.g., the generated image itself).

GenAI uses models trained through machine learning to make predictions and deliver creative results. Examples include ChatGPT, which writes human-like texts, or DALL·E, which creates images from text descriptions.

Large Language Model (LLM)

LLM stands for “Large Language Model.” An LLM is an AI model that can process and generate human language. It can be used for tasks such as text creation, summarization, and translation.

GPT is built on the transformer architecture, which underlies most modern LLMs.

This architecture is used in LLMs from various manufacturers. Since the name GPT was coined by OpenAI, LLMs from OpenAI often have GPT in their names.

Hallucinations of LLMs

LLMs continue texts based on probabilities.

These probabilities are learned from the training data. The most likely next word is not necessarily factually correct, even if the formulation sounds convincing. These false or inaccurate pieces of information are called “hallucinations.”

Depending on the AI application, false information can be more or less problematic. Ensuring the correctness of AI-generated content is a significant challenge.
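
A toy sketch of "most likely next word" makes the mechanism visible. The model below simply counts which word follows each two-word context in its training data; the tiny corpus is made up, and it deliberately contains a factual error that the model then reproduces fluently.

```python
from collections import Counter, defaultdict

training = ("the capital of france is paris . "
            "the capital of france is paris . "
            "the capital of australia is sydney .").split()

# Count which word follows each pair of words in the training data
ctx = defaultdict(Counter)
for i in range(len(training) - 2):
    ctx[(training[i], training[i + 1])][training[i + 2]] += 1

def next_word(w1, w2):
    """Return the statistically most likely continuation."""
    return ctx[(w1, w2)].most_common(1)[0][0]

print(next_word("france", "is"))     # -> paris
print(next_word("australia", "is"))  # -> sydney (fluent, but factually wrong)
```

The second answer sounds just as confident as the first: the model only reflects the probabilities in its training data, not the truth.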

Retrieval Augmented Generation (RAG)

RAG (Retrieval Augmented Generation) is a technique in which relevant information is first retrieved from an external knowledge source and then supplied to an LLM as context, yielding more relevant and accurate answers.

By grounding answers in retrieved source material, RAG also helps suppress AI hallucinations.
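
A minimal sketch of the retrieval step: pick the most relevant snippet from a small knowledge base (here scored by naive word overlap, a placeholder for semantic search) and prepend it to the prompt. The knowledge-base texts and the prompt wording are invented for illustration.

```python
knowledge_base = [
    "Our support hotline is available Monday to Friday, 9am to 5pm.",
    "The premium plan includes 50 GB of storage.",
    "Invoices are sent by e-mail at the start of each month.",
]

def retrieve(question, documents):
    """Pick the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents,
               key=lambda d: len(q_words & set(d.lower().split())))

question = "How much storage does the premium plan include?"
context = retrieve(question, knowledge_base)

# The LLM is asked to answer from the supplied context, not from memory
prompt = f"Answer using only this context:\n{context}\nQuestion: {question}"
print(context)  # -> The premium plan includes 50 GB of storage.
```

Real RAG systems replace the word-overlap scoring with semantic embeddings and a vector database, and pass the assembled prompt to an actual LLM.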

Chunking (Chunk)

LLMs (and humans) have limited attention spans. Chunking therefore refers to the process of dividing data (text, speech, etc.) into smaller, manageable units, called chunks, that can be handled in later processing.

In natural language processing, chunking is often used to improve processing efficiency and better extract specific linguistic or semantic information from the data.
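
A simple chunker might split a text into fixed-size word windows with a small overlap, so that context at chunk boundaries is not lost. The sizes here are illustrative; real pipelines often chunk by tokens, sentences, or document sections instead.

```python
def chunk_text(text, chunk_size=5, overlap=2):
    """Split text into word chunks of chunk_size with overlapping edges."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):   # last chunk reached the end
            break
    return chunks

doc = "one two three four five six seven eight nine ten"
for c in chunk_text(doc):
    print(c)
# -> one two three four five
#    four five six seven eight
#    seven eight nine ten
```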

AI Methods

AI methods are various types of tasks that AI models can perform. They range from classification and regression, used to categorize data and make predictions, to anomaly detection and object recognition. They also include media generation and autonomous agents, driving innovations in content creation and decision-making.

Examples of AI methods include classification, regression, anomaly detection, object recognition, media generation, and autonomous agents.

Data Modalities and Data Sources in the Context of AI and ML

Modality is the technical term for the different types of data an AI system can process. They can be roughly compared to the different senses of animals. Commonly distinguished modalities include text, images, audio, and video.

When these modalities are combined, it is referred to as multimodality.

Data sources that can contain data in several of these modalities include documents, websites, databases, e-mails, and media files.

Prompt / Prompting

A “prompt” or “prompting” is the input given to an LLM or GPT model to generate a text continuation.

A prompt can be a question, statement, or instruction that tells the model what to do.

This guides the AI to use its training to produce relevant results. A prompt can be, for example: “Hey chatbot, give me a list of all odd numbers from 0 to 100.”

A prompt can be divided into priming (providing context information), prompting (the task or question itself), and tuning (refining by asking follow-up questions and optimizing the initial output).

Natural Language Processing (NLP)

NLP, or natural language processing, is a field of AI that enables computers to understand, interpret, and generate human language.

It combines computational linguistics — the rule-based modeling of human language — with statistical, machine learning, and deep learning models. This allows machines to process and analyze large amounts of natural language data, facilitating tasks such as translation, sentiment analysis, and speech recognition.

Text Mining

Text mining involves deriving meaningful insights from unstructured text data using computer-based algorithms and statistical methods. Historically, it emerged in the 1990s and early 2000s to process large amounts of text. Raw data was converted into structured information that could be analyzed and used, such as tracking brand mentions.

Text mining overlaps significantly with concepts like natural language processing (NLP) and machine learning. These techniques help understand and interpret human language, enabling applications such as sentiment analysis, topic modeling, and automatic summarization.

Though somewhat dated, the method remains relevant in some modern applications.

Common Crawl

Common Crawl is a non-profit initiative that crawls the web to generate extensive archives of web pages, metadata, and links and offers them for free.

Common Crawl is used by researchers, data scientists, entrepreneurs, web developers, and non-profit organizations for web analysis, machine learning, market research, and monitoring digital rights.

The free, extensive datasets support innovation and research in various fields by providing insights into internet trends, language development, and societal changes.

(Semantic) Embedding

An embedding is a representation of data where elements such as words, images, entire sentences or paragraphs, or other information units are mapped so that their similarity can be mathematically calculated. This technique captures complex properties so that they can be processed by machine learning models.

In semantic embedding, words or phrases are mapped to reflect the semantic relationships between entities, such as similarity in meaning or context, facilitating tasks like understanding synonyms, context, and sentiment in text data.

An example of semantic embedding is when the words “doctor” and “physician” are close together in a numerical space because they have similar meanings.

Using semantic embeddings, free text is translated into machine-readable vectors, making it available for automated processing.
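
The "doctor"/"physician" example can be sketched with hand-made three-dimensional vectors standing in for learned embeddings. Real models use hundreds of learned dimensions; the vectors and values here are purely illustrative.

```python
import math

embeddings = {
    "doctor":    [0.90, 0.80, 0.10],
    "physician": [0.85, 0.82, 0.12],
    "banana":    [0.10, 0.05, 0.95],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

sim_same = cosine_similarity(embeddings["doctor"], embeddings["physician"])
sim_diff = cosine_similarity(embeddings["doctor"], embeddings["banana"])
print(sim_same > sim_diff)  # -> True: "doctor" is closer to "physician"
```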

Vector Database

A vector database is a database that can store vectors as a data type and perform searches for “other vectors near a search vector” very quickly.

When searching for embeddings, one looks for embeddings “near” what one is searching for. Vector databases are therefore particularly suitable for searches using semantic embeddings.
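
A vector store can be sketched as a list of vectors plus a nearest-neighbor search, here a simple linear scan by Euclidean distance. The item names and vectors are invented; production vector databases use approximate indexes (such as HNSW) to stay fast at scale.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class VectorStore:
    def __init__(self):
        self.items = []                      # list of (id, vector)

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def nearest(self, query, k=1):
        """Return the ids of the k vectors closest to the query vector."""
        ranked = sorted(self.items, key=lambda it: distance(it[1], query))
        return [item_id for item_id, _ in ranked[:k]]

store = VectorStore()
store.add("cat", [1.0, 0.95])
store.add("dog", [0.9, 1.0])
store.add("car", [5.0, 0.2])

print(store.nearest([1.0, 1.0], k=2))  # -> ['cat', 'dog']
```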

AI Watermarking

AI watermarking embeds a mark in AI-generated content such as text or images to prove its origin and protect ownership. This hidden watermark helps prevent misuse, as it is difficult to remove without damaging the content.


EU AI Act

The European Union’s Artificial Intelligence Act is a groundbreaking regulation that aims to ensure the ethical development and use of AI in member states, focusing on safety, transparency, and the protection of individual rights. It categorizes AI systems by risk levels and imposes strict requirements on high-risk applications to promote innovation within a framework of ethical standards.

Webinar recording [German]

If you want to learn more, check out the recording of our webinar with AI experts from Aleph Alpha and IT law experts from DORDA.

Data Act

The Data Act is a law within the European data strategy and complements the Data Governance Act. The Data Act gives individuals and companies the right to access the data generated by the use of smart objects, machines, and devices. It has been in force since January 11, 2024.