AWS Certified AI Practitioner Exam Topics

Plain-English explanations of every major AI/ML concept and AWS service tested in the AWS Certified AI Practitioner (AIF-C01) exam.

Domain 1: Fundamentals of AI and ML

This domain tests your understanding of what AI and ML are, how they work, and when they should be used. You need to know the different types of learning, common algorithms, and the overall lifecycle of an ML project.

Artificial Intelligence (AI) vs. Machine Learning (ML) vs. Deep Learning

Artificial Intelligence is the broadest category — it refers to any technology that enables machines to perform tasks that would normally require human intelligence, such as reasoning, perception, or decision-making.

Machine Learning is a subset of AI. Instead of being explicitly programmed with rules, ML models learn patterns from data and improve their performance over time.

Deep Learning is a subset of ML that uses artificial neural networks with many layers (hence "deep"). Deep learning powers most modern AI breakthroughs, including image recognition, natural language processing, and generative AI.

High Frequency

Types of Machine Learning

Supervised Learning: The model is trained on labelled data — inputs paired with correct outputs. The model learns to map inputs to outputs. Examples include email spam classification and house price prediction.

Unsupervised Learning: The model is given unlabelled data and must discover patterns or groupings on its own. Common use cases include customer segmentation and anomaly detection.

Reinforcement Learning: The model learns through trial and error by receiving rewards or penalties based on its actions in an environment. Used in robotics, game-playing AI, and autonomous systems.

Semi-supervised Learning: A combination of labelled and unlabelled data — useful when labelling all data is expensive or time-consuming.
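The supervised case above can be made concrete with a toy example. This sketch "learns" a spam-score threshold from labelled examples; the data and threshold rule are invented for illustration, and a real system would use a library such as scikit-learn rather than this hand-rolled fit.

```python
# Toy supervised learning: fit a spam-score threshold from labelled data.
# Hypothetical numbers; this is the simplest possible "learn from examples".

def fit_threshold(examples):
    """Pick the threshold that best separates spam (1) from not-spam (0)."""
    best_t, best_correct = 0.0, -1
    for t in sorted(score for score, _ in examples):
        correct = sum((score >= t) == bool(label) for score, label in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Labelled training data: (spam_score, label) pairs.
train = [(0.1, 0), (0.2, 0), (0.4, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
threshold = fit_threshold(train)

def predict(score):
    return 1 if score >= threshold else 0
```

The key supervised-learning idea is all here: the mapping from input to output is chosen to fit labelled examples, not written by hand.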

The ML Lifecycle

A machine learning project typically follows these stages: problem framing (defining what you want to predict or decide), data collection and preparation (gathering, cleaning, and labelling data), feature engineering (selecting and transforming input variables), model training (fitting the model to data), model evaluation (measuring performance with metrics), deployment (making the model available for predictions), and monitoring (tracking performance in production).

Amazon SageMaker supports all of these stages.

Key ML Concepts: Bias, Variance, and Overfitting

Overfitting occurs when a model learns the training data too well — including its noise — and performs poorly on new, unseen data. Think of it as memorising rather than learning.

Underfitting occurs when a model is too simple to capture the patterns in the data, resulting in poor performance on both training and test data.

Bias refers to systematic errors in a model's predictions — it consistently misses in one direction.

Variance refers to how sensitive a model is to small fluctuations in the training data. High-variance models tend to overfit; high-bias models tend to underfit.
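The "memorising rather than learning" intuition can be shown directly. In this sketch (invented data where the true rule is "positive if x > 5"), a model that memorises every training example scores perfectly on training data but fails on unseen inputs:

```python
# Overfitting as memorisation: a "model" that stores every training example
# verbatim gets 100% training accuracy but cannot generalise.
train = {1: 0, 2: 0, 3: 0, 6: 1, 7: 1, 8: 1}   # true rule: label 1 if x > 5

def memorising_model(x):
    # Overfit: returns the stored label, guesses 0 for anything unseen.
    return train.get(x, 0)

def generalising_model(x):
    # A learned rule that captures the underlying pattern.
    return 1 if x > 5 else 0

train_acc = sum(memorising_model(x) == y for x, y in train.items()) / len(train)

test_points = {9: 1, 10: 1, 4: 0}   # unseen data following the same rule
test_acc = sum(memorising_model(x) == y
               for x, y in test_points.items()) / len(test_points)
# train_acc is 1.0 but test_acc is only 1/3: perfect memorisation,
# poor generalisation. The simple rule gets all the unseen points right.
```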

High Frequency

Model Evaluation Metrics

Accuracy: The percentage of correct predictions. Misleading when the dataset is imbalanced (e.g., 99% of emails are not spam).

Precision: Of all the items the model flagged as positive, what proportion were actually positive? High precision = fewer false alarms.

Recall (Sensitivity): Of all the actual positives, what proportion did the model correctly identify? High recall = fewer missed cases.

F1 Score: The harmonic mean of precision and recall. Useful when you need to balance both.

RMSE (Root Mean Squared Error): Used for regression tasks — measures how far predictions are from actual values.

AUC-ROC: Measures a model's ability to discriminate between classes across different classification thresholds.
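The classification metrics above follow directly from the four cells of a confusion matrix. This sketch uses hypothetical spam-filter counts and shows why accuracy alone misleads on imbalanced data:

```python
# Computing accuracy, precision, recall, and F1 from a confusion matrix.
# Hypothetical spam filter over 1000 emails, only 10 of which are spam.
tp, fp, fn, tn = 8, 5, 2, 985   # true/false positives and negatives

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)           # of flagged emails, how many were spam
recall    = tp / (tp + fn)           # of actual spam, how much was caught
f1        = 2 * precision * recall / (precision + recall)

# Accuracy is 0.993 and looks excellent, yet precision is only ~0.62:
# the imbalanced dataset hides the false alarms.
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```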

Domain 2: Fundamentals of Generative AI

This domain covers the concepts behind generative AI — models that can create new content including text, images, code, and audio. Understanding how these models work at a conceptual level is essential.

What Is Generative AI?

Generative AI refers to AI models that can produce new content — text, images, audio, video, code — rather than simply classifying or predicting from existing data. Unlike traditional ML models that output a label or number, generative models create something novel based on what they have learned from training data.

Common use cases include drafting emails and reports, generating code, answering customer questions, summarising documents, creating images, and building conversational chatbots.

High Frequency

Large Language Models (LLMs)

A Large Language Model is a type of generative AI model trained on vast amounts of text data. LLMs learn statistical patterns in language and can generate coherent, contextually relevant text in response to a prompt.

LLMs are the foundation of tools like chatbots, code assistants, and document summarisers. On AWS, LLMs are accessible through Amazon Bedrock, which allows you to use models from providers like Anthropic, Meta, Mistral, Cohere, and Amazon's own Titan models without managing any infrastructure.

High Frequency

Transformer Architecture

The Transformer is the neural network architecture that underpins virtually all modern LLMs. Introduced in 2017, it uses a mechanism called self-attention to weigh the relevance of different words in a sequence to one another — allowing the model to understand context across long passages of text.

Unlike earlier recurrent neural networks (RNNs), transformers can process all words in a sequence simultaneously, making them far more efficient to train at scale. The "T" in ChatGPT and GPT-4 stands for Transformer.
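The self-attention step can be sketched in a few lines. This is a minimal scaled dot-product attention over tiny hand-made 2-dimensional vectors, not a trained model: each position scores every other position, softmaxes the scores into weights, and takes a weighted sum of the values.

```python
import math

def softmax(xs):
    # Numerically stable softmax: exponentiate and normalise to sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]  # scaled dot product
        weights = softmax(scores)                          # attention weights
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Three token positions, each a 2-d vector (queries = keys = values here).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualised = self_attention(x, x, x)
```

Because every query attends to every position at once, all tokens can be processed in parallel, which is the efficiency advantage over RNNs noted above.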

Tokens and Embeddings

Tokens are the basic units that LLMs process. A token is roughly a word or word fragment: common words are often a single token, while rare words may be split into several. Understanding tokens matters because LLM pricing and context window limits are typically expressed in tokens.

Embeddings are numerical vector representations of text (or other data). They allow the model to capture semantic similarity — so "cat" and "feline" would have embeddings that are mathematically close to each other. Embeddings are fundamental to search, recommendation systems, and Retrieval Augmented Generation (RAG).
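Semantic closeness between embeddings is usually measured with cosine similarity. The 3-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings: "cat" and "feline" point in similar directions.
cat    = [0.9, 0.8, 0.1]
feline = [0.85, 0.75, 0.15]
car    = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, feline))  # close to 1: semantically similar
print(cosine_similarity(cat, car))     # much lower: unrelated concepts
```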

High Frequency

Foundation Models

A foundation model is a large AI model trained on broad data at scale that can be adapted to a wide variety of tasks. Rather than training a separate model for every use case, organisations can use a single foundation model as a starting point and customise it through fine-tuning or prompt engineering.

On AWS, Amazon Bedrock is the service that provides access to foundation models from multiple providers via a unified API. This removes the need to manage GPUs, training infrastructure, or model hosting.

Context Window and Hallucinations

The context window is the maximum amount of text (measured in tokens) that an LLM can consider at once — including both the input prompt and the output it generates. Larger context windows allow for longer conversations and documents but require more compute.

Hallucinations are a key limitation of LLMs — the model generates confident-sounding but factually incorrect information. This happens because LLMs predict the most statistically likely next token, not necessarily the factually correct one. Mitigation strategies include RAG (grounding the model in verified data) and human review workflows.

Domain 3: Applications of Foundation Models

This is the most heavily tested domain. It covers how foundation models are actually used in practice — the techniques for improving their outputs and the AWS services that enable these applications.

High Frequency

Prompt Engineering

Prompt engineering is the practice of crafting inputs to a language model to produce better outputs. It is one of the primary ways to improve model responses without retraining.

Zero-shot prompting: Asking the model to perform a task with no examples provided. Works well for simple, well-understood tasks.

Few-shot prompting: Providing a small number of examples in the prompt to guide the model's response format and style. Significantly improves performance on specialised tasks.

Chain-of-thought prompting: Asking the model to reason step-by-step before giving a final answer. Improves accuracy on complex reasoning and maths problems.

System prompts: Instructions given to the model at the start of a session that define its role, tone, or constraints — for example, "You are a helpful customer support agent for a banking app."
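The difference between zero-shot and few-shot prompting is easiest to see side by side. The wording below is illustrative; prompt formats vary by model and task.

```python
# Zero-shot: the task is stated with no examples.
zero_shot = (
    "Classify the sentiment of this review as Positive or Negative:\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot: the same task, preceded by worked examples that show the
# expected format and style of answer.
few_shot = (
    "Classify the sentiment of each review as Positive or Negative.\n\n"
    "Review: Absolutely loved it, works perfectly.\nSentiment: Positive\n\n"
    "Review: Broke within a week, total waste of money.\nSentiment: Negative\n\n"
    "Review: The battery died after two days.\nSentiment:"
)
```

Both prompts end at "Sentiment:" so the model's next tokens complete the classification; the few-shot version simply gives it a pattern to imitate.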

High Frequency

Retrieval Augmented Generation (RAG)

RAG is a technique that improves LLM accuracy by grounding the model in external, verified knowledge at inference time. Instead of relying solely on what the model learned during training, RAG retrieves relevant documents from a knowledge base and includes them in the prompt context.

The workflow is: (1) convert knowledge base documents into embeddings and store them in a vector database, (2) when a user asks a question, convert it to an embedding and search for similar documents, (3) include the retrieved documents in the LLM prompt, (4) the LLM generates an answer grounded in that retrieved context.

RAG reduces hallucinations, keeps responses up to date without retraining, and allows organisations to use private internal knowledge securely. On AWS, Amazon Bedrock Knowledge Bases automates the RAG pipeline.
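The four-step workflow above can be sketched end to end. This toy version substitutes word-overlap vectors for a real embedding model and vector database, but the shape of the pipeline is the same:

```python
# Minimal RAG sketch: index documents, retrieve the best match for a
# question, and ground the prompt in the retrieved context.

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words set.
    return set(text.lower().split())

def similarity(a, b):
    # Jaccard overlap as a stand-in for cosine similarity on embeddings.
    return len(a & b) / len(a | b)

# Step 1: "index" the knowledge base (hypothetical support documents).
documents = [
    "Refunds are processed within 5 business days.",
    "Our support line is open 9am to 5pm on weekdays.",
    "Orders over $50 qualify for free shipping.",
]
index = [(doc, embed(doc)) for doc in documents]

# Step 2: embed the question and retrieve the most similar document.
question = "How long do refunds take to process?"
q_vec = embed(question)
best_doc = max(index, key=lambda item: similarity(q_vec, item[1]))[0]

# Steps 3-4: include the retrieved context in the LLM prompt, so the
# generated answer is grounded in verified data.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
```

Amazon Bedrock Knowledge Bases manages the real versions of all four steps: chunking, embedding, vector storage, retrieval, and prompt assembly.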

Fine-Tuning vs. RAG

Fine-tuning involves further training a pre-trained foundation model on a custom dataset so that it learns new knowledge or adopts a specific style or behaviour. It modifies the model's weights permanently. Fine-tuning is best when you need the model to have deep knowledge of a specialised domain (e.g., medical terminology), behave in a consistent tone, or perform a very specific task.

RAG does not change the model at all — it changes what information is given to the model at inference time. RAG is better when you need up-to-date information, when the knowledge base changes frequently, or when you cannot afford the cost of fine-tuning.

On the exam, if a scenario involves current or frequently updated data, the answer is usually RAG. If the scenario involves learning a new style or specialised vocabulary, fine-tuning may be more appropriate.

High Frequency

Amazon Bedrock

Amazon Bedrock is AWS's fully managed service for accessing and customising foundation models through a single API. It supports models from Anthropic (Claude), Meta (Llama), Cohere, Mistral, Stability AI, and Amazon's own Titan models.

Key features include: model inference (generating responses), fine-tuning (customising models on your data), Bedrock Knowledge Bases (managed RAG), Bedrock Agents (AI agents that can take actions), and Bedrock Guardrails (applying content filters and safety controls).

All data processed through Bedrock stays within your AWS environment and is not used to train the underlying models, which is important for data privacy and compliance.
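As a sketch of what "a single API" looks like in practice, the snippet below builds an invocation payload for an Anthropic Claude model on Bedrock. The body follows the Anthropic Messages format; the model ID and exact fields should be checked against the Bedrock documentation, and the actual network call (shown in comments) requires AWS credentials.

```python
import json

def build_request(prompt, max_tokens=512):
    # Request body in the Anthropic Messages format used on Bedrock.
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(build_request("Summarise the attached report in 3 bullets."))

# With credentials configured, the invocation itself would look like:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.invoke_model(
#       modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
```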

AI Agents

An AI agent is an LLM-powered system that can take multi-step actions to accomplish a goal — not just generate a single response. Agents can call tools, APIs, or databases, make decisions about what to do next, and loop through reasoning steps until a task is complete.

Amazon Bedrock Agents allows you to build agents that can interact with your systems (e.g., query a database, submit a form, check an order status) in response to natural language instructions from users.
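The reasoning loop an agent runs can be sketched with a scripted stand-in for the model. Here the "LLM" is a stub that first decides to call a tool and then answers; in a real agent (for example, Bedrock Agents) those decisions come from a foundation model, and the tool would hit a live database or API.

```python
def check_order_status(order_id):
    # Hypothetical tool; a real one would query an order database or API.
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"check_order_status": check_order_status}

def stub_llm(user_request, tool_result=None):
    # Stand-in for model reasoning: first pick a tool, then respond.
    if tool_result is None:
        return {"action": "call_tool", "tool": "check_order_status",
                "args": {"order_id": "A-1001"}}
    return {"action": "respond",
            "text": f"Your order {tool_result['order_id']} "
                    f"has {tool_result['status']}."}

def run_agent(user_request):
    # The agent loop: reason, act, observe, repeat until ready to respond.
    result = None
    while True:
        step = stub_llm(user_request, result)
        if step["action"] == "respond":
            return step["text"]
        result = TOOLS[step["tool"]](**step["args"])

answer = run_agent("Where is my order A-1001?")
```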

Generative AI Evaluation Metrics

BLEU (Bilingual Evaluation Understudy): Measures how similar generated text is to reference text. Commonly used for translation quality.

ROUGE: Measures recall — how much of the reference content is captured in the generated output. Commonly used for summarisation tasks.

BERTScore: Uses contextual embeddings to measure semantic similarity between generated and reference text, going beyond word overlap.

Human Evaluation: For many use cases, especially open-ended generation, human judges remain the gold standard for evaluating quality, relevance, and safety.
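ROUGE is simple enough to compute by hand. The sketch below implements ROUGE-1 recall (unigram overlap) in a simplified form; real implementations handle stemming and repeated words more carefully, and the example sentences are invented.

```python
def rouge1_recall(reference, generated):
    # Fraction of reference words that also appear in the generated text.
    ref_words = reference.lower().split()
    gen_words = set(generated.lower().split())
    matched = sum(1 for w in ref_words if w in gen_words)
    return matched / len(ref_words)

reference = "the model reduces costs by thirty percent"
generated = "the model cut costs by thirty percent overall"
score = rouge1_recall(reference, generated)   # 6 of 7 reference words match
```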

Domain 4: Guidelines for Responsible AI

This domain covers the ethical, social, and legal considerations involved in building and deploying AI systems. It is increasingly important as AI becomes more prevalent in consequential decisions.

High Frequency

Bias and Fairness in AI

AI models can learn and amplify biases present in their training data. For example, if historical hiring data reflects gender bias, a model trained on that data may learn to discriminate against certain groups.

Types of bias include data bias (training data that is unrepresentative), label bias (human annotators applying inconsistent or biased labels), and model bias (the algorithm itself favouring certain patterns). Amazon SageMaker Clarify is the primary AWS tool for detecting and measuring bias in ML models and datasets.
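One of the simplest fairness checks is demographic parity: comparing the positive-outcome rate across groups. The numbers below are invented; SageMaker Clarify computes this and many related metrics automatically on real datasets.

```python
def positive_rate(outcomes):
    # Fraction of decisions in a group that were positive (e.g. approved).
    return sum(outcomes) / len(outcomes)

# Hypothetical loan-approval decisions (1 = approved) for two groups.
group_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # 80% approved
group_b = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]   # 40% approved

parity_gap = positive_rate(group_a) - positive_rate(group_b)
# A gap of 0.4 is a red flag that the model treats the groups very
# differently and warrants investigation of the training data.
```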

Explainability and Transparency

Explainability refers to the ability to understand and explain why an AI model made a particular decision. This is critical in regulated industries like healthcare, finance, and lending where decisions must be justifiable.

Some models (like decision trees) are inherently explainable. Others, like deep neural networks, are "black boxes" — their internal workings are difficult to interpret. Amazon SageMaker Clarify provides explainability features that identify which input features most influenced a prediction.

Human Oversight and Amazon A2I

Human oversight involves keeping humans in the loop for AI decisions, particularly for high-stakes or low-confidence predictions. Amazon Augmented AI (Amazon A2I) is an AWS service that makes it easy to route machine learning predictions to human reviewers when confidence is below a threshold or when compliance requires human review.

On the exam, if a scenario involves ensuring that edge cases are reviewed by humans before an AI decision is actioned, Amazon A2I is likely the correct service.

Responsible AI Principles

AWS's approach to responsible AI includes several key principles: fairness (ensuring equitable outcomes across groups), explainability (making AI decisions understandable), privacy and security (protecting data used in AI systems), safety (preventing harm from AI outputs), controllability (maintaining human oversight), and veracity and robustness (ensuring models perform reliably).

Amazon Bedrock Guardrails allows developers to apply content filters, block harmful topics, and enforce responsible AI policies at the application level.

Domain 5: Security, Compliance, and Governance for AI Solutions

This domain covers the AWS services and practices needed to secure AI systems, maintain compliance, and implement governance frameworks.

High Frequency

Data Security for AI

Securing data used in AI systems involves controlling who has access to training data and model outputs, encrypting data at rest and in transit, and ensuring sensitive data is not inadvertently exposed in model outputs or logs.

Key AWS tools include AWS IAM for access control, AWS KMS for encryption key management, Amazon VPC and AWS PrivateLink for network isolation, and Amazon S3 with appropriate bucket policies for secure data storage.
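As an illustration of one such control, the sketch below builds a hypothetical S3 bucket policy that denies all access over unencrypted (non-TLS) connections. The bucket name is invented; the `aws:SecureTransport` deny condition is a standard S3 policy pattern.

```python
import json

bucket = "example-training-data"   # hypothetical training-data bucket

# Deny any S3 action on the bucket unless the request uses TLS.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
policy_json = json.dumps(policy, indent=2)   # ready to attach to the bucket
```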

The AWS Shared Responsibility Model

In the context of AI, the shared responsibility model defines what AWS secures versus what the customer must secure. AWS is responsible for securing the underlying infrastructure — hardware, networking, and managed service operations. The customer is responsible for securing their data, configuring access controls, managing model outputs, and ensuring compliance with regulations applicable to their use case.

AI Governance Services

AWS Config: Monitors and records the configuration of AWS resources over time, enabling compliance auditing and change tracking.

AWS CloudTrail: Logs all API calls made in your AWS account, providing a full audit trail for AI service usage.

AWS Audit Manager: Automates evidence collection for compliance audits, mapping AWS activity to regulatory frameworks.

Amazon Inspector: Automated vulnerability scanning for EC2 instances and container images, relevant when AI workloads are deployed on compute resources.

AWS Artifact: Provides access to AWS compliance documentation and agreements.

SageMaker Model Cards: Documents a model's intended purpose, performance characteristics, and lineage — supporting transparency and governance requirements.

Key AWS AI Services at a Glance

These are the AWS services most commonly tested in the AIF-C01 exam. You should know what each one does and when to use it.

Amazon Bedrock

Fully managed access to foundation models from multiple providers. Use for GenAI applications, RAG, agents, and fine-tuning — without managing infrastructure.

Amazon SageMaker

End-to-end ML platform for building, training, and deploying custom ML models. Covers the entire ML lifecycle from data labelling to model monitoring.

Amazon Rekognition

Pre-built computer vision service. Use for image and video analysis — facial recognition, object detection, content moderation.

Amazon Comprehend

Natural language processing service. Extracts entities, sentiment, key phrases, and language from text without building a custom model.

Amazon Transcribe

Converts speech in audio and video files to text. Supports multiple languages, custom vocabulary, and speaker diarisation.

Amazon Polly

Converts text to lifelike speech. Used for voice interfaces, accessibility features, and automated narration.

Amazon Translate

Neural machine translation service. Translates text across dozens of languages in real time or in batch.

Amazon Lex

Build conversational chatbots and voice interfaces powered by the same technology as Alexa. Integrates with Lambda for business logic.

Amazon Kendra

Intelligent enterprise search powered by ML. Searches across documents, databases, and content repositories using natural language queries.

Amazon Personalize

Build real-time personalisation and recommendation systems — product recommendations, content recommendations, or re-ranking search results.

Amazon SageMaker Clarify

Detects bias in datasets and ML models. Provides explainability reports showing which features most influence model predictions.

Amazon Augmented AI (A2I)

Routes ML predictions to human reviewers when confidence is low or when regulations require human oversight. Integrates with SageMaker and Textract.

Exam tip: For questions about choosing the right AWS service, start by identifying the core task: is it text, speech, images, translation, or recommendation? Then match the task to the appropriate pre-built AI service. If the scenario requires a custom model, Amazon SageMaker is almost always involved.