Security, Compliance, and Governance for AWS AI Workloads: AIF-C01 Domain 5
April 07, 2026

Domain 5 of the AWS AI Practitioner exam (14%) asks: when you deploy AI on AWS, who is responsible for what, and how do you fulfill your side of that deal? The answer starts with the Shared Responsibility Model and branches into IAM, encryption, network isolation, AI-specific attack mitigation, compliance frameworks, data governance, and a scoping matrix for determining how much governance overhead you carry.
The Shared Responsibility Model
AWS secures the cloud infrastructure. You secure everything deployed on it. The split depends on the service type. Training on an Amazon EC2 instance means you own the OS, patching, and application security. Using Amazon SageMaker Serverless Inference means AWS manages the infrastructure and you only configure the service. The less managed the service, the more responsibility lands on you. This applies to both security and compliance.
IAM: Access Control for AI Workloads
Users and groups: one IAM user per person, policies attached to groups by job function (developers, QA, admins) rather than individuals. Roles: provide temporary credentials that expire after the session. A role assumed by a Lambda function or CI/CD pipeline expires automatically; prefer roles over long-lived credentials for all services. Policies: follow least privilege; an explicit deny always wins over any allow.
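The deny-wins evaluation rule can be sketched as a toy evaluator (a simplification of the real IAM engine; the policy statements below are made up):

```python
def evaluate(statements, action):
    """Toy IAM-style evaluation: an explicit Deny wins over any Allow;
    no matching statement at all means an implicit deny."""
    decision = "ImplicitDeny"
    for stmt in statements:
        if action in stmt["Action"]:
            if stmt["Effect"] == "Deny":
                return "ExplicitDeny"   # deny always wins, stop here
            decision = "Allow"
    return decision

policy = [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"]},
    {"Effect": "Deny",  "Action": ["s3:PutObject"]},
]

print(evaluate(policy, "s3:PutObject"))     # ExplicitDeny: deny overrides allow
print(evaluate(policy, "s3:GetObject"))     # Allow
print(evaluate(policy, "s3:DeleteObject"))  # ImplicitDeny: nothing matched
```

Real IAM evaluation also weighs permissions boundaries, SCPs, and session policies, but the deny-beats-allow ordering is the same.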
AWS IAM Identity Center centralizes access management across multiple AWS accounts using an external identity provider. Users authenticate once and get temporary credentials for any account they're permitted to access. Recommended over managing individual IAM users at scale.
Amazon SageMaker Role Manager generates IAM roles for three ML personas: Data Scientist, MLOps, and SageMaker Compute. Pick a persona, select ML activities, and it outputs the permissions policy.
Root user: enable MFA immediately, disable root access keys, use it only for tasks that explicitly require it (billing, account closure).
Encryption: At Rest and in Transit
Amazon S3, DynamoDB, and SageMaker (ML storage volumes, notebook instances, training jobs, endpoints) encrypt by default with service-owned keys. For key control, use AWS KMS: AWS-managed keys (AWS handles rotation) or customer-managed keys (you control the policy and rotation). A customer-managed KMS key adds a second barrier: an attacker needs both S3 access and KMS permission to read the data.
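As a sketch of putting a customer-managed key to work, a SageMaker CreateTrainingJob request can pin the key for both the output artifacts and the ML storage volume (the key ARN, bucket, and job name below are placeholders):

```python
# Sketch of a SageMaker CreateTrainingJob request pinning a
# customer-managed KMS key (ARN, bucket, and names are placeholders).
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"

training_job = {
    "TrainingJobName": "churn-model-v1",
    "OutputDataConfig": {
        "S3OutputPath": "s3://example-bucket/models/",
        "KmsKeyId": KMS_KEY_ARN,        # encrypts model artifacts written to S3
    },
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
        "VolumeKmsKeyId": KMS_KEY_ARN,  # encrypts the attached ML storage volume
    },
}
# boto3.client("sagemaker").create_training_job(**training_job, ...) would also
# need AlgorithmSpecification, RoleArn, and input channels to actually run.
```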
All AWS service endpoints support TLS. SageMaker inter-node training traffic is not encrypted by default; enabling it increases training time on deep learning workloads.
Amazon Macie scans S3 buckets for PII using ML and pattern matching. PII should be removed at training-data ingestion; Macie alerts when PII slips through and can trigger automated remediation.
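Macie's managed data identifiers are far more thorough, but a toy regex pass shows the flavor of an ingestion-time check (both patterns are simplified illustrations, not Macie's identifiers):

```python
import re

# Simplified stand-ins for two common PII patterns; Macie's managed
# data identifiers cover far more types with far fewer false negatives.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_record(text):
    """Return the PII types detected in one training record."""
    return sorted(name for name, pat in PII_PATTERNS.items() if pat.search(text))

print(scan_record("contact: jane.doe@example.com, ssn 123-45-6789"))
# ['EMAIL', 'US_SSN'] -> this record should be scrubbed before training
```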
Network Isolation
By default, SageMaker Studio and notebook instances run in a SageMaker-managed VPC with internet access enabled. That internet access is a risk: malicious code could exfiltrate training data over the public internet. Launch SageMaker workloads in a customer-managed Amazon VPC where you control traffic through security groups, network ACLs, and firewalls.
In VPC-only mode, all direct internet access is blocked. To still reach AWS services (S3, CloudWatch, SageMaker API), use VPC interface endpoints via AWS PrivateLink, which routes traffic through the AWS private network entirely.
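A minimal sketch of the locked-down training configuration, assuming placeholder subnet and security group IDs:

```python
# Sketch: running a training job inside a customer-managed VPC with
# network isolation enabled (subnet and SG IDs are placeholders).
vpc_locked_down = {
    "VpcConfig": {
        "Subnets": ["subnet-0example1", "subnet-0example2"],  # private subnets
        "SecurityGroupIds": ["sg-0example"],
    },
    # Blocks all outbound network calls from the training container itself;
    # S3 data channels still work because SageMaker handles them on the
    # platform side.
    "EnableNetworkIsolation": True,
}
# These fields merge into the CreateTrainingJob request alongside the
# usual AlgorithmSpecification, RoleArn, and data channels.
```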
AI-Specific Attack Vectors
Traditional software security does not cover everything. AI systems have their own attack surface.

Data poisoning: attacker mislabels training data; the model learns to make predictable mistakes. Mitigation: scan training data quality, maintain a validation set, retrain frequently on clean data.
Adversarial inputs: subtly manipulated inference inputs cause misclassification, no training pipeline access needed. Mitigation: validate inputs before inference; train on adversarial examples.
Model inversion: repeated queries reverse-engineer training data. If outputs include confidence scores, attackers can reconstruct specific individuals from a facial recognition model. Mitigation: limit output detail; avoid unnecessary confidence scores.
Model extraction: enough input/output pairs let an attacker train a shadow model. Mitigation: restrict model access; rate-limit inference endpoints.
Prompt injection: in LLMs, malicious instructions override system behavior or extract filtered information. Mitigation: train the model to detect injection patterns; validate all user inputs.
Cross-cutting mitigations: least privilege, encryption at rest and in transit, S3 Block Public Access, production monitoring.
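Rate limiting, the model-extraction mitigation above, can be as simple as a per-caller token bucket in front of the endpoint; this is an illustrative sketch, not an AWS feature:

```python
import time

class TokenBucket:
    """Per-caller token bucket: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller is over their inference budget

bucket = TokenBucket(rate=1, capacity=3)      # 3-call burst, then 1 call/sec
results = [bucket.allow() for _ in range(5)]  # 5 back-to-back calls
print(results)   # first three pass, the rest are throttled
```

An attacker assembling input/output pairs for a shadow model needs volume; capping per-caller throughput raises the cost of extraction without affecting legitimate traffic.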
Monitoring and Artifact Governance
Amazon SageMaker Model Monitor detects data drift and model quality degradation in production. Enable data capture on an endpoint, create a baseline from training data, then schedule monitoring jobs that compare live traffic against it. Amazon CloudWatch is the alerting layer on top. AWS CloudTrail logs all API calls including SageMaker actions (but not endpoint invocations) and is your audit trail under the Shared Responsibility Model.
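A toy version of the baseline-versus-live comparison a scheduled monitoring job performs (the features, baseline stats, and threshold below are made up):

```python
from statistics import mean

# Baseline statistics captured from the training data (illustrative numbers).
baseline = {"age": {"mean": 41.0}, "balance": {"mean": 1200.0}}

def drift_violations(live_batch, threshold=0.25):
    """Flag features whose live mean drifts more than `threshold` (relative)
    from the baseline, roughly what a monitoring job reports as violations."""
    violations = []
    for feature, stats in baseline.items():
        live_mean = mean(row[feature] for row in live_batch)
        if abs(live_mean - stats["mean"]) / stats["mean"] > threshold:
            violations.append(feature)
    return violations

live = [{"age": 40, "balance": 300}, {"age": 43, "balance": 350}]
print(drift_violations(live))   # balance has drifted; age has not
```

In the real service the baseline job emits far richer statistics and constraints, and violations surface as CloudWatch metrics you can alarm on.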
For regulatory reproducibility, every artifact must be tracked: source code in CodeCommit/GitHub, datasets in S3 (partitioned by version), container images in Amazon ECR, training job metadata via SageMaker's unique job IDs, and model versions in SageMaker Model Registry with Pending/Approved/Rejected approval status.
SageMaker Model Cards create immutable records (intended uses, risk rating, evaluation results) exportable to PDF for auditors. SageMaker ML Lineage Tracking auto-builds a graph connecting all workflow entities, queryable by dataset or artifact. SageMaker Feature Store provides centralized feature storage with point-in-time queries, critical for preventing data leakage in time-series models. SageMaker Model Dashboard aggregates monitoring, model cards, and lineage into one console.
Compliance Frameworks
AWS Artifact publishes AWS's third-party audit reports (SOC 2, ISO 27001, etc.) for download. Auditors can accept the AWS infrastructure controls as verified and focus only on your side.
Two AI-specific ISO standards from 2023: ISO 42001 (AI management system standard) and ISO 23894 (AI risk management guidance). Both are recommended, not legally required.
The EU AI Act categorizes AI into unacceptable risk (banned: social scoring, emotion inference in workplaces), high risk (strict requirements: CV-scanning tools), and minimal/no risk (largely unregulated). Most production systems fall under high risk. Even if you don't serve EU citizens, track it: EU regulations tend to become global standards.
The NIST AI RMF is voluntary with four functions: Govern, Map, Measure, Manage. Risk = likelihood × severity. Residual risk (after controls) drives the overall system rating.
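The arithmetic above as a sketch, on a made-up 1-to-5 scale (the scale and the control-reduction model are illustrative, not NIST's):

```python
def risk_score(likelihood, severity):
    """Inherent risk on an illustrative 1-5 x 1-5 scale."""
    return likelihood * severity

def residual_risk(likelihood, severity, control_reduction):
    """Residual risk after controls; `control_reduction` is the fraction of
    inherent risk the mitigations remove (a simplifying assumption)."""
    return risk_score(likelihood, severity) * (1 - control_reduction)

inherent = risk_score(likelihood=4, severity=5)        # 20: high before controls
residual = residual_risk(4, 5, control_reduction=0.5)  # 10.0 after mitigations
print(inherent, residual)
```

It is the residual number, not the inherent one, that drives the overall system rating.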
AWS Compliance Tooling
AWS Audit Manager automates collection of compliance evidence and generates audit-ready reports. Built-in frameworks include SOC 2 and Generative AI best practices.
Guardrails for Amazon Bedrock enforces content policies on any Bedrock foundation model: configurable content filtering, denied topics in plain language, and PII detection on both inputs and responses. No code changes required.
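A hedged sketch of a guardrail request body; the field names follow the boto3 create_guardrail shape as best understood here (verify against current docs), and all values are placeholders:

```python
# Sketch of a create_guardrail request body (field names follow the boto3
# bedrock API as best understood here; all values are placeholders).
guardrail = {
    "name": "support-bot-guardrail",
    "topicPolicyConfig": {
        "topicsConfig": [{
            "name": "investment-advice",
            "definition": "Recommendations about specific financial products.",
            "type": "DENY",   # a denied topic, described in plain language
        }],
    },
    "sensitiveInformationPolicyConfig": {
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},  # mask PII in responses
        ],
    },
    "blockedInputMessaging": "Sorry, I can't help with that.",
    "blockedOutputsMessaging": "Sorry, I can't share that.",
}
# boto3.client("bedrock").create_guardrail(**guardrail) would create it;
# the guardrail then attaches to model invocations without code changes.
```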
AWS Config tracks configuration changes, evaluates them against rules, and auto-remediates via AWS Systems Manager. Conformance packs bundle rules per use case; relevant packs include "Operational Best Practices for AI and ML" and "Security Best Practices for Amazon SageMaker."
Amazon Inspector scans applications and containers for vulnerabilities with prioritized findings. AWS Trusted Advisor evaluates your account against best practices across cost, performance, resilience, security, operational excellence, and service limits.
Data Governance
Data governance combines people, process, and technology to manage data availability, usability, integrity, and security. For AI, model quality depends directly on training data quality.
Three roles: data owner (executive-level, sets policies), data steward (business-side, executes day-to-day data tasks), IT (deploys and manages tooling).
AWS Glue DataBrew provides visual, no-code data profiling and lineage tracking. AWS Glue Data Catalog stores metadata centrally (populated manually or by Glue crawlers). AWS Glue Data Quality lets non-coders define quality rules with ML-assisted anomaly detection. AWS Lake Formation manages fine-grained access control at the column, row, and cell level; permissions are checked before any query via Athena, Glue, EMR, or Redshift.
For training data lifecycle cost: S3 Lifecycle rules automate transitions between storage classes. Example: Standard at day 0, Standard-IA at day 30 (S3 requires objects to spend at least 30 days in Standard first), Glacier Deep Archive at day 120, deletion after 5 years.
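Such a schedule maps onto an S3 lifecycle configuration roughly as follows (bucket and prefix are placeholders; note that S3 enforces a 30-day minimum before the Standard-IA transition):

```python
# Sketch of an S3 lifecycle configuration for tiering training data
# (bucket name and prefix are placeholders).
lifecycle = {
    "Rules": [{
        "ID": "training-data-tiering",
        "Status": "Enabled",
        "Filter": {"Prefix": "training-data/"},
        "Transitions": [
            # S3 rejects Standard-IA transitions earlier than day 30.
            {"Days": 30,  "StorageClass": "STANDARD_IA"},
            {"Days": 120, "StorageClass": "DEEP_ARCHIVE"},
        ],
        "Expiration": {"Days": 1825},   # delete after ~5 years
    }],
}
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```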
The GenAI Security Scoping Matrix
The Generative AI Security Scoping Matrix defines how much governance responsibility you carry based on how you're building:
Scope 1 - Consumer App: ChatGPT, Midjourney. Requires usage guidelines and compliance monitoring.
Scope 2 - Enterprise App: Salesforce Einstein GPT. Adds data flow understanding and regulatory alignment.
Scope 3 - Pre-trained Models: Amazon Bedrock base models. Adds a governance framework and training data understanding.
Scope 4 - Fine-tuned Models: Bedrock customized models, SageMaker JumpStart. Adds access control; model inherits data classification of fine-tuning data.
Scope 5 - Self-trained Models: Amazon SageMaker. Adds full data governance; model inherits data classification of all training data.
The strategy: minimize scope. Start with fully managed AWS AI services (Amazon Comprehend, Amazon Translate). If those don't fit, use Bedrock with RAG. If domain specificity is needed, fine-tune. Only train from scratch when nothing else meets requirements.
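One illustrative reading of the matrix's additive structure in code (the responsibility labels are paraphrased from the scopes above; the cumulative model is an assumption of this sketch):

```python
# The matrix reads as cumulative: each scope adds responsibilities on top
# of every lower scope's (labels paraphrased, structure assumed).
SCOPE_ADDS = {
    1: ["usage guidelines", "compliance monitoring"],
    2: ["data flow understanding", "regulatory alignment"],
    3: ["governance framework", "training data understanding"],
    4: ["access control", "fine-tuning data classification"],
    5: ["full data governance", "training data classification"],
}

def responsibilities(scope):
    """Everything you carry at a given scope: that scope's additions
    plus those of every scope below it."""
    return [item for s in range(1, scope + 1) for item in SCOPE_ADDS[s]]

# Overhead grows monotonically with scope, which is the whole argument
# for minimizing it.
print(len(responsibilities(2)), len(responsibilities(5)))
```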
Practical Takeaways
Security for AI workloads follows the same shared model as everything else on AWS, with AI-specific additions: attack vectors at the data layer (poisoning) and inference layer (adversarial inputs, prompt injection), plus the compliance complexity that a model inherits the data classification of whatever trained it.
The scoping matrix is the most useful concept from this domain. Determine scope before building. If you can stay at Scope 2 or 3, do it. Lower scope means less compliance overhead and faster time to ship.