AI Council Toolkit

Structured adversarial testing for AI systems to identify vulnerabilities before deployment.

What Is AI Red-Teaming?

AI red-teaming is the practice of structured adversarial testing, deliberately probing an AI system to discover failures, vulnerabilities, and harmful behaviors before users encounter them. NIST defines it as a structured testing effort to find flaws and vulnerabilities in an AI system, and Microsoft's AI Red Team has published detailed guidance on methodology.

When to Red-Team

Tier	Red-Teaming Requirement
Tier 1	Not required
Tier 2	Recommended for customer-facing generative AI
Tier 3	Required
Tier 4	Required before any exception request

Red-Team Plan Template

1. Scope

System under test. Name, version, and deployment context
Attack surface. Which interfaces and capabilities are in scope?
Out of scope. What should not be tested? (Production systems, third-party dependencies)

2. Objectives

What are you trying to find? Common objectives include:

Safety failures (harmful, dangerous, or illegal outputs)
Bias and fairness failures (discriminatory behavior)
Security vulnerabilities (prompt injection, data extraction, jailbreaks)
Reliability failures (hallucinations, inconsistencies, edge cases)
Policy violations (outputs that breach organizational guidelines)

3. Team Composition

Red teams should include:

Technical AI/ML engineers who understand the system
Security professionals experienced in adversarial testing
Domain experts who understand the real-world context
Diverse perspectives (different backgrounds, expertise, and thinking styles)

4. Methodology

Scenario-based testing. Design realistic adversarial scenarios
Structured prompts. Use prompt libraries and known attack patterns
Escalating difficulty. Start with simple attacks, increase sophistication
Document everything. Record every test, input, output, and observation

5. Findings and Remediation

For each finding, document:

Description of the vulnerability or failure
Severity (critical / high / medium / low)
Reproducibility (always / sometimes / rare)
Evidence (screenshots, logs, input/output pairs)
Recommended mitigation
Owner and target remediation date

6. Report to Council

The red-team report is submitted to the council as part of the review package. The council uses it alongside the impact assessment and security review to make an informed decision.

Copy This Template

# Red-Team Plan

## 1. Scope

- **System under test:** [Name, version, deployment context]
- **Attack surface:** [Interfaces and capabilities in scope]
- **Out of scope:** [What should not be tested]

## 2. Objectives

- [ ] Safety failures (harmful, dangerous, or illegal outputs)
- [ ] Bias and fairness failures
- [ ] Security vulnerabilities (prompt injection, data extraction, jailbreaks)
- [ ] Reliability failures (hallucinations, inconsistencies, edge cases)
- [ ] Policy violations

## 3. Team Composition

| Role | Name | Expertise |
|------|------|-----------|
| AI/ML engineer | | |
| Security professional | | |
| Domain expert | | |
| Diverse perspective | | |

## 4. Methodology

- [ ] Scenario-based testing designed
- [ ] Structured prompt library prepared
- [ ] Escalating difficulty planned
- [ ] Documentation protocol in place

## 5. Findings

| # | Description | Severity | Reproducibility | Evidence | Recommended Mitigation | Owner | Target Date |
|---|-------------|----------|----------------|----------|----------------------|-------|-------------|
| | | | | | | | |
| | | | | | | | |

## 6. Summary for Council

**Overall risk assessment:**

**Key findings:**

**Recommended actions:**

Red-Teaming

On this page