🔄

Agent SDK & The Agent Loop

The agent loop is the heartbeat of every Claude-powered agent. It orchestrates the conversation between Claude, your tools, and the user — iterating until Claude signals it is done via stop_reason = "end_turn".

Domain 1 · 27% · Core Pattern

How the Agent Loop Works

The Agent Loop — 5-Step Cycle

⚠️

Exam Trap: stop_reason == "end_turn" is the only reliable exit signal. Never exit based on text content — Claude can return text AND a tool_use block in the same response. If you check text and exit early, you silently drop tool calls.

Step-by-Step: Building the Loop

1

Initialize messages list with user input

Start with messages = [{"role":"user","content": user_input}]. The system prompt goes in a separate system= parameter, NOT inside messages.

2

Call Claude API — check stop_reason immediately

Call client.messages.create(model, max_tokens, system, messages, tools). Read response.stop_reason FIRST before touching content. Branch on "end_turn" vs "tool_use".

3

If tool_use: run hooks → execute tool → collect result

Run pre_tool_use_hook() first; it may redirect or intercept the call. Then execute the tool and run post_tool_use_hook() to normalize or trim the result. In this hand-rolled loop both hooks are ordinary functions you write yourself. Collect all tool results in a list.

4

Append assistant message + tool results to messages

Append the assistant's FULL content (including all tool_use blocks) as {"role":"assistant","content": content}, then append tool results as {"role":"user","content": tool_results}.

5

If end_turn: extract final text and return

Find the text block in content and return it. Include a max_iterations guard to prevent infinite loops. Each iteration re-sends the full messages list, so input tokens per call grow roughly linearly with the number of turns and the cumulative cost grows faster still.

Hook Pipeline: Deterministic vs Probabilistic

❌ System Prompt Rules (Probabilistic)

Writing "Never approve refunds over $500" in the system prompt is probabilistic. In long conversations or adversarial inputs, Claude may not follow it. Token dilution weakens prompt-based rules over time.

✅ Hooks (Deterministic)

A pre_tool_use_hook() that checks if amount > 500: intercept() ALWAYS fires. It's your code — not Claude's interpretation. Use hooks for business rules with real-world consequences.
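A minimal sketch of such a pre-hook, assuming a hypothetical process_refund tool whose input carries an amount field. The Intercept type and handle_intercept helper are illustrative names chosen to match the loop later in this section; they are not part of any SDK.

```python
from dataclasses import dataclass
from typing import Optional

REFUND_LIMIT = 500  # business rule from the example above


@dataclass
class Intercept:
    """Returned by the pre-hook when a tool call must not run as requested."""
    reason: str


def pre_tool_use_hook(tool_name: str, tool_input: dict) -> Optional[Intercept]:
    # Plain code: this check runs on every call, whatever the prompt says.
    if tool_name == "process_refund" and tool_input.get("amount", 0) > REFUND_LIMIT:
        return Intercept(reason=f"Refund over ${REFUND_LIMIT} requires human approval")
    return None  # no decision: let the tool run


def handle_intercept(intercept: Intercept) -> dict:
    # This dict goes back to Claude as the tool_result, so it can explain the escalation.
    return {"status": "escalated", "reason": intercept.reason}
```

Returning the intercept as a tool result, rather than raising, keeps the conversation well-formed: every tool_use block still gets a matching tool_result.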

Hook Pipeline Flow

Core Code Pattern

agent.py · run_agent_loop() · Python
def run_agent_loop(client, user_msg, tools, system, max_iter=10):
    messages = [{"role": "user", "content": user_msg}]

    for _ in range(max_iter):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=system,      # ← system prompt is SEPARATE
            messages=messages,
            tools=tools,
        )

        # ① stop_reason is THE exit signal — check it FIRST
        if response.stop_reason == "end_turn":
            return extract_text(response.content)

        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []

            for block in response.content:
                if block.type != "tool_use": continue

                # ② Pre-hook: deterministic business rule enforcement
                intercept = pre_tool_use_hook(block.name, block.input)
                if intercept:
                    result = handle_intercept(intercept)
                else:
                    result = execute_tool(block.name, block.input)

                # ③ Post-hook: normalize + trim result
                result = post_tool_use_hook(block.name, result)
                tool_results.append({"type":"tool_result","tool_use_id":block.id,"content":str(result)})

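            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop_reason (e.g. "max_tokens"): fail loudly instead of re-sending unchanged messages
        raise RuntimeError(f"Unhandled stop_reason: {response.stop_reason}")

    # max_iter guard tripped without reaching end_turn
    raise RuntimeError(f"Agent loop exceeded max_iter={max_iter}")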
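The loop above calls a few helpers it does not define. One possible shape for them is sketched below; the tool registry, the get_order_status tool, and the 2,000-character cap are all assumptions made for illustration.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Join every text block in a response's content list."""
    return "\n".join(b.text for b in content if b.type == "text")


# Hypothetical tool registry: tool name -> Python callable
TOOL_REGISTRY = {
    "get_order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
}


def execute_tool(name: str, tool_input: dict):
    if name not in TOOL_REGISTRY:
        # Return the error as data so Claude sees it in the tool_result
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](**tool_input)


def post_tool_use_hook(name: str, result, max_chars: int = 2000) -> str:
    # Trim oversized results: every tool_result is re-sent on later calls
    text = str(result)
    return text if len(text) <= max_chars else text[:max_chars] + "...[truncated]"


# A response can mix text and tool_use blocks; extract_text ignores the latter
blocks = [
    SimpleNamespace(type="text", text="Let me check that order."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="get_order_status",
                    input={"order_id": "A1"}),
]
print(extract_text(blocks))
```

The mixed-content list at the end is the case the exam trap warns about: text is present, yet a tool call is still pending.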

Subagents & Explicit Context

🔑 Rule: Subagents have NO shared memory

A subagent starts with a blank context. The coordinator must inject ALL context explicitly into the Task prompt — topic, prior findings, output format, constraints. Never assume a subagent knows what the coordinator knows.

⚡ Parallel spawning = multiple Task calls in one response

If the coordinator returns three {"type":"tool_use","name":"Task"} blocks in one response, all three run concurrently. This is how you get true parallel execution — one response, multiple simultaneous subagents.

💡

Hub-and-spoke pattern: Coordinator knows everything; subagents know only what they're told. Results always flow back to the coordinator, which accumulates the full knowledge state. This prevents context fragmentation.
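The two rules above can be sketched together: the coordinator builds a fully self-contained prompt per subagent, fans the calls out concurrently, and gathers the results. run_subagent here is a stand-in stub, not a real SDK call, and the prompt layout is only one plausible choice.

```python
from concurrent.futures import ThreadPoolExecutor


def build_task_prompt(topic: str, prior_findings: list[str], output_format: str) -> str:
    """Everything the subagent needs must be in this string; it shares no memory."""
    findings = "\n".join(f"- {f}" for f in prior_findings) or "- (none yet)"
    return (
        f"Topic: {topic}\n"
        f"Known so far:\n{findings}\n"
        f"Return your answer as: {output_format}"
    )


def run_subagent(prompt: str) -> str:
    # Stand-in for a real subagent call (hypothetical); echoes the topic line
    return "summary for " + prompt.splitlines()[0]


def fan_out(topics: list[str], prior_findings: list[str]) -> dict[str, str]:
    prompts = [build_task_prompt(t, prior_findings, "3 bullet points") for t in topics]
    # One worker per Task block, so the subagents run concurrently
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        results = list(pool.map(run_subagent, prompts))  # map preserves input order
    # Results flow back to the coordinator, which keeps the full picture
    return dict(zip(topics, results))
```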

Quick Check

Q1. When is it safe to exit the agent loop?

A. When Claude's text response contains "Task complete"

B. After every tool call has returned a result

C. Only when stop_reason == "end_turn"

D. When the messages list exceeds 10 entries

✓ Answer: C. stop_reason is the only reliable exit signal. Claude can emit text AND a tool_use block in the same response, so text content alone never justifies exiting.

Q2. What is the key advantage of pre_tool_use_hook over system prompt instructions?

A. Hooks are deterministic code — they always fire regardless of context length

B. Hooks can modify the model's system prompt at runtime

C. Hooks eliminate the need for tools altogether

D. Hooks reduce the token count of each API call

✓ Answer: A. System prompt instructions are probabilistic and can be diluted. Hooks are your code, so they always execute.

⚙️

Claude Code Configuration

Claude Code uses a layered configuration system to apply the right instructions to the right files at the right time — saving context window tokens and ensuring team-wide consistency.

Domain 3 · 20% · CLAUDE.md Hierarchy

The 3-Level CLAUDE.md Hierarchy

Configuration Inheritance — Outer to Inner

Level	Location	Who sees it	Commit to VCS?
1 — User	`~/.claude/CLAUDE.md`	Only you	❌ Never
2 — Project	`./CLAUDE.md`	Entire team	✅ Always
3 — Directory	`./src/api/CLAUDE.md`	That directory only	✅ Yes

Path-Specific Rules — Context Window Savings

How `.claude/rules/` works

Rules in .claude/rules/ use YAML frontmatter to declare which files they apply to. Claude Code only loads a rule if the currently-edited file matches the glob pattern.

.claude/rules/api-conventions.md · YAML Frontmatter
---
paths: ["src/api/**/*"]   # Loads ONLY when editing src/api/ files
---

# API Conventions
- All handlers must be async
- Return wrapper: `{"success": bool, "data": ..., "error": ...}`
- Use Pydantic for input validation
- Raise specific exceptions (not bare Exception)

💡

Context savings: With 10 rule files averaging 800 tokens each, always loading every rule costs 8,000 tokens per request. Path-specific rules can drop this to 1–2 rules loaded (800–1,600 tokens), saving roughly 6,400–7,200 tokens per edit.
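The selection logic and the arithmetic can be checked with a short sketch. The rule table is hypothetical, and fnmatch is used only as a rough approximation of glob matching; Claude Code's own matcher may treat `**` differently (for example, for files directly inside src/api/).

```python
from fnmatch import fnmatch

TOKENS_PER_RULE = 800  # average from the example above

# Hypothetical rule files mapped to their `paths` globs
RULES = {
    "api-conventions.md": ["src/api/**/*"],
    "testing.md": ["**/*.test.py"],
}


def rules_for(file_path: str) -> list[str]:
    # Approximation only: fnmatch lets `*` cross directory separators
    return [name for name, globs in RULES.items()
            if any(fnmatch(file_path, g) for g in globs)]


def tokens_saved(total_rules: int, loaded_rules: int) -> int:
    return (total_rules - loaded_rules) * TOKENS_PER_RULE


print(rules_for("src/api/users/handlers.py"))
print(tokens_saved(10, 2), tokens_saved(10, 1))
```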

❌ Wrong approach

Put all rules in a single CLAUDE.md — every rule loads for every file type, burning context on irrelevant rules.

✅ Right approach

Split rules by file type in .claude/rules/ with path frontmatter — rules load only when relevant.

Skills — On-Demand Tasks with Isolation

Feature	CLAUDE.md	Skill
When active	Always loaded	Only when invoked (`/skill-name`)
Use for	General standards, always-apply rules	Specific tasks: code review, test gen
Output isolation	No	Yes — `context: fork` keeps output separate
Tool restriction	N/A	`allowed-tools` enforces least privilege

.claude/skills/code-review/SKILL.md · Skill definition
---
context: fork        # ← isolated session, won't pollute main context
allowed-tools: ["Read", "Grep", "Glob"]  # ← read-only, no Write
---

## Code Review Checklist
1. Check for N+1 query patterns in loops
2. Verify no PII is logged
3. Ensure all endpoints have auth dependency
4. Validate response wrapper pattern: {success, data, error}

MCP Configuration — Team vs Personal

✅ .mcp.json (commit to VCS)

Team-shared MCP servers. Use ${ENV_VAR} syntax — real tokens go in .env (gitignored) or CI secrets. Cloning the repo gives the whole team the server config.

🔒 ~/.claude.json (NEVER commit)

Personal MCP servers, local dev databases, experiments. Each developer maintains their own. Copy personal-claude-override.example.json as a template.

⚠️

Exam distinction: If a question asks "where do you put a Slack MCP that the entire team should use?" → .mcp.json with ${SLACK_TOKEN}. If it asks "where do you put your personal dev DB?" → ~/.claude.json.
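To see why the committed file stays secret-free, the sketch below mimics the ${ENV_VAR} substitution with Python's string.Template. It only illustrates the idea; Claude Code performs its own expansion, and the slack-mcp-server command is a made-up placeholder.

```python
import json
from string import Template

# Contents of a committed .mcp.json; the server command is hypothetical
MCP_JSON = """
{
  "mcpServers": {
    "slack": {
      "command": "slack-mcp-server",
      "env": {"SLACK_TOKEN": "${SLACK_TOKEN}"}
    }
  }
}
"""


def resolve(config_text: str, environ: dict) -> dict:
    # The real token exists only in the environment, never in the file
    return json.loads(Template(config_text).substitute(environ))
```

Calling resolve(MCP_JSON, os.environ) on a developer machine would pick up that developer's own token, while the file in VCS contains only the placeholder.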

Configuration File Scope Explorer

📁 my-project/
📄 CLAUDE.md (all team members)
📄 .mcp.json (team MCP servers)
📄 .claude/rules/api-conventions.md (loads for src/api/**/*)
📄 .claude/rules/testing.md (loads for **/*.test.py)
📄 .claude/skills/code-review/SKILL.md (on demand: /code-review)
📄 personal-override.example.json (copy to ~/.claude.json)

Claude Code Hooks — 5 Handler Types & Key Events

What hooks actually are

Hooks are JSON-configured shell commands, HTTP endpoints, MCP tool calls, or LLM prompts that fire automatically at lifecycle events. They are not Python functions you call in your agent loop — they are Claude Code's own event system, configured in settings files.

⚠️

Exam trap: A blocking hook (exit code 2) stops a tool call even if an allow rule would permit it. Deny rules still evaluate regardless of hook output. Precedence: blocking hook → deny rule → ask rule → allow rule. Exit code 0 with no output = hook has no decision; normal permission flow applies.

`type` key	Handler	What runs
`command`	Shell script	Receives JSON on stdin, communicates via exit codes and stdout
`http`	HTTP endpoint	JSON posted to URL; response body = decision
`mcp_tool`	MCP tool call	Calls a tool on an already-connected MCP server
`prompt`	LLM prompt	Sends to Claude model for yes/no evaluation
`agent`	Subagent	Spawns subagent with Read/Grep/Glob tools (experimental)

Event	When it fires	Can block?
`SessionStart`	Session begins or resumes	No
`UserPromptSubmit`	User submits a prompt, before Claude processes it	Yes
`PreToolUse`	Before a tool call executes	Yes
`PermissionRequest`	When a permission dialog appears	Yes
`PermissionDenied`	Tool call denied by auto mode; return {retry:true} to retry	No
`PostToolUse`	After a tool call succeeds	No
`PostToolBatch`	After ALL parallel tool calls resolve; before next model call	No
`Stop`	When Claude finishes responding (end_turn)	Yes
`SubagentStart` / `SubagentStop`	When a subagent is spawned / finishes	No / Yes
`InstructionsLoaded`	When a CLAUDE.md or rules file loads	No
`FileChanged`	When a watched file changes on disk	No
`PreCompact` / `PostCompact`	Before/after context compaction	No
`SessionEnd`	When session terminates	No

💡

Hook config scope: Define hooks in ~/.claude/settings.json (all your projects) or .claude/settings.json (single project, committable to VCS). Hooks from the project settings file can be shared with the team. Personal hooks or security policies go in ~/.claude/settings.json.
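A `command` hook can be any executable that reads the event JSON from stdin. The sketch below is one way to write a PreToolUse handler in Python, assuming the payload exposes tool_name and tool_input.command for Bash calls; the blocked-snippet policy is purely illustrative.

```python
import json
import sys

BLOCKED_SNIPPETS = ("rm -rf", "git push --force")  # example policy, adjust to taste


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for a PreToolUse payload."""
    if payload.get("tool_name") != "Bash":
        return 0, ""  # no decision: normal permission flow applies
    command = payload.get("tool_input", {}).get("command", "")
    for snippet in BLOCKED_SNIPPETS:
        if snippet in command:
            return 2, f"Blocked by policy: '{snippet}' is not allowed"
    return 0, ""


def main(stdin=sys.stdin, stderr=sys.stderr) -> int:
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)  # the reason for blocking goes to stderr
    return code

# In the actual hook script, finish with: sys.exit(main())
```

Keeping the decision in a pure function (decide) makes the policy unit-testable without spawning Claude Code.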

Permission System — Modes, Rules & Precedence

⚠️

Rule evaluation order: deny → ask → allow. The FIRST matching rule wins. A deny rule in user settings blocks even if project settings has an allow rule for the same tool. Deny rules take absolute precedence.

Mode (`defaultMode`)	Behavior
`default`	Prompts on first use of each tool
`acceptEdits`	Auto-accepts file edits and common filesystem commands
`plan`	Read-only: Claude can explore but cannot edit files
`auto`	Background safety checks; auto-approves aligned actions (research preview)
`dontAsk`	Auto-denies unless pre-approved via allow rules
`bypassPermissions`	Skips ALL prompts — only for isolated containers/VMs

Permission rule syntax

Bash: matches ALL bash commands; as deny, removes tool from context entirely
Bash(rm *): scoped; leaves tool available, blocks only matching commands
mcp__memory__*: all tools from the memory MCP server (wildcard; the regex form mcp__memory__.* belongs to hook matchers)
mcp__memory__create_entities: one specific MCP tool
Agent(Explore): controls which subagents Claude can spawn
WebFetch(domain:example.com): domain-scoped web access

⚠️

Key distinction: A bare deny rule like Bash removes the tool from Claude's context entirely — Claude never sees it. A scoped deny like Bash(rm *) leaves the tool available and blocks only matching calls. This matters for questions about "removing a tool vs restricting a tool."
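The deny → ask → allow order can be modelled in a few lines. This is a simplified mental model, not Claude Code's actual matcher: it handles only bare and Tool(pattern) rules, uses fnmatch for the pattern, and assumes unmatched calls fall back to prompting.

```python
from fnmatch import fnmatch


def matches(rule: str, tool: str, arg: str = "") -> bool:
    """Rule is either a bare tool name ('Bash') or scoped ('Bash(rm *)')."""
    if "(" not in rule:
        return rule == tool
    name, _, pattern = rule.partition("(")
    return name == tool and fnmatch(arg, pattern.rstrip(")"))


def evaluate(rules: dict, tool: str, arg: str = "") -> str:
    # deny is checked first, then ask, then allow; the first match wins
    for decision in ("deny", "ask", "allow"):
        if any(matches(r, tool, arg) for r in rules.get(decision, [])):
            return decision
    return "ask"  # assumption: unmatched calls fall back to prompting


rules = {"deny": ["Bash(rm *)"], "allow": ["Bash", "Read"]}
print(evaluate(rules, "Bash", "rm -rf build"))
print(evaluate(rules, "Bash", "ls -la"))
```

The scoped deny wins over the bare allow for the rm command, while other Bash commands still pass.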

Quick Check

Q1. A new team member joins. Which CLAUDE.md level ensures they automatically get the team coding standards when they clone the repo?

A. Level 1 — User (~/.claude/CLAUDE.md)

B. Level 2 — Project (./CLAUDE.md committed to VCS)

C. Level 3 — Directory (./src/CLAUDE.md)

D. Any level works equally well

✓ Answer: B. Level 2 (project) is committed to VCS, so any developer who clones the repo gets it automatically. Level 1 is personal (not committed) and Level 3 is directory-scoped.

🔧

MCP Tool Design

Tool descriptions are selection mechanisms — Claude reads them to decide which tool to call. Ambiguous descriptions cause wrong choices. Structured errors enable intelligent retry decisions.

Domain 2 · 18% · Tool Descriptions

Tool Description Quality

❌ Weak description

{
  "name": "get_customer",
  "description": "Get customer information",
  "input_schema": {
    "properties": {
      "identifier": {"type": "string"}
    }
  }
}

Problem: Claude must guess whether to pass an email or ID. With two similar tools, it may pick the wrong one.

✅ Strong description

{
  "name": "get_customer_by_email",
  "description": "Retrieve customer record using their email address. Use for INITIAL lookup when you only have the email. Do NOT use if you already have the customer_id; use get_customer_by_id instead.",
  "input_schema": {
    "type": "object",
    "properties": {
      "email": {"type": "string",
        "description": "Customer's email address"}
    },
    "required": ["email"]
  }
}

ℹ️

3 rules for tool descriptions:
1. State the primary use case — when should Claude choose this tool?
2. State exclusion conditions — when should Claude NOT choose this tool?
3. Include disambiguation cues for tools with similar names or inputs.

Structured Error Taxonomy

Return structured errors, not strings. The agent loop reads isRetryable and errorCategory to make branching decisions — no natural language parsing required.

Category	Retryable?	Example	Agent Action
TRANSIENT	Yes	Network timeout, 503, rate limit	Retry with exponential backoff
VALIDATION	No	Wrong field type, missing required field	Fix input, re-submit
BUSINESS	No	Refund exceeds limit, unauthorized action	Escalate to human
PERMISSION	No	Access denied, auth required	Escalate or surface to user

errors.py · Structured error return · Python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ErrorCategory(Enum):
    TRANSIENT = "transient"    # retry OK
    VALIDATION = "validation"  # fix input
    BUSINESS = "business"      # escalate
    PERMISSION = "permission"  # escalate

@dataclass
class StructuredToolError:
    errorCategory: ErrorCategory
    isRetryable: bool
    message: str
    attempted_operation: str
    partial_results: Optional[dict] = None  # whatever completed before the failure

# Usage in tool implementation:
def process_refund(amount, **kwargs):  # remaining parameters elided in the original
    if amount > 500:
        return StructuredToolError(
            errorCategory=ErrorCategory.BUSINESS,
            isRetryable=False,   # ← agent won't retry
            message="Refund exceeds automated limit",
            attempted_operation="process_refund",
        )
    # ... normal refund processing for amounts within the limit goes here
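On the agent side, the branching might look like the following sketch. ToolError is a trimmed-down stand-in for StructuredToolError, and the attempt count and delays are arbitrary defaults rather than recommended values.

```python
import time
from dataclasses import dataclass


@dataclass
class ToolError:  # trimmed-down stand-in for StructuredToolError
    errorCategory: str
    isRetryable: bool
    message: str


def call_with_retry(tool, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry only when the tool itself says the failure is retryable."""
    result = None
    for attempt in range(max_attempts):
        result = tool()
        if not isinstance(result, ToolError):
            return result                      # success
        if not result.isRetryable:
            return result                      # validation/business/permission: caller decides
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)   # exponential backoff: 0.5s, 1s, ...
    return result                              # retries exhausted: escalate upstream
```

The sleep function is injectable so the backoff schedule can be tested without waiting.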

Quick Check

Q1. A tool returns a rate limit error (HTTP 429). What errorCategory and isRetryable should it use?

A. TRANSIENT, isRetryable=True

B. BUSINESS, isRetryable=False

C. VALIDATION, isRetryable=True

D. PERMISSION, isRetryable=False

✓ Answer: A. Rate limits are transient: the service is temporarily refusing requests, not broken. Retry with backoff.

📊

Data Extraction Pipeline

Use tool_choice: {type:"tool"} to force structured output, Pydantic for semantic validation, a retry-with-feedback loop for arithmetic errors, and the Batches API for cost-efficient bulk processing.

Domain 4 · 20% · Structured Output

Full Extraction Pipeline

Document (raw text) → Few-shot Prompt (4 examples + rules) → Claude API (tool_choice: forced) → tool_use block (structured JSON) → Pydantic Validate (arithmetic checks) → Route by Confidence (3 tiers)

⚠️

Key exam point: tool_choice: {"type":"tool","name":"extract_invoice_data"} forces Claude to call exactly that tool. stop_reason will be "tool_use", not "end_turn". This guarantees valid JSON structure — but NOT correct values. Semantic validation (arithmetic) is your responsibility.
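A sketch of the forced call, assuming an Anthropic-style client object is passed in; the helper name extract_invoice and the max_tokens value are illustrative choices.

```python
def extract_invoice(client, document_text: str, tool: dict, model: str) -> dict:
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        tools=[tool],
        # Force this exact tool: Claude cannot answer with plain text instead
        tool_choice={"type": "tool", "name": tool["name"]},
        messages=[{"role": "user", "content": document_text}],
    )
    # With a forced tool, expect stop_reason == "tool_use", not "end_turn"
    if response.stop_reason != "tool_use":
        raise RuntimeError(f"unexpected stop_reason: {response.stop_reason}")
    for block in response.content:
        if block.type == "tool_use" and block.name == tool["name"]:
            return block.input  # already-parsed dict; the values still need checking
    raise RuntimeError("no tool_use block in response")
```

The returned dict is structurally shaped by the schema, which is exactly why the semantic checks described next still have to run on it.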

JSON Schema Design Rules

Required vs Nullable Fields

Use "type": ["string","null"] for genuinely optional fields and leave them out of "required". Claude can then return null explicitly when the data is missing instead of fabricating a value.

"other" Enum + Detail Field

Never make an enum without an "other" option. Pair it with a detail field: "currency_detail" captures the actual value when the currency isn't in your enum. Prevents data loss.

schemas.py · Extraction tool definition (key fields) · Python
InvoiceExtractionTool = {
  "name": "extract_invoice_data",
  "input_schema": {
    "type": "object",
    "properties": {
      # Required: minimum viable data
      "vendor_name":   {"type": "string"},
      "invoice_number": {"type": "string"},
      "total_amount":   {"type": "number"},

      # Nullable: genuinely optional — null beats fabrication
      "payment_terms": {"type": ["string", "null"]},
      "po_number":     {"type": ["string", "null"]},

      # "other" enum + detail to prevent data loss
      "currency": {"type": "string",
        "enum": ["USD","EUR","GBP","other"]},
      "currency_detail": {"type": ["string","null"],
        "description": "Fill if currency='other'"},

      # Confidence enables downstream routing
      "confidence_score": {"type": "number",
        "minimum": 0.0, "maximum": 1.0},
    },
    "required": ["vendor_name","invoice_number","total_amount","confidence_score"]
  }
}

Validation Layers

1

Layer 1 — JSON Schema (API level, automatic)

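With forced tool use, Claude's output arrives as a tool_use block shaped by your input_schema: field names, types, required fields. This structural layer comes for free and catches syntactic errors, though re-checking the parsed input in your own code (Layer 2) is a cheap safeguard against the occasional schema miss.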

2

Layer 2 — Pydantic (semantic, your responsibility)

Run Pydantic validators on the extracted data. Check arithmetic: sum(line_items) ≈ total_amount. Check date formats. These are semantic errors — valid JSON structure, wrong values.

3

Retry with specific feedback (max 2 attempts)

If validation fails, build a feedback prompt: "sum(line_items)=$145 but total_amount=$200 — difference $55 is missing. Re-read for fees/taxes." Pass this with the original document. Claude finds the missed line item.

🚫

When retry WON'T help: If the invoice has NO invoice number, retrying just produces null again (or a hallucination). Retry is effective only when "the information IS in the document but Claude missed it." Route to human review when data is genuinely absent.
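The arithmetic check and the feedback loop can be sketched as follows. To stay dependency-free this uses a plain dataclass instead of Pydantic; the same comparison would sit inside a Pydantic validator in the pipeline described here. The extract callable is a hypothetical wrapper around the Claude call, and max_retries=2 is read here as two retries after the first attempt.

```python
from dataclasses import dataclass, field


@dataclass
class Invoice:
    total_amount: float
    line_items: list = field(default_factory=list)  # amounts only, for brevity


def check_arithmetic(inv: Invoice, tolerance: float = 0.01):
    """Return None if the totals agree, else a feedback string for the retry prompt."""
    items_sum = sum(inv.line_items)
    diff = inv.total_amount - items_sum
    if abs(diff) <= tolerance:
        return None
    return (
        f"sum(line_items)=${items_sum:.2f} but total_amount=${inv.total_amount:.2f}; "
        f"difference ${abs(diff):.2f} is unaccounted for. Re-read for fees/taxes."
    )


def extract_with_retry(extract, document: str, max_retries: int = 2):
    """`extract(document, feedback)` is a hypothetical callable wrapping the Claude call."""
    feedback = None
    for _ in range(max_retries + 1):
        invoice = extract(document, feedback)
        feedback = check_arithmetic(invoice)
        if feedback is None:
            return invoice, "valid"
    return invoice, "needs_human_review"  # retries exhausted
```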

Confidence-Based Routing (3 Tiers)

AUTO_PROCESS

confidence ≥ 0.85 AND valid AND no warnings
Still sample 5% randomly for audit — catches systematic confidence miscalibration

HUMAN_REVIEW

confidence 0.60–0.84 OR has warnings OR amount > threshold
Queue for human review — don't block the pipeline

REJECT

confidence < 0.60 OR has errors (blocking validation failures)
Do not process — flag for investigation
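The three tiers translate directly into a small routing function. The amount_threshold default below is a placeholder, since the text leaves the threshold to configuration.

```python
import random


def route(confidence: float, errors: list, warnings: list,
          amount: float = 0.0, amount_threshold: float = 10_000.0) -> str:
    # Blocking failures and low confidence are checked first
    if errors or confidence < 0.60:
        return "REJECT"
    if confidence < 0.85 or warnings or amount > amount_threshold:
        return "HUMAN_REVIEW"
    return "AUTO_PROCESS"


def needs_audit(rng: random.Random, rate: float = 0.05) -> bool:
    """Randomly sample auto-processed items for a human spot check."""
    return rng.random() < rate
```

Checking REJECT before HUMAN_REVIEW matters: an item with both an error and a warning must not slip into the review queue as if it were merely uncertain.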

Message Batches API

✅ Use batch API when

Nightly invoice processing, bulk historical digitization, training data generation, scheduled reporting. Volume is high, timing is flexible.

❌ Do NOT use batch API when

User uploads invoice and waits for result, real-time webhook-triggered processing, anything with SLA < 24 hours.

⚠️

Batch API key facts for exam: ~50% cost reduction. Up to 24 hours to complete. custom_id enables: (1) correlation, (2) selective retry of only failed docs, (3) idempotency. Best practice: custom_id = "invoice-vendorA-2024-01-15-00042" — enough info to identify the document without lookup.
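A sketch of how batch requests with self-describing custom_id values might be assembled. The invoice dict keys (vendor, date, seq, text) and the simplified results shape in failed_ids are assumptions for illustration, not the API's actual result format.

```python
def build_batch_requests(invoices: list[dict], tool: dict, model: str) -> list[dict]:
    """One request per document, each tagged with a self-describing custom_id."""
    requests = []
    for inv in invoices:
        custom_id = f"invoice-{inv['vendor']}-{inv['date']}-{inv['seq']:05d}"
        requests.append({
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": 1024,
                "tools": [tool],
                "tool_choice": {"type": "tool", "name": tool["name"]},
                "messages": [{"role": "user", "content": inv["text"]}],
            },
        })
    return requests


def failed_ids(results: list[dict]) -> list[str]:
    # `results` is a simplified stand-in for batch output: {"custom_id", "ok"} per entry
    return [r["custom_id"] for r in results if not r["ok"]]
```

The list would then be submitted with something like client.messages.batches.create(requests=...), and only the IDs returned by failed_ids need to be resubmitted.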

✍️

Prompt Engineering

Effective prompts show, not just tell. Few-shot examples cover edge cases that instructions miss. Multi-pass architectures break large tasks into verifiable chunks.

D4 · Prompt Engineering · 20% · Few-Shot

Few-Shot Examples — Show, Don't Just Tell

❌ Instructions only

"Extract invoice data. Normalize dates to ISO 8601. Convert written numbers to numeric values."

Result: "five hundred dollars" → "five hundred dollars" (not normalized)

✅ Instructions + examples

"Normalize dates to ISO 8601."

Example input: "March fifth, 2024"
Example output: {"invoice_date": "2024-03-05"}

Example input: "five hundred dollars"
Example output: {"total_amount": 500.00}

Result: Correct normalization

💡

Design your few-shot examples to cover:
• Happy path (standard format)
• Informal language ("about five hundred" vs "$500")
• Unusual format (bibliographic invoice, academic license)
• Missing fields (what to return when data isn't present — null, not fabricated)
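One way to assemble such a prompt is to keep the examples as data and render them. The three examples below are illustrative and cover the happy path, informal language, and a missing field; an unusual-format example would be added the same way.

```python
import json

FEW_SHOT = [  # (input snippet, expected extraction); contents are illustrative
    ("Invoice date: 2024-03-05. Total: $500.00",
     {"invoice_date": "2024-03-05", "total_amount": 500.00}),
    ("Dated March fifth, 2024, for about five hundred dollars",
     {"invoice_date": "2024-03-05", "total_amount": 500.00}),
    ("Total due: 120 EUR (no date shown)",
     {"invoice_date": None, "total_amount": 120.00}),
]


def build_prompt(document: str) -> str:
    parts = ["Extract invoice data. Normalize dates to ISO 8601. "
             "Convert written numbers to numeric values. Use null for missing fields."]
    for text, expected in FEW_SHOT:
        # json.dumps renders None as null, showing Claude the expected output for gaps
        parts.append(f"Example input: {text}\nExample output: {json.dumps(expected)}")
    parts.append(f"Input: {document}\nOutput:")
    return "\n\n".join(parts)
```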

Multi-Pass Architecture

1

Pass 1 — Broad Review

Scan the entire document or codebase. Identify all issues, sections, or items that need attention. Output a structured list with priorities. Don't fix yet — just enumerate.

2

Pass 2 — Deep Dive on Flagged Items

For each item flagged in Pass 1, perform detailed analysis. Include explicit review criteria: "Check for N+1 queries, SQL injection risks, missing auth, PII logging." Pass 1 output as context.

3

Pass 3 — Fix and Verify

Generate fixes based on Pass 2 analysis. Then verify the fix doesn't introduce new issues. Separate generation from verification — different prompts with different criteria.

⚠️

Why multi-pass for large code reviews? A single pass on a 5,000-line file forces Claude to simultaneously identify issues AND prioritize AND explain — splitting attention across all three degrades quality on each. Dedicated passes allow focused, deep analysis.
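The three passes can be orchestrated with a short driver. ask is a hypothetical callable wrapping a single Claude request, and the prompt wording is only a placeholder for the explicit criteria a real review would use.

```python
def multi_pass_review(code: str, ask) -> dict:
    """`ask(prompt) -> str` is a hypothetical wrapper around one Claude call."""
    # Pass 1: enumerate only, no fixes
    flagged = ask(f"List every issue in this code, one per line, no fixes:\n{code}")
    issues = [line.strip() for line in flagged.splitlines() if line.strip()]

    report = {}
    for issue in issues:
        # Pass 2: focused analysis of one item flagged in Pass 1
        analysis = ask(f"Analyze this issue in depth: {issue}\nCode:\n{code}")
        # Pass 3: generation and verification use separate prompts
        fix = ask(f"Propose a fix for: {issue}\nAnalysis: {analysis}")
        verdict = ask(f"Verify this fix introduces no new issues: {fix}")
        report[issue] = {"analysis": analysis, "fix": fix, "verdict": verdict}
    return report
```

Each call carries one job, so the call count grows to 1 + 3 × (number of issues); that extra cost is the price of focused attention per pass.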

Explicit Review Criteria

Code review prompt with explicit criteria · Prompt Template
Review the following Python code for these specific issues:

SECURITY:
- SQL injection (string formatting in queries)
- PII logged to stdout/files
- Hardcoded credentials

PERFORMANCE:
- N+1 query patterns (queries inside loops)
- Missing database indexes on foreign keys
- Synchronous I/O in async handlers

CORRECTNESS:
- Missing auth dependency on endpoints
- Bare except clauses that swallow errors
- Missing response wrapper {success, data, error}

FORMAT:
For each issue found, output:
  SEVERITY: [HIGH/MED/LOW]
  FILE: path/to/file.py:line
  ISSUE: one-line description
  FIX: specific code change needed

Quick Check

Q1. Why are few-shot examples more effective than instructions alone for normalization tasks?

A. Examples reduce token count in the prompt

B. Examples bypass the model's training data

C. Examples demonstrate the exact transformation, while instructions may be ambiguous for edge cases

D. Examples are cached by the API for faster responses

✓ Answer: C. "Convert written numbers to numeric" doesn't tell Claude what to do with "approximately five hundred". An example showing that input → 500.00 makes the expectation unambiguous.

📦

Context Management

The context window is finite. Poor placement, verbose outputs, and large tool results degrade performance and increase cost. Strategic patterns mitigate these limits.

D5 · Context & Reliability · 15% · Context Window

Lost-in-the-Middle Problem

Where Claude Pays Attention in a Long Context

Mitigation Pattern: KEY FINDINGS at top, ACTION ITEMS at bottom

When generating reports from large tool outputs, structure the response to place the most important content in the high-attention zones. Middle content (detailed evidence, full data) is less likely to influence the final summary.

Synthesis prompt structure for large reports · Prompt Pattern
Structure your response as follows:

## KEY FINDINGS (top — high attention)
- Finding 1: [most important insight]
- Finding 2: ...

## DETAILED EVIDENCE (middle)
[Comprehensive source citations and data]

## ACTION ITEMS (bottom — high attention)
- Action 1: [concrete next step]
- Action 2: ...

3 Context Management Patterns

📝 Scratchpad Files

Write intermediate results to disk instead of keeping them in the messages list. Read back only what's needed. Keeps the conversation context compact across many iterations.

🤖 Subagent Delegation

Delegate discovery tasks to subagents. The subagent processes verbose output and returns a compact summary. The coordinator never sees the raw verbosity — only the extracted facts.

🔧 Compact Tool Results

Post-tool hooks that trim large results: keep only the fields needed, normalize dates to ISO, cap list results at N items. Every tool result is re-sent on each subsequent API call, so the savings from trimming compound over the conversation.
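A compacting post-hook might look like the sketch below; the field names and the default cap of five items are illustrative assumptions.

```python
def compact_result(result: dict, keep_fields: tuple, list_key: str, max_items: int = 5) -> dict:
    """Keep only the needed fields and cap the list stored under `list_key`."""
    trimmed = {k: result[k] for k in keep_fields if k in result}
    items = result.get(list_key, [])
    trimmed[list_key] = items[:max_items]
    if len(items) > max_items:
        # Tell Claude that data was dropped so it can ask for more if needed
        trimmed["truncated"] = len(items) - max_items
    return trimmed


raw = {"customer_id": "c1", "debug": "x" * 1000, "orders": list(range(12))}
print(compact_result(raw, ("customer_id",), "orders"))
```

Recording how many items were dropped keeps the trimming honest: Claude can tell a short list from a truncated one.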

State Persistence for Crash Recovery

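state_persistence.py · Manifest pattern · Python
# The manifest pattern: save state after each completed subagent
# On restart: load manifest, skip completed agents, resume from last checkpoint
import json
from datetime import datetime
from pathlib import Path

def write_json(path, data):
    Path(path).write_text(json.dumps(data, indent=2))

def load_json(path):
    # A missing manifest means a fresh run: nothing is completed yet
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

class StateManager:
    def save_manifest(self, results):
        # results maps agent_id -> result object whose .status is an Enum
        manifest = {
            agent_id: {
                "status": r.status.value,
                "completed_at": datetime.now().isoformat()  # time of this save
            }
            for agent_id, r in results.items()
        }
        write_json("manifest.json", manifest)

    def get_pending(self, all_agents):
        manifest = load_json("manifest.json")
        # Only return agents NOT already completed
        return [a for a in all_agents
                if manifest.get(a.id, {}).get("status") != "completed"]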

Quick Check

Q1. Why does placing critical information in the middle of a long context reduce reliability?

A. Middle content is automatically compressed by the API

B. Attention mechanisms give less weight to content far from both ends of the context

C. The API truncates content from the middle when context is long

D. Middle content is cached and not re-read on each token

✓ Answer: B. This is the lost-in-the-middle effect: content far from both ends of the window tends to receive less attention. Place key findings at top, actions at bottom.

🚨

Escalation & Human-in-the-Loop

Knowing when NOT to act is as important as knowing how to act. Escalation is a feature, not a failure — it preserves trust, ensures policy compliance, and creates the feedback loop that improves the system.

D5 · Context & Reliability · 15% · HITL

When to Escalate — Decision Framework

Escalation Decision Tree

⚠️

4 escalation triggers to memorize:
1. Policy gap — no rule exists for this situation
2. Explicit user request — user asked for a human; always honor
3. Unable to make progress — N retries exhausted, still failing
4. High-value / irreversible action — configured threshold exceeded

Human-in-the-Loop Routing Workflow

1

Agent processes request, builds confidence assessment

For each action or extraction, compute a confidence score. Store the reasons for uncertainty: missing data, conflicting sources, ambiguous instructions.

2

Route based on confidence × risk threshold

High confidence + low risk → auto-process. Medium confidence or high risk → human review queue. Low confidence or any error → reject/escalate. Thresholds are configurable per use case.

3

Human review provides correction signal

Human decisions flow back as training signal. Track: which types of documents consistently require review, which agent decisions humans consistently override. Use this to improve thresholds.

4

Audit auto-processed items (5% random sample)

Even auto-processed items need periodic human sampling. This catches systematic miscalibration — e.g., if Claude consistently reports 0.92 confidence on documents that are actually 60% accurate.

Escalate vs Retry — Decision Matrix

Situation	Action	Reason
Tool returns TRANSIENT error	Retry	Temporary — network/rate issue will resolve
Tool returns BUSINESS error	Escalate	Policy decision needed — not a technical fix
User says "get me a manager"	Escalate immediately	Explicit user request — always honor
3 retries, still failing	Escalate	Unable to make progress — human must intervene
Validation error in extraction	Retry with feedback	Claude missed data — retry may recover it
Missing data (genuinely absent)	Human review	Retry won't help — data isn't in the document
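The matrix can be encoded as a single decision function. The parameter names and the lowercase category strings are assumptions chosen to line up with the error taxonomy used earlier in this guide.

```python
def next_action(error_category=None, retries_used=0, max_retries=3,
                user_requested_human=False, data_absent=False) -> str:
    if user_requested_human:
        return "escalate"              # explicit request: always honored, checked first
    if data_absent:
        return "human_review"          # retrying cannot recover data that is not there
    if retries_used >= max_retries:
        return "escalate"              # unable to make progress
    if error_category == "transient":
        return "retry"
    if error_category == "validation":
        return "retry_with_feedback"
    if error_category in ("business", "permission"):
        return "escalate"
    return "continue"                  # no error: carry on
```

The ordering encodes priority: a user's request for a human outranks everything else, and exhausted retries outrank the per-category rules.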

Quick Check

Q1. A user says "I'd like to speak with a human agent about my account." What should the agent do?

A. Attempt to resolve the issue first, then escalate if it can't be solved

B. Escalate immediately — explicit user request for a human must always be honored

C. Ask the user to clarify why they want a human before escalating

D. Escalate only if the agent cannot resolve the issue on the next attempt

✓ Answer: B. An explicit user request for human handoff is a non-negotiable escalation trigger. Attempting to resolve first or asking for clarification overrides the user's expressed preference, so escalate immediately.

🎯

Practice Exam

30 scenario-based questions across all 5 CCA-F domains. Select an answer — if correct you'll see the explanation. If wrong, the attempt is counted but you can try again until you get it right.

All 5 Domains · 140 Questions · Retry on Wrong

