Agent SDK & The Agent Loop
The agent loop is the heartbeat of every Claude-powered agent. It orchestrates the conversation between Claude, your tools, and the user — iterating until Claude signals it is done via stop_reason = "end_turn".
stop_reason == "end_turn" is the only reliable exit signal. Never exit based on text content — Claude can return text AND a tool_use block in the same response. If you check text and exit early, you silently drop tool calls.Initialize messages list with user input
Start with messages = [{"role":"user","content": user_input}]. The system prompt goes in a separate system= parameter, NOT inside messages.
Call Claude API — check stop_reason immediately
Call client.messages.create(model, max_tokens, system, messages, tools). Read response.stop_reason FIRST before touching content. Branch on "end_turn" vs "tool_use".
If tool_use: run hooks → execute tool → collect result
Run pre_tool_use_hook() first — it may redirect or intercept. Then execute the tool. Run post_tool_use_hook() to normalize/trim the result. Collect all tool results in a list.
Append assistant message + tool results to messages
Append the assistant's FULL content (including all tool_use blocks) as {"role":"assistant","content": content}, then append tool results as {"role":"user","content": tool_results}.
If end_turn: extract final text and return
Find the text block in content, return it. Include a max_iterations guard to prevent infinite loops. Each iteration, the full messages list is re-sent — token cost grows linearly.
❌ System Prompt Rules (Probabilistic)
Writing "Never approve refunds over $500" in the system prompt is probabilistic. In long conversations or adversarial inputs, Claude may not follow it. Token dilution weakens prompt-based rules over time.
✅ Hooks (Deterministic)
A pre_tool_use_hook() that checks if amount > 500: intercept() ALWAYS fires. It's your code — not Claude's interpretation. Use hooks for business rules with real-world consequences.
🔑 Rule: Subagents have NO shared memory
A subagent starts with a blank context. The coordinator must inject ALL context explicitly into the Task prompt — topic, prior findings, output format, constraints. Never assume a subagent knows what the coordinator knows.
⚡ Parallel spawning = multiple Task calls in one response
If the coordinator returns three {"type":"tool_use","name":"Task"} blocks in one response, all three run concurrently. This is how you get true parallel execution — one response, multiple simultaneous subagents.
Q1. When is it safe to exit the agent loop?
stop_reason == "end_turn"Q2. What is the key advantage of pre_tool_use_hook over system prompt instructions?
Claude Code Configuration
Claude Code uses a layered configuration system to apply the right instructions to the right files at the right time — saving context window tokens and ensuring team-wide consistency.
| Level | Location | Who sees it | Commit to VCS? |
|---|---|---|---|
| 1 — User | ~/.claude/CLAUDE.md | Only you | ❌ Never |
| 2 — Project | ./CLAUDE.md | Entire team | ✅ Always |
| 3 — Directory | ./src/api/CLAUDE.md | That directory only | ✅ Yes |
How .claude/rules/ works
Rules in .claude/rules/ use YAML frontmatter to declare which files they apply to. Claude Code only loads a rule if the currently-edited file matches the glob pattern.
Put all rules in a single CLAUDE.md — every rule loads for every file type, burning context on irrelevant rules.
Split rules by file type in .claude/rules/ with path frontmatter — rules load only when relevant.
| Feature | CLAUDE.md | Skill |
|---|---|---|
| When active | Always loaded | Only when invoked (/skill-name) |
| Use for | General standards, always-apply rules | Specific tasks: code review, test gen |
| Output isolation | No | Yes — context: fork keeps output separate |
| Tool restriction | N/A | allowed-tools enforces least privilege |
✅ .mcp.json (commit to VCS)
Team-shared MCP servers. Use ${ENV_VAR} syntax — real tokens go in .env (gitignored) or CI secrets. Cloning the repo gives the whole team the server config.
🔒 ~/.claude.json (NEVER commit)
Personal MCP servers, local dev databases, experiments. Each developer maintains their own. Copy personal-claude-override.example.json as a template.
.mcp.json with ${SLACK_TOKEN}. If it asks "where do you put your personal dev DB?" → ~/.claude.json.Click a file to see its scope and what Claude Code does with it:
What hooks actually are
Hooks are JSON-configured shell commands, HTTP endpoints, MCP tool calls, or LLM prompts that fire automatically at lifecycle events. They are not Python functions you call in your agent loop — they are Claude Code's own event system, configured in settings files.
| Hook Type | key | What runs |
|---|---|---|
command | Shell script | Receives JSON on stdin, communicates via exit codes and stdout |
http | HTTP endpoint | JSON posted to URL; response body = decision |
mcp_tool | MCP tool call | Calls a tool on an already-connected MCP server |
prompt | LLM prompt | Sends to Claude model for yes/no evaluation |
agent | Subagent | Spawns subagent with Read/Grep/Glob tools (experimental) |
| Event | When it fires | Can block? |
|---|---|---|
SessionStart | Session begins or resumes | No |
UserPromptSubmit | User submits a prompt, before Claude processes it | No |
PreToolUse | Before a tool call executes | Yes |
PermissionRequest | When a permission dialog appears | Yes |
PermissionDenied | Tool call denied by auto mode; return {retry:true} to retry | No |
PostToolUse | After a tool call succeeds | No |
PostToolBatch | After ALL parallel tool calls resolve; before next model call | No |
Stop | When Claude finishes responding (end_turn) | No |
SubagentStart / SubagentStop | When a subagent is spawned / finishes | No |
InstructionsLoaded | When a CLAUDE.md or rules file loads | No |
FileChanged | When a watched file changes on disk | No |
PreCompact / PostCompact | Before/after context compaction | No |
SessionEnd | When session terminates | No |
~/.claude/settings.json (all your projects) or .claude/settings.json (single project, committable to VCS). Hooks from the project settings file can be shared with the team. Personal hooks or security policies go in ~/.claude/settings.json.Mode (defaultMode) | Behavior |
|---|---|
default | Prompts on first use of each tool |
acceptEdits | Auto-accepts file edits and common filesystem commands |
plan | Read-only: Claude can explore but cannot edit files |
auto | Background safety checks; auto-approves aligned actions (research preview) |
dontAsk | Auto-denies unless pre-approved via allow rules |
bypassPermissions | Skips ALL prompts — only for isolated containers/VMs |
Permission rule syntax
Bash — matches ALL bash commands; as deny, removes tool from context entirely
Bash(rm *) — scoped; leaves tool available, blocks only matching commands
mcp__memory__.* — all tools from the memory MCP server (regex)
mcp__memory__create_entities — one specific MCP tool
Agent(Explore) — controls which subagents Claude can spawn
WebFetch(domain:example.com) — domain-scoped web access
Bash removes the tool from Claude's context entirely — Claude never sees it. A scoped deny like Bash(rm *) leaves the tool available and blocks only matching calls. This matters for questions about "removing a tool vs restricting a tool."Q1. A new team member joins. Which CLAUDE.md level ensures they automatically get the team coding standards when they clone the repo?
MCP Tool Design
Tool descriptions are selection mechanisms — Claude reads them to decide which tool to call. Ambiguous descriptions cause wrong choices. Structured errors enable intelligent retry decisions.
{
"name": "get_customer",
"description": "Get customer information",
"input_schema": {
"properties": {
"identifier": {"type": "string"}
}
}
}
Problem: Claude must guess whether to pass an email or ID. With two similar tools, it may pick the wrong one.
{
"name": "get_customer_by_email",
"description": "Retrieve customer record using
their email address. Use for INITIAL lookup
when you only have the email. Do NOT use if
you already have the customer_id — use
get_customer_by_id instead.",
"input_schema": {
"properties": {
"email": {"type": "string",
"description": "Customer's email address"}
},
"required": ["email"]
}
}
1. State the primary use case — when should Claude choose this tool?
2. State exclusion conditions — when should Claude NOT choose this tool?
3. Include disambiguation cues for tools with similar names or inputs.
Return structured errors, not strings. The agent loop reads isRetryable and errorCategory to make branching decisions — no natural language parsing required.
| Category | Retryable? | Example | Agent Action |
|---|---|---|---|
| TRANSIENT | Yes | Network timeout, 503, rate limit | Retry with exponential backoff |
| VALIDATION | No | Wrong field type, missing required field | Fix input, re-submit |
| BUSINESS | No | Refund exceeds limit, unauthorized action | Escalate to human |
| PERMISSION | No | Access denied, auth required | Escalate or surface to user |
Q1. A tool returns a rate limit error (HTTP 429). What errorCategory and isRetryable should it use?
Data Extraction Pipeline
Use tool_choice: {type:"tool"} to force structured output, Pydantic for semantic validation, a retry-with-feedback loop for arithmetic errors, and the Batches API for cost-efficient bulk processing.
tool_choice: {"type":"tool","name":"extract_invoice_data"} forces Claude to call exactly that tool. stop_reason will be "tool_use", not "end_turn". This guarantees valid JSON structure — but NOT correct values. Semantic validation (arithmetic) is your responsibility.Required vs Nullable Fields
Use "type": ["string","null"] for genuinely optional fields. This allows null without being required — prevents Claude from fabricating values for missing data.
"other" Enum + Detail Field
Never make an enum without an "other" option. Pair it with a detail field: "currency_detail" captures the actual value when the currency isn't in your enum. Prevents data loss.
Layer 1 — JSON Schema (API level, automatic)
The API validates structure before returning: correct field names, types, required fields. This is free — you get it from tool_use. It catches syntactic errors.
Layer 2 — Pydantic (semantic, your responsibility)
Run Pydantic validators on the extracted data. Check arithmetic: sum(line_items) ≈ total_amount. Check date formats. These are semantic errors — valid JSON structure, wrong values.
Retry with specific feedback (max 2 attempts)
If validation fails, build a feedback prompt: "sum(line_items)=$145 but total_amount=$200 — difference $55 is missing. Re-read for fees/taxes." Pass this with the original document. Claude finds the missed line item.
Still sample 5% randomly for audit — catches systematic confidence miscalibration
Queue for human review — don't block the pipeline
Do not process — flag for investigation
✅ Use batch API when
Nightly invoice processing, bulk historical digitization, training data generation, scheduled reporting. Volume is high, timing is flexible.
❌ Do NOT use batch API when
User uploads invoice and waits for result, real-time webhook-triggered processing, anything with SLA < 24 hours.
custom_id enables: (1) correlation, (2) selective retry of only failed docs, (3) idempotency. Best practice: custom_id = "invoice-vendorA-2024-01-15-00042" — enough info to identify the document without lookup.Prompt Engineering
Effective prompts show, not just tell. Few-shot examples cover edge cases that instructions miss. Multi-pass architectures break large tasks into verifiable chunks.
"Extract invoice data. Normalize dates to ISO 8601. Convert written numbers to numeric values."
Result: "five hundred dollars" → "five hundred dollars" (not normalized)
"Normalize dates to ISO 8601."
Example input: "March fifth, 2024"
Example output: {"invoice_date": "2024-03-05"}
Example input: "five hundred dollars"
Example output: {"total_amount": 500.00}
Result: Correct normalization
• Happy path (standard format)
• Informal language ("about five hundred" vs "$500")
• Unusual format (bibliographic invoice, academic license)
• Missing fields (what to return when data isn't present — null, not fabricated)
Pass 1 — Broad Review
Scan the entire document or codebase. Identify all issues, sections, or items that need attention. Output a structured list with priorities. Don't fix yet — just enumerate.
Pass 2 — Deep Dive on Flagged Items
For each item flagged in Pass 1, perform detailed analysis. Include explicit review criteria: "Check for N+1 queries, SQL injection risks, missing auth, PII logging." Pass 1 output as context.
Pass 3 — Fix and Verify
Generate fixes based on Pass 2 analysis. Then verify the fix doesn't introduce new issues. Separate generation from verification — different prompts with different criteria.
Q1. Why are few-shot examples more effective than instructions alone for normalization tasks?
Context Management
The context window is finite. Poor placement, verbose outputs, and large tool results degrade performance and increase cost. Strategic patterns mitigate these limits.
Mitigation Pattern: KEY FINDINGS at top, ACTION ITEMS at bottom
When generating reports from large tool outputs, structure the response to place the most important content in the high-attention zones. Middle content (detailed evidence, full data) is less likely to influence the final summary.
📝 Scratchpad Files
Write intermediate results to disk instead of keeping them in the messages list. Read back only what's needed. Keeps the conversation context compact across many iterations.
🤖 Subagent Delegation
Delegate discovery tasks to subagents. The subagent processes verbose output and returns a compact summary. The coordinator never sees the raw verbosity — only the extracted facts.
🔧 Compact Tool Results
Post-tool hooks that trim large results: keep only the fields needed, normalize dates to ISO, cap list results at N items. Each trimmed result is sent on every subsequent API call.
Q1. Why does placing critical information in the middle of a long context reduce reliability?
Escalation & Human-in-the-Loop
Knowing when NOT to act is as important as knowing how to act. Escalation is a feature, not a failure — it preserves trust, ensures policy compliance, and creates the feedback loop that improves the system.
1. Policy gap — no rule exists for this situation
2. Explicit user request — user asked for a human; always honor
3. Unable to make progress — N retries exhausted, still failing
4. High-value / irreversible action — configured threshold exceeded
Agent processes request, builds confidence assessment
For each action or extraction, compute a confidence score. Store the reasons for uncertainty: missing data, conflicting sources, ambiguous instructions.
Route based on confidence × risk threshold
High confidence + low risk → auto-process. Medium confidence or high risk → human review queue. Low confidence or any error → reject/escalate. Thresholds are configurable per use case.
Human review provides correction signal
Human decisions flow back as training signal. Track: which types of documents consistently require review, which agent decisions humans consistently override. Use this to improve thresholds.
Audit auto-processed items (5% random sample)
Even auto-processed items need periodic human sampling. This catches systematic miscalibration — e.g., if Claude consistently reports 0.92 confidence on documents that are actually 60% accurate.
| Situation | Action | Reason |
|---|---|---|
| Tool returns TRANSIENT error | Retry | Temporary — network/rate issue will resolve |
| Tool returns BUSINESS error | Escalate | Policy decision needed — not a technical fix |
| User says "get me a manager" | Escalate immediately | Explicit user request — always honor |
| 3 retries, still failing | Escalate | Unable to make progress — human must intervene |
| Validation error in extraction | Retry with feedback | Claude missed data — retry may recover it |
| Missing data (genuinely absent) | Human review | Retry won't help — data isn't in the document |
Q1. A user says "I'd like to speak with a human agent about my account." What should the agent do?
Practice Exam
30 scenario-based questions across all 5 CCA-F domains. Select an answer — if correct you'll see the explanation. If wrong, the attempt is counted but you can try again until you get it right.