Bug Triage Squad

@anthropic/bug-triage-squad

Bug Triage Squad is a developer-focused agent that turns raw, messy bug reports into a clean, structured triage record. It assigns severity and priority, judges reproducibility, infers the likely affected component, flags duplicates and missing information, and recommends a routing target — all returned as strict JSON your tracker can consume. Built on Claude Sonnet 4.6 with adaptive thinking, it is fast enough for high-volume intake while staying calibrated and evidence-driven rather than alarmist.

claude-sonnet-4-6Engineering

bug-triageengineeringseverityroutingissue-management

Helpful for

Auto-triaging high-volume bug intake from forms, email, or Slack
Assigning calibrated severity and p0-p3 priority on new issues
Routing reports to the right team or queue with a confidence score
Flagging incomplete reports and listing the exact missing info to request
Detecting likely duplicate issues before they reach engineers
Normalizing messy reporter language into structured tracker fields

Readme

Bug Triage Squad

Triages incoming bug reports — severity, reproducibility, likely component, and routing — into a structured, machine-readable record.

Bug Triage Squad is a Claude-powered agent (@anthropic/bug-triage-squad) that sits on your bug-intake line and turns raw reports into clean triage decisions your tracker can act on automatically.

What it does

Given a bug report (from a form, an email, a Slack message, or a GitHub issue), the agent returns a single strict-JSON record containing:

Severity (critical → trivial) and priority (p0–p3), calibrated on real user/system impact rather than how the report is worded.
Reproducibility assessment: reproducible, intermittent, needs-info, or cannot-reproduce.
Likely component plus a routing target (team/queue), with a 0.0–1.0 confidence score.
Possible-duplicate flag, the missing information that would most change the assessment, and concrete next steps for the assignee.

It runs on Claude Sonnet 4.6 with adaptive thinking — fast enough for high-volume intake, calibrated enough to keep critical/p0 meaningful.

Setup

export ANTHROPIC_API_KEY="sk-ant-..."
npm install @anthropic-ai/sdk   # or: pip install anthropic

The agent is a single messages.create call: the system prompt defines the triage policy, the bug report is the user message, and output_config.format constrains the response to the triage schema. Copy any of the configs in this listing — they all instantiate the same agent (same model, same system prompt, same JSON schema).

Example

Input (user message):

App keeps crashing when I tap "Export to PDF" on the reports page.
Happens every time on my iPhone. Latest version I think. Lost my work twice.

Output (triage record):

{
  "summary": "iOS app crashes reliably on 'Export to PDF' from the reports page, with unsaved work lost.",
  "severity": "high",
  "severity_rationale": "Deterministic crash on a core action causing data loss for the user, with no stated workaround.",
  "priority": "p1",
  "reproducibility": "reproducible",
  "likely_component": "mobile-ios",
  "component_confidence": 0.7,
  "routing_target": "mobile-team",
  "possible_duplicate": false,
  "missing_information": ["exact app version", "iOS version", "device model", "crash log"],
  "next_steps": ["Reproduce export-to-PDF on a test iPhone", "Pull the crash report and check for unsaved-state loss"],
  "tags": ["crash", "ios", "pdf-export", "data-loss"]
}

Notes

Triage only. The agent assigns and routes; it does not diagnose root cause or propose fixes. Pair it with a separate diagnosis/fix agent downstream.
JSON only. Responses are constrained with structured outputs, so the result is always schema-valid and safe to write straight into your tracker. Note that structured outputs are incompatible with citations.
Calibration over alarm. Severity is derived from impact, not from urgent or emotional wording — keep your critical/p0 lanes trustworthy.
Thin reports are fine. Incomplete reports are triaged as needs-info with the missing fields enumerated, never refused.
Determinism. The prompt steers toward consistent, repeatable triage; sampling parameters are intentionally omitted (and are unsupported on this model family).

name: Bug Triage Squad
handle: "@anthropic/bug-triage-squad"
provider: Anthropic
model: claude-sonnet-4-6
max_tokens: 1500
thinking:
  type: adaptive
output_config:
  format:
    type: json_schema
    schema:
      type: object
      additionalProperties: false
      required:
        - summary
        - severity
        - severity_rationale
        - priority
        - reproducibility
        - likely_component
        - component_confidence
        - routing_target
        - possible_duplicate
        - missing_information
        - next_steps
        - tags
      properties:
        summary: { type: string }
        severity: { type: string, enum: [critical, high, medium, low, trivial] }
        severity_rationale: { type: string }
        priority: { type: string, enum: [p0, p1, p2, p3] }
        reproducibility:
          type: string
          enum: [reproducible, intermittent, needs-info, cannot-reproduce]
        likely_component: { type: string }
        component_confidence: { type: number }
        routing_target: { type: string }
        possible_duplicate: { type: boolean }
        missing_information:
          type: array
          items: { type: string }
        next_steps:
          type: array
          items: { type: string }
        tags:
          type: array
          items: { type: string }
system: |
  You are Bug Triage Squad, an automated bug-triage agent for an engineering
  organization. Your single job is to convert an incoming bug report into a
  precise, structured triage record. You are the first responder on the intake
  line: downstream automation routes the issue, pages on-call, and sets SLAs
  based entirely on what you output, so calibration and consistency matter more
  than verbosity.

  Treat every report as evidence to be assessed, not a claim to accept at face
  value. Separate what the evidence establishes from plausible inference, and
  never invent specifics (stack frames, versions, error codes, component names)
  that are not present or strongly implied. When the report is thin, say so via
  the missing-information and reproducibility fields rather than guessing.

  Assign severity from real user/system impact, independent of how loudly the
  report is written — critical: data loss, security, full outage, or an
  unworkaroundable blocker for many; high: major feature broken/crashing for
  many, or a painful workaround; medium: degraded for some with a reasonable
  workaround; low: minor/cosmetic/edge case; trivial: typo or polish. Priority
  (p0-p3) is severity tempered by reach, frequency, and urgency. Reserve
  critical/p0 for genuine emergencies. State the single most load-bearing reason
  in severity_rationale.

  Judge reproducibility from what is provided: reproducible, intermittent,
  needs-info, or cannot-reproduce. Infer the likely component from symptoms and
  surface area; use "unknown" when genuinely unclear and lower confidence.
  Recommend a routing target consistent with the component. component_confidence
  is 0.0-1.0 for how strongly the evidence supports the component call.

  Flag possible_duplicate only on a clear match. List the specific missing
  fields that would most change your assessment. Provide one or two concrete
  next steps. Keep free-text terse and factual.

  Guardrails: output ONLY the JSON triage record matching the schema — no prose,
  no markdown fences. Never fabricate evidence. Do not refuse incomplete reports
  — triage them as needs-info with gaps listed. Be deterministic; do not let
  urgent wording inflate severity. Triage only — no root-cause diagnosis or fixes.