Capability-Based Guardrails Integration Guide

This guide explains how to integrate capability-based guardrails into the Vell AgentCore system.

Status: IMPLEMENTED

As of 2026-02-20, per-capability guardrail resolution and self-healing are wired into the runtime:

AgentOrchestrator::executeCapability() resolves guardrails per step via resolveGuardrailForCapability()
AgentOrchestrator::attemptGroundedRetry() implements self-healing on guardrail intervention
StreamService::getGuardrailConfig() accepts an optional $capabilitySlug for capability-aware resolution

Overview

Capability-based guardrails let you apply different AWS Bedrock Guardrails depending on the capability an agent is executing, rather than relying on agent-level settings alone. This gives granular, per-step control over content safety.

Use Cases

Marketplace Awareness: Use Enterprise tier (strictest grounding, prevent hallucinations about AWS Marketplace)
Content Generation: Use Prod tier (balanced safety and creativity)
Social Media Publishing: Use Dev tier (lighter filtering for creative content)
Deal Influence Tracking: Use Enterprise tier (block sensitive deal IDs and PII)

Architecture

Runtime Guardrail Resolution (5-Layer Model)

┌─────────────────────────────────────────────────────────┐
│ Layer 1: Capability Resolution                          │
│   CapabilityRegistry → identifies what's being executed │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Guardrail Resolution                           │
│   AgentOrchestrator::resolveGuardrailForCapability()    │
│   Priority: Agent > Capability > Platform > None        │
│   Resolves: guardrail_id + version + trace flag         │
├─────────────────────────────────────────────────────────┤
│ Layer 3: Bedrock Invocation                             │
│   BedrockRuntimeService.invokeModel() with guardrail    │
├─────────────────────────────────────────────────────────┤
│ Layer 4: Self-Healing                                   │
│   AgentOrchestrator::attemptGroundedRetry()             │
│   On INTERVENED: retry with grounding context           │
│   On 2nd failure: return structured error, don't pass   │
│   ungrounded content to user                            │
├─────────────────────────────────────────────────────────┤
│ Layer 5: Output Delivery                                │
│   StreamService (chat) or AgentExecution (workflow)     │
└─────────────────────────────────────────────────────────┘

Guardrail Resolution Flow

Agent Request
    ↓
AgentOrchestrator.execute()
    ↓
WorkflowPlanner.plan() → Generate workflow steps
    ↓
For Each Step:
    ├─ Get capability slug (e.g., "marketplace_awareness")
    ├─ resolveGuardrailForCapability(agent, capabilitySlug):
    │   ├─ Priority 1: Agent guardrail_enabled? → Use agent's guardrail
    │   ├─ Priority 2: CapabilityGuardrailMapping → Use capability guardrail
    │   ├─ Priority 3: Platform default → Use settings guardrail
    │   └─ Priority 4: None
    ├─ Pass _guardrail config to handler via parameters
    ├─ Execute capability
    └─ On guardrail intervention → attemptGroundedRetry()
        ├─ Retry with grounding instructions
        ├─ On success → return grounded result
        └─ On 2nd failure → return structured error (never pass ungrounded output)
    ↓
BedrockRuntimeService.invokeModel($model, $prompt, $options)

Grounded vs Non-Grounded Capabilities

22 of 34 capabilities require grounding, meaning their outputs must be verifiable against source data. The remaining 12 are either creative generation (where grounding would be too restrictive) or utility operations.

Capabilities (examples)	Grounded?	Guardrail Tier	Self-Healing
marketplace_awareness, deal_influence_tracking, partner_intelligence	Yes	Enterprise	Full retry
generate_text, seo_content_optimize, analyze_content	Yes	Prod	Full retry
generate_image, publish_to_social_media	No	Dev/None	No retry
sync_marketplace_listings, fetch_external_url	No	None	N/A (pure data)
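
The table above can be read as a lookup from capability slug to policy. The following is a minimal sketch of that idea, not the project's actual data structure: the constant CAPABILITY_POLICIES and the helper shouldSelfHeal() are hypothetical names, and treating unknown slugs as "no retry" is an assumption made for the example.

```php
<?php
// Hypothetical policy table mirroring the rows of the table above.
// 'tier' is null where the table lists no guardrail.
const CAPABILITY_POLICIES = [
    'marketplace_awareness'     => ['grounded' => true,  'tier' => 'enterprise', 'retry' => true],
    'deal_influence_tracking'   => ['grounded' => true,  'tier' => 'enterprise', 'retry' => true],
    'partner_intelligence'      => ['grounded' => true,  'tier' => 'enterprise', 'retry' => true],
    'generate_text'             => ['grounded' => true,  'tier' => 'prod',       'retry' => true],
    'seo_content_optimize'      => ['grounded' => true,  'tier' => 'prod',       'retry' => true],
    'analyze_content'           => ['grounded' => true,  'tier' => 'prod',       'retry' => true],
    'generate_image'            => ['grounded' => false, 'tier' => 'dev',        'retry' => false], // table: Dev/None
    'publish_to_social_media'   => ['grounded' => false, 'tier' => 'dev',        'retry' => false], // table: Dev/None
    'sync_marketplace_listings' => ['grounded' => false, 'tier' => null,         'retry' => false],
    'fetch_external_url'        => ['grounded' => false, 'tier' => null,         'retry' => false],
];

/**
 * Returns true when a blocked response for this capability should go
 * through a grounded retry. Unknown slugs return false (assumption).
 */
function shouldSelfHeal(string $capabilitySlug): bool
{
    $policy = CAPABILITY_POLICIES[$capabilitySlug] ?? null;

    return $policy !== null && $policy['grounded'] && $policy['retry'];
}
```

With this shape, a caller can decide whether to retry by slug alone, for example shouldSelfHeal('marketplace_awareness') versus shouldSelfHeal('generate_image').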

StreamService (Chat Path)

StreamService::getGuardrailConfig() now accepts an optional capability slug:
- When called with a capability context, it resolves per-capability guardrails first
- It falls back to the platform default if no capability mapping exists
- Chat without capability context still uses the platform default behavior
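
The fallback order described for the chat path can be sketched as a small standalone function. This is an illustration under stated assumptions, not the real StreamService code: getGuardrailConfigSketch() is a hypothetical name, and the capability lookup and platform default are passed in as arguments so the example runs without a database or settings store.

```php
<?php
/**
 * Sketch of capability-aware guardrail resolution for the chat path.
 *
 * @param string|null   $capabilitySlug   Capability context, or null for plain chat.
 * @param callable      $capabilityLookup Hypothetical lookup: fn(string): ?array.
 * @param array|null    $platformDefault  Platform default guardrail config, or null if unset.
 */
function getGuardrailConfigSketch(
    ?string $capabilitySlug,
    callable $capabilityLookup,
    ?array $platformDefault
): array {
    // With a capability context, a per-capability mapping wins.
    if ($capabilitySlug !== null) {
        $mapped = $capabilityLookup($capabilitySlug);
        if ($mapped !== null) {
            return $mapped;
        }
    }

    // No capability context, or no mapping: use the platform default (may be empty).
    return $platformDefault ?? [];
}

// Example data for the sketch; the IDs are placeholders.
$mappings = [
    'marketplace_awareness' => ['guardrail_id' => 'gr-enterprise', 'guardrail_version' => '1'],
];
$lookup = fn (string $slug): ?array => $mappings[$slug] ?? null;
$default = ['guardrail_id' => 'gr-default', 'guardrail_version' => 'DRAFT'];

$withMapping    = getGuardrailConfigSketch('marketplace_awareness', $lookup, $default);
$withoutMapping = getGuardrailConfigSketch('generate_image', $lookup, $default);
$plainChat      = getGuardrailConfigSketch(null, $lookup, $default);
```

Here $withMapping resolves to the capability guardrail, while $withoutMapping and $plainChat both resolve to the platform default.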

Bedrock AgentCore (Memory) - No Impact

BedrockAgentService handles session memory via Bedrock's agent memory API. It is orthogonal to guardrails. Guardrail enforcement happens at the invocation layer (BedrockRuntimeService), not the memory layer. No changes needed.

Implementation Details

Key Files

AgentOrchestrator::resolveGuardrailForCapability() - Priority chain resolution
AgentOrchestrator::attemptGroundedRetry() - Self-healing on intervention
StreamService::getGuardrailConfig() - Capability-aware chat guardrails
CapabilityGuardrailMapping::getGuardrailForCapability() - DB lookup with caching
BedrockRuntimeService::invokeModel() - Guardrail passthrough to Bedrock API
CapabilityGuardrailController::AUDIT_RECOMMENDATIONS - Risk/tier/grounding audit data

Self-Healing Pattern

When Bedrock's contextual grounding filter blocks output:

Detection: BedrockRuntimeService returns guardrails_triggered > 0
Retry: attemptGroundedRetry() adds explicit grounding instructions to the prompt
Success path: Retried output passes grounding → delivered to user
Failure path: Retry also blocked → structured error returned, no ungrounded content reaches user

This prevents the "fitness output" scenario where the model drifts off-topic and ungrounded content reaches the user without being caught.
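
The detection, retry, success, and failure paths above can be sketched as follows. This is a simplified illustration, not the body of AgentOrchestrator::attemptGroundedRetry(): the function names, the wording of the grounding instructions, and the shape of the structured error are all assumptions, and the Bedrock call is replaced by an injected callable so the sketch runs on its own. Only the guardrails_triggered key comes from the description above.

```php
<?php
/**
 * Retry once with explicit grounding instructions. If the retry is also
 * blocked, return a structured error with no model content.
 *
 * @param callable $invoke Hypothetical stand-in for the Bedrock call: fn(string): array.
 */
function attemptGroundedRetrySketch(callable $invoke, string $prompt, string $groundingContext): array
{
    // Assumed wording; the real grounding instructions are not shown in this guide.
    $groundedPrompt = "Answer using only the source material below. "
        . "If the material does not support a claim, say so instead of guessing.\n\n"
        . "SOURCE:\n{$groundingContext}\n\nTASK:\n{$prompt}";

    $retry = $invoke($groundedPrompt);

    if (($retry['guardrails_triggered'] ?? 0) > 0) {
        // Failure path: never forward the blocked output.
        return [
            'success' => false,
            'error'   => 'guardrail_intervention',
            'content' => null,
            'retried' => true,
        ];
    }

    // Success path: the retried output passed the guardrail.
    return ['success' => true, 'content' => $retry['content'] ?? '', 'retried' => true];
}

/** Run a step and fall back to the grounded retry when the guardrail intervenes. */
function executeWithSelfHealing(callable $invoke, string $prompt, string $groundingContext): array
{
    $first = $invoke($prompt);

    if (($first['guardrails_triggered'] ?? 0) > 0) {
        return attemptGroundedRetrySketch($invoke, $prompt, $groundingContext);
    }

    return ['success' => true, 'content' => $first['content'] ?? '', 'retried' => false];
}
```

Capping the retry at one attempt keeps latency and guardrail cost bounded, and returning content as null on failure makes it hard for a caller to display blocked text by accident.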

Integration Steps (Reference)

1. AgentOrchestrator (DONE)

The executeCapability() method in AgentOrchestrator.php now resolves a guardrail for each step before invoking Bedrock:

use App\Models\CapabilityGuardrailMapping;

// Inside AgentOrchestrator: executes a single capability step.
// Simplified excerpt: $prompt and $options are assumed to be prepared earlier
// in the method (not shown), and attemptGroundedRetry() handling is omitted.
protected function executeCapability(Agent $agent, array $step): array
{
    $capabilitySlug = $step['capability'];

    // Determine which guardrail to use (priority order):
    // 1. Agent-level guardrail (if enabled)
    // 2. Capability-level guardrail (from mapping)
    // 3. Platform default (from settings)
    // 4. None (empty array, so no guardrail keys are merged in)
    $guardrailOptions = $this->resolveGuardrailForCapability($agent, $capabilitySlug);

    // Pass the resolved guardrail options through to Bedrock
    return $this->bedrockRuntimeService->invokeModel(
        $agent->ai_model,
        $prompt,
        array_merge($options, $guardrailOptions)
    );
}

protected function resolveGuardrailForCapability(Agent $agent, string $capabilitySlug): array
{
    // Priority 1: Agent-level guardrail
    if ($agent->guardrail_enabled && $agent->guardrail_id) {
        return [
            'guardrail_id' => $agent->guardrail_id,
            'guardrail_version' => $agent->guardrail_version ?? 'DRAFT',
            'guardrail_trace' => $agent->guardrail_trace_enabled ?? false,
        ];
    }

    // Priority 2: Capability-level guardrail
    $capabilityGuardrail = CapabilityGuardrailMapping::getGuardrailForCapability($capabilitySlug);
    if ($capabilityGuardrail) {
        return [
            'guardrail_id' => $capabilityGuardrail['guardrail_id'],
            'guardrail_version' => $capabilityGuardrail['guardrail_version'],
            'guardrail_trace' => false, // Capability-level doesn't have trace setting
        ];
    }

    // Priority 3: Platform default
    $defaultGuardrailId = setting('bedrock_default_guardrail_id');
    if ($defaultGuardrailId) {
        return [
            'guardrail_id' => $defaultGuardrailId,
            'guardrail_version' => setting('bedrock_default_guardrail_version', 'DRAFT'),
            'guardrail_trace' => false,
        ];
    }

    // Priority 4: None
    return [];
}

2. BedrockRuntimeService (ALREADY DONE)

BedrockRuntimeService::invokeModel() already supports guardrail passthrough. When $options['guardrail_id'] is set, it adds guardrailIdentifier and guardrailVersion to the Bedrock API call. It also detects amazonBedrockGuardrailAction === INTERVENED and returns guardrails_triggered in the response, which triggers the self-healing layer.
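
The passthrough described above can be sketched as a pure function that maps the resolved options onto Bedrock InvokeModel request fields. This is an assumption-laden illustration, not the service's actual code: buildInvokeModelRequest() is a hypothetical helper, and mapping the guardrail_trace flag onto the request's trace field is an assumption about how the flag is used. The request field names themselves (modelId, body, guardrailIdentifier, guardrailVersion, trace) are those of the InvokeModel API.

```php
<?php
/**
 * Build the argument array for an InvokeModel call, adding guardrail
 * fields only when a guardrail was resolved.
 */
function buildInvokeModelRequest(string $modelId, string $jsonBody, array $options): array
{
    $request = [
        'modelId'     => $modelId,
        'contentType' => 'application/json',
        'accept'      => 'application/json',
        'body'        => $jsonBody,
    ];

    if (!empty($options['guardrail_id'])) {
        $request['guardrailIdentifier'] = $options['guardrail_id'];
        // Version is sent as a string: "DRAFT" or a number such as "1".
        $request['guardrailVersion'] = (string) ($options['guardrail_version'] ?? 'DRAFT');
        // Assumption: the trace flag toggles the request-level trace setting.
        $request['trace'] = !empty($options['guardrail_trace']) ? 'ENABLED' : 'DISABLED';
    }

    return $request;
}

// 'example-model-id' and 'abc123xyz' are placeholders for the sketch.
$guarded = buildInvokeModelRequest(
    'example-model-id',
    json_encode(['prompt' => 'Summarize the listing.']),
    ['guardrail_id' => 'abc123xyz', 'guardrail_version' => 1, 'guardrail_trace' => true]
);
$unguarded = buildInvokeModelRequest('example-model-id', '{}', []);
```

When resolveGuardrailForCapability() returns an empty array (priority 4), no guardrail fields are added and the call proceeds unguarded, as $unguarded shows.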

3. Test the Integration

1. Deploy the CloudFormation stack (if not already deployed):

cd infrastructure/cloudformation
./deploy-guardrails.sh

2. Sync guardrails to the database:
- Go to Admin → Bedrock Guardrails
- Click "Sync from AWS"
- Verify that 3 guardrails appear (dev, prod, enterprise)

3. Create capability mappings:
- Go to Admin → Capability Guardrails
- Assign guardrails to capabilities:
  - marketplace_awareness → vellocity-marketplace-trust-enterprise
  - generate_text → vellocity-marketplace-trust-prod
  - publish_to_social_media → vellocity-marketplace-trust-dev

4. Create a test agent:
- Go to Content Manager → Agents → Create Agent
- Name: "Test Capability Guardrails Agent"
- Select capabilities: generate_text, marketplace_awareness
- Do NOT enable agent-level guardrails (the goal is to test capability-level resolution)
- Save

5. Execute a test task:

Task: "Generate a blog post about AWS Marketplace pricing strategies,
then analyze the latest AWS Marketplace trends"

6. Verify guardrails are applied:
- Check the Laravel logs (storage/logs/laravel.log)
- Look for guardrail invocation logs
- Verify that different guardrails were used for different capabilities:
  - The generate_text step should use the prod guardrail
  - The marketplace_awareness step should use the enterprise guardrail

7. Test guardrail blocking:
- Try input with sensitive data (email, phone, AWS account ID)
  - The Enterprise tier should block it; the Prod and Dev tiers should anonymize it
- Try ungrounded claims ("AWS guarantees unlimited support")
  - All tiers should block these based on word filters

Configuration Management

Platform Default Guardrail

Set in Admin → Settings → Tools → Bedrock Guardrails:
- Default Guardrail ID: Select from deployed guardrails
- Default Version: DRAFT or a numbered version
- Enable by Default: Auto-enable for new agents
- Require Guardrails: Force all agents to use guardrails

Agent-Level Guardrails

Set in Agent → Create/Edit → Advanced Settings → Bedrock Guardrails:
- Enable Guardrails: Checkbox
- Select Guardrail: Dropdown (loads from /api/guardrails/available)
- Version: DRAFT, 1, 2, 3
- Enable Trace Logging: For debugging

Capability-Level Guardrails

Set in Admin → Capability Guardrails:
- View all agent capabilities
- Assign a guardrail per capability
- Set priority (for multiple mappings)
- Add a description (why this guardrail?)

API Endpoints

List Available Guardrails

GET /api/guardrails/available

Response:

{
  "success": true,
  "guardrails": [
    {
      "guardrail_id": "abc123xyz",
      "name": "vellocity-marketplace-trust-enterprise",
      "scope": "platform",
      "version": "1",
      "status": "active"
    }
  ]
}
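
A client consuming this endpoint might filter the response down to active guardrails before populating a dropdown. The sketch below assumes only the response shape shown above; activeGuardrailNames() is a hypothetical helper, and returning an empty array when success is false is an assumption about error handling.

```php
<?php
/**
 * Extract active guardrails from a /api/guardrails/available response body.
 *
 * @return array<string, string> Map of guardrail_id => name.
 */
function activeGuardrailNames(string $json): array
{
    // Throws JsonException on malformed input instead of returning null.
    $payload = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (empty($payload['success'])) {
        return [];
    }

    $names = [];
    foreach ($payload['guardrails'] ?? [] as $guardrail) {
        if (($guardrail['status'] ?? '') === 'active') {
            $names[$guardrail['guardrail_id']] = $guardrail['name'];
        }
    }

    return $names;
}

// The sample response from above, used as input.
$sample = '{"success": true, "guardrails": [{"guardrail_id": "abc123xyz",'
    . ' "name": "vellocity-marketplace-trust-enterprise", "scope": "platform",'
    . ' "version": "1", "status": "active"}]}';

$active = activeGuardrailNames($sample);
```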

Get Guardrail for Capability

use App\Models\CapabilityGuardrailMapping;

$guardrail = CapabilityGuardrailMapping::getGuardrailForCapability('generate_text');
// Returns: ['guardrail_id' => '...', 'guardrail_version' => '...'] or null

Troubleshooting

Guardrails Not Applied

1. Check guardrail sync:
- Admin → Guardrails → Sync from AWS
- Verify the status is active

2. Check the capability mapping:
- Admin → Capability Guardrails
- Verify the mapping exists and is enabled

3. Check agent settings:
- If the agent has guardrail_enabled = true, its guardrail overrides capability mappings
- Disable agent-level guardrails to use capability-level guardrails

4. Check Bedrock credentials:
- The .env file has the correct AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Settings → Tools → Bedrock Access Model is configured

Guardrails Blocking Too Much

1. Use a lower tier:
- Enterprise → Prod → Dev (descending strictness)

2. Adjust grounding thresholds:
- Edit the CloudFormation template
- Lower the Threshold values (0.0 = most permissive, 1.0 = most strict)
- Redeploy the stack
- Sync in the admin panel

3. Remove blocked words:
- Edit WordsConfig in CloudFormation
- Redeploy and sync

Performance Issues

1. Use versioned guardrails (not DRAFT):
- DRAFT requires AWS to fetch the latest config on every call
- Versions 1, 2, 3 are cached by AWS

2. Check guardrail usage in admin:
- Admin → Guardrails shows usage_count and last_used_at
- High usage may indicate over-application

3. Consider capability-level exemptions:
- Remove mappings for low-risk capabilities
- E.g., analyze_content may not need strict guardrails

Cost Optimization

AWS Bedrock Guardrails pricing (as of 2025):
- $0.75 per 1,000 content units (text)
- $1.00 per 1,000 content units (images)

Tips:
1. Use capability mappings sparingly: Only for high-risk capabilities
2. Use lower tiers where appropriate: The Dev tier costs the same but allows more content through
3. Monitor usage: Admin → Guardrails shows usage counts
4. Consider an agent-level override: For trusted agents, assign a looser agent-level guardrail, which takes priority over capability and platform guardrails
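
To make the text rate above concrete, here is a rough estimate of monthly guardrail spend. It is a back-of-the-envelope sketch under assumptions: a text unit is taken as up to 1,000 characters, the character count should include everything the guardrail evaluates (input and output), and the rate is the $0.75 figure quoted above rather than a value checked against current AWS pricing. The function and constant names are hypothetical.

```php
<?php
// Text rate quoted in this guide, in USD per 1,000 text units.
const TEXT_RATE_PER_1000_UNITS = 0.75;

/** Number of text units for one call, assuming 1 unit = up to 1,000 characters. */
function textUnits(int $characters): int
{
    return (int) ceil($characters / 1000);
}

/** Estimated monthly cost in USD for guarded text calls, rounded to cents. */
function estimateMonthlyTextCost(int $callsPerMonth, int $avgCharsPerCall): float
{
    $units = $callsPerMonth * textUnits($avgCharsPerCall);

    return round($units / 1000 * TEXT_RATE_PER_1000_UNITS, 2);
}

// 50,000 calls at 2,500 characters each: 3 units per call, 150,000 units total,
// so 150 x $0.75 = $112.50.
$monthly = estimateMonthlyTextCost(50000, 2500);
```

Because units are rounded up per call, many short calls cost more per character than fewer long ones, which is one reason to remove mappings from low-risk, high-volume capabilities.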

Best Practices

Start with platform default: Set a baseline guardrail in settings
Assign capability-level for exceptions: Only map capabilities that need different rules
Use agent-level for VIPs: For specific agents (e.g., internal tools), override with looser guardrails
Enable trace mode during development: Debug guardrail behavior
Version guardrails in production: Use numbered versions (not DRAFT)
Document mappings: Use the description field to explain why each mapping exists
Review guardrail logs regularly: Check storage/logs/laravel.log for interventions

Related Documentation

/docs/BEDROCK_GUARDRAILS.md - Main guardrails documentation
/infrastructure/cloudformation/README.md - CloudFormation deployment guide
AWS Bedrock Guardrails: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html

Support

For issues:
- Deployment problems: Check CloudFormation stack events in the AWS Console
- Integration bugs: Review the Laravel logs at storage/logs/laravel.log
- Guardrail behavior: Enable trace mode and check the logs