AgentCore Gateway Audit: Build vs Buy Analysis

Date: 2026-02-19
Last Updated: 2026-02-21 (Funding intelligence integration: IP, data moat, and opportunity analysis for AI-enabled funding assistance)
Scope: Vellocity's custom AgentCore orchestration vs Amazon Bedrock AgentCore managed services
Goal: Identify weaknesses vs strengths, determine what to offload to Bedrock AgentCore and what to keep in-house

Executive Summary

Vellocity's custom AgentCore is a capable workflow orchestration system with strong error classification and stuck-execution recovery. The initial audit identified critical security vulnerabilities (IDOR, prompt injection, unsafe handler loading), no circuit breaker for Bedrock API calls, no database transactions on step execution, and O(n³) dependency resolution that would break at scale.

Phase 0 Status: COMPLETE AND DEPLOYED. All 5 critical security vulnerabilities (SEC-001 through SEC-005), all 4 high security vulnerabilities (SEC-006 through SEC-009), and the critical scalability issue (SCALE-002) have been remediated in application code. Supporting CloudFormation templates have been hardened and deployed to prod on 2026-02-20: WAF rate limiting (prod-api-waf), Bedrock guardrail prompt attack detection (vellocity-bedrock-guardrails), AgentCore observability infrastructure (prod-observability), and a new AgentCore Gateway foundation (prod-agentcore-gateway) with DynamoDB tool registry, S3 schema storage, Lambda sync function, and EventBridge pipeline — preparing the infrastructure for Phase 2 Gateway migration.

Infrastructure Verification (2026-02-20): All 5 CloudFormation stacks healthy, all 22 CloudWatch alarms in OK state, WAF associated with API Gateway prod stage. 3 post-deployment issues found; 2 resolved and 1 deferred: (1) ~~SNS alert subscriptions not confirmed~~ RESOLVED: both confirmed, (2) ~~Tool registry not seeded~~ RESOLVED: 37 capabilities synced to DynamoDB (111 items incl. versioned snapshots) + 37 MCP schemas in S3 across 9 marketplace listings, (3) Sync pipeline untested in prod (LOW, deferred to Phase 2).

Amazon Bedrock AgentCore (GA Oct 2025) offers managed Runtime, Gateway, Memory, Identity, Observability, and Policy services that directly address remaining weaknesses — particularly around infrastructure resilience, tool authentication, memory management, and monitoring.

Funding Intelligence Update (2026-02-21): The AI-powered FundingApplicationWriterService (5 AWS Partner Funding programs, KB-grounded generation, dry-run reviewer evaluation, Partner Central Benefits API) is a significant IP asset operating outside the CapabilityRegistry. Integrating it as capability #38 unlocks agent-orchestrated funding workflows, Gateway discoverability, and outcome-driven approval rate intelligence — a new data moat with strong network effects.

Recommendation: Adopt a hybrid approach. Offload infrastructure-layer concerns (runtime, tool registry and authentication, memory, identity, observability, tool-call policy) to Bedrock AgentCore, and retain ownership of business logic (workflow planning, brand voice, marketplace metering, GTM templates, funding intelligence).

Current Architecture Scorecard

Category	Score (Initial → Current)	Key Issues
Architecture	7/10 → 8/10	Clean capability-based design, good separation of concerns. Gateway foundation infrastructure added.
Security	4/10 → 8/10	~~IDOR, prompt injection, missing auth checks, unsafe handler loading~~ — All 5 critical fixes applied (SEC-001–005). ~~Mass assignment, inconsistent auth, credential cache, unsafe JSON parsing~~ — All 4 high fixes applied (SEC-006–009).
Scalability	3/10 → 4/10	~~O(n³) resolution~~ replaced with Kahn's algorithm. Remaining: No concurrency limits, memory leaks, no backpressure
Resilience	5/10	Good stuck detection, but no circuit breaker, no transactions, retry storms
Observability	6/10 → 7/10	~~No ops alerting~~ — CloudWatch alarms, AgentCore log group, X-Ray sampling added. Remaining: No correlation IDs, no OTEL
Error Handling	7/10	Excellent error classification, but too many silent failures

Security Findings

CRITICAL

SEC-001: IDOR in getExecutionStatus / getExecutionResults — FIXED
File: AgentOrchestrator.php:377-438, AgentPolicy.php, AgentController.php
Issue: Public methods retrieve execution data by ID with no user ownership check. Any authenticated user can access another user's workflow results, task descriptions, and generated assets.
Fix applied: Defense-in-depth across 3 layers:
1. Added execute(), viewExecution(), and cancel() methods to AgentPolicy
2. Added $userId parameter + ownership guard to getExecutionStatus(), getExecutionResults(), and cancelExecution() in AgentOrchestrator
3. Updated AgentController to use dedicated policy methods and pass Auth::id() to orchestrator
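
The ownership guard in layer 2 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's PHP code; `Execution`, `EXECUTIONS`, and `AccessDenied` are hypothetical stand-ins for the `agent_executions` table and the orchestrator's error handling.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when the caller does not own the requested execution."""


@dataclass
class Execution:
    id: int
    user_id: int
    status: str


# Hypothetical in-memory store standing in for the agent_executions table.
EXECUTIONS = {1: Execution(id=1, user_id=10, status="completed")}


def get_execution_status(execution_id: int, user_id: int) -> str:
    """Return the status only if user_id owns the execution."""
    execution = EXECUTIONS.get(execution_id)
    # Treat "missing" and "not yours" identically so IDs cannot be probed.
    if execution is None or execution.user_id != user_id:
        raise AccessDenied(f"execution {execution_id} not accessible")
    return execution.status
```

Returning the same error for a missing record and a foreign record is a design choice of this sketch; it avoids confirming which execution IDs exist.
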
SEC-002: Unsafe Capability Handler Loading — FIXED
File: CapabilityRegistry.php:68-77
Issue: getHandler() instantiates handler classes without validating against an allow-list. If an attacker can create a capability record with a malicious handler_class, this is a path to remote code execution.
Fix applied: Added ALLOWED_HANDLER_NAMESPACES constant restricting to App\Extensions\ContentManager\System\Services\Capabilities\. Validates namespace prefix, class existence, and CapabilityInterface implementation before instantiation. Critical-level logging on blocked attempts.
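
A rough Python sketch of the three checks described above (namespace prefix, class existence, interface implementation). The `capabilities.` prefix, the `KNOWN_CLASSES` lookup table, and the class names are assumptions for illustration; the real fix uses PHP namespaces and `CapabilityInterface`.

```python
from abc import ABC, abstractmethod

# Hypothetical allow-listed prefix, standing in for the PHP namespace.
ALLOWED_HANDLER_PREFIXES = ("capabilities.",)


class Capability(ABC):
    @abstractmethod
    def run(self, params: dict) -> dict: ...


class EchoCapability(Capability):
    def run(self, params: dict) -> dict:
        return params


# Hypothetical lookup table standing in for class autoloading.
KNOWN_CLASSES = {
    "capabilities.EchoCapability": EchoCapability,
    "capabilities.NotACapability": dict,
    "evil.Payload": object,
}


def get_handler(handler_class: str) -> Capability:
    # 1. Namespace prefix must be on the allow-list.
    if not handler_class.startswith(ALLOWED_HANDLER_PREFIXES):
        raise PermissionError(f"handler outside allow-list: {handler_class}")
    # 2. The class must exist.
    cls = KNOWN_CLASSES.get(handler_class)
    if cls is None:
        raise LookupError(f"unknown handler: {handler_class}")
    # 3. The class must implement the capability interface.
    if not issubclass(cls, Capability):
        raise TypeError(f"{handler_class} does not implement Capability")
    return cls()
```
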
SEC-003: Prompt Injection in WorkflowPlanner — FIXED
File: WorkflowPlanner.php:161-265
Issue: User-provided $taskDescription is interpolated directly into the planning prompt with no escaping or boundary markers. Crafted input could manipulate the planner.
Fix applied (multi-layer):
1. Input sanitization: strip control characters, limit to 2,000 chars
2. XML boundary markers: <system_instructions> wraps planner prompt, <user_task> wraps user input
3. Output validation: max 50 steps, required fields per step (capability, parameters, depends_on)
4. CFT defense-in-depth: Bedrock Guardrails PROMPT_ATTACK filter enabled (Prod: HIGH/NONE, Enterprise: HIGH/NONE — Bedrock requires OutputStrength=NONE for PROMPT_ATTACK)
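
Layers 1 and 2 can be sketched as follows in Python (illustrative only, not the `WorkflowPlanner` code). Escaping angle brackets in user input is an assumption added here so that user text cannot close the `<user_task>` block; the source does not say whether the real fix does this.

```python
import re

MAX_TASK_CHARS = 2000
# ASCII control characters except tab, newline, and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_task(task: str) -> str:
    """Strip control characters and cap the raw input at 2,000 characters."""
    return CONTROL_CHARS.sub("", task)[:MAX_TASK_CHARS]


def build_planner_prompt(instructions: str, task: str) -> str:
    """Wrap trusted instructions and untrusted input in separate XML blocks."""
    safe = sanitize_task(task)
    # Assumption of this sketch: neutralize tags inside the user text.
    safe = safe.replace("<", "&lt;").replace(">", "&gt;")
    return (
        f"<system_instructions>\n{instructions}\n</system_instructions>\n"
        f"<user_task>\n{safe}\n</user_task>"
    )
```
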
SEC-004: Missing Sort Order Validation — FIXED
File: AgentController.php:287-296
Issue: $sortOrder parameter is passed directly to orderBy() without validation. Only $sortBy is whitelisted.
Fix applied: Added if (!in_array(strtolower($sortOrder), ['asc', 'desc'])) { $sortOrder = 'desc'; } — mirrors existing $sortBy whitelist pattern.
SEC-005: Potential SSRF in enrichBrandVoice — FIXED
File: AgentController.php:940-945
Issue: Company website URL used without validating scheme or blocking internal addresses.
Fix applied: Created reusable App\Services\UrlValidator utility. Enforces HTTPS-only, blocks RFC 1918 (10.x, 172.16-31.x, 192.168.x), link-local/AWS metadata (169.254.x), carrier-grade NAT (100.64.x), loopback (127.x), and IPv6 equivalents. A DNS resolution check rejects hostnames that resolve to a blocked address, which mitigates DNS rebinding. Applied at enrichBrandVoice() endpoint.
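
The blocklist logic can be approximated in Python with the standard `ipaddress` module. This is a sketch under assumptions, not the `UrlValidator` source: the IPv6 ranges chosen, the IPv4-mapped handling, and the injectable `resolve` parameter are illustrative choices.

```python
import ipaddress
import socket
from urllib.parse import urlparse

BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "169.254.0.0/16",                                 # link-local, cloud metadata
    "100.64.0.0/10",                                  # carrier-grade NAT
    "127.0.0.0/8",                                    # loopback
    "::1/128", "fe80::/10", "fc00::/7",               # assumed IPv6 equivalents
)]


def is_blocked_ip(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the list.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return any(addr in net for net in BLOCKED_NETWORKS)


def _resolve(host: str) -> list:
    return [info[4][0] for info in socket.getaddrinfo(host, 443)]


def validate_url(url: str, resolve=_resolve) -> bool:
    """Accept only HTTPS URLs whose host resolves solely to public addresses."""
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        ips = resolve(parts.hostname)
        return bool(ips) and not any(is_blocked_ip(ip) for ip in ips)
    except (OSError, ValueError):
        return False
```

A check of this kind only protects the later fetch if the HTTP client connects to the address that was validated; a second, independent DNS lookup at fetch time can return a different answer.
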

HIGH

SEC-006: Mass Assignment Risk — FIXED
File: AgentController.php:735-753
Issue: $agent->update($validated) passes full validated array without explicit field whitelisting. The Agent model's $fillable includes sensitive fields (user_id, team_id, guardrail_id, bedrock_agent_id, ai_model, settings) that should not be updatable via this endpoint.
Fix applied: Replaced $agent->update($validated) with $agent->update($request->only(['name', 'description', 'is_active', 'capabilities'])) — explicit field whitelist prevents mass assignment of sensitive model attributes regardless of validation rules.
SEC-007: Inconsistent Authorization Patterns — FIXED
Files: AgentPolicy.php, AgentController.php
Issue: Mix of policy-based auth ($this->authorize()) and inline ownership checks ($execution->user_id !== Auth::id()). AgentPolicy missing create() method and execution-level policy methods for delete, rerun, restart, archive, restore, and notification operations.
Fix applied (two layers):
1. Added 6 new policy methods to AgentPolicy: create(), deleteExecution(), rerunExecution(), restartExecution(), archiveExecution(), toggleNotification()
2. Replaced all 9 inline $execution->user_id !== Auth::id() checks in AgentController with $this->authorize() calls using the appropriate policy method
Result: All 17 authorization points in AgentController now consistently use policy-based auth
SEC-008: Static Credential Cache Without TTL — FIXED
File: BedrockRuntimeService.php:24-80
Issue: Shared Bedrock client cached in static variable with no expiration. In queue workers (long-lived processes), stale or revoked credentials continue to be used indefinitely.
Fix applied:
1. Added $sharedClientCreatedAt timestamp tracking to static cache
2. Added CACHE_TTL_SECONDS = 900 (15-minute) TTL constant
3. Cache validity now requires both credential hash match AND TTL not expired
4. Added resetClientCache() static method for explicit invalidation when credentials are known to have changed
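
A minimal Python sketch of the cache-validity rule (credential hash match AND TTL not expired). The `ClientCache` class, its `factory` argument, and the injectable `clock` are hypothetical; only the 900-second TTL and the reset behavior come from the fix described above.

```python
import time

CACHE_TTL_SECONDS = 900  # 15 minutes, as in the fix above


class ClientCache:
    def __init__(self, factory, clock=time.monotonic):
        self._factory = factory      # builds a new client (hypothetical)
        self._clock = clock
        self._client = None
        self._cred_hash = None
        self._created_at = 0.0

    def get(self, cred_hash: str):
        now = self._clock()
        fresh = (
            self._client is not None
            and self._cred_hash == cred_hash
            and now - self._created_at < CACHE_TTL_SECONDS
        )
        if not fresh:
            # Rebuild on first use, credential change, or TTL expiry.
            self._client = self._factory()
            self._cred_hash = cred_hash
            self._created_at = now
        return self._client

    def reset(self) -> None:
        """Explicit invalidation, mirroring resetClientCache()."""
        self._client = None
```
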
SEC-009: Unsafe JSON Plan Parsing — FIXED
File: WorkflowPlanner.php:330-455
Issue: Plan JSON decoded with minimal structural validation. No validation of step fields, types, or allowed keys. Deeply nested payloads could cause memory issues.
Fix applied (5 validations):
1. JSON decode depth limit (10 levels) prevents stack exhaustion
2. ALLOWED_STEP_KEYS whitelist strips arbitrary keys from step objects (capability, parameters, depends_on, name, description, step_id, condition, retry, timeout)
3. Capability slug format validation via regex (/^[a-zA-Z][a-zA-Z0-9_]{0,99}$/)
4. depends_on values validated as valid step indices (non-negative, within range, no self-reference)
5. Parameters nesting depth capped at 5 levels via recursive arrayDepth() check
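
Validations 2 through 5 can be sketched in Python as below. This is illustrative, not the `WorkflowPlanner` code: Python's `json.loads` has no depth argument, so the decode depth limit (validation 1) is not reproduced, and the 50-step cap is borrowed from SEC-003.

```python
import json
import re

ALLOWED_STEP_KEYS = {"capability", "parameters", "depends_on", "name",
                     "description", "step_id", "condition", "retry", "timeout"}
SLUG = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]{0,99}$")
MAX_STEPS = 50
MAX_PARAM_DEPTH = 5


def depth(value) -> int:
    """Nesting depth of dicts/lists; scalars count as 0."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def validate_plan(raw: str) -> list:
    steps = json.loads(raw)
    if not isinstance(steps, list) or not 0 < len(steps) <= MAX_STEPS:
        raise ValueError("plan must be a list of 1 to 50 steps")
    clean = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i} is not an object")
        # Strip keys that are not on the whitelist.
        step = {k: v for k, v in step.items() if k in ALLOWED_STEP_KEYS}
        if not SLUG.match(str(step.get("capability", ""))):
            raise ValueError(f"step {i}: invalid capability slug")
        deps = step.get("depends_on", [])
        if not isinstance(deps, list):
            raise ValueError(f"step {i}: depends_on must be a list")
        for d in deps:
            is_index = isinstance(d, int) and not isinstance(d, bool)
            if not is_index or not 0 <= d < len(steps) or d == i:
                raise ValueError(f"step {i}: invalid dependency {d!r}")
        if depth(step.get("parameters", {})) > MAX_PARAM_DEPTH:
            raise ValueError(f"step {i}: parameters nested too deeply")
        clean.append(step)
    return clean
```
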

Scalability Findings

CRITICAL

SCALE-001: No Concurrent Execution Limits
File: AgentController.php:385-391
Issue: Every execution dispatches to queue immediately with zero rate limiting. 100 concurrent users = 100 simultaneous Bedrock API calls = AWS throttling → retry storm → cascading failure.
Fix: Add per-user concurrency cap (3-5), implement queue-level backpressure.
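
One way to picture the proposed per-user cap is the Python sketch below. It is in-process only and the class name is hypothetical; with multiple queue workers the counter would have to live in a shared store, which this sketch does not attempt.

```python
import threading
from collections import defaultdict


class UserConcurrencyLimiter:
    """Caps the number of in-flight executions per user."""

    def __init__(self, max_per_user: int = 3):
        self._max = max_per_user
        self._active = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, user_id: int) -> bool:
        """Reserve a slot; return False if the user is at the cap."""
        with self._lock:
            if self._active[user_id] >= self._max:
                return False
            self._active[user_id] += 1
            return True

    def release(self, user_id: int) -> None:
        """Free a slot when an execution finishes or fails."""
        with self._lock:
            if self._active[user_id] > 0:
                self._active[user_id] -= 1
```

A caller that receives `False` would reject or delay the dispatch instead of queueing it immediately, which is where backpressure comes from.
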
SCALE-002: O(n³) Dependency Resolution — FIXED
File: WorkflowPlanner.php:491-517
Issue: Execution order algorithm uses nested loops with maxIterations = count($steps)². A 50-step workflow = 312,500+ operations.
Fix applied: Replaced with Kahn's algorithm using SplQueue for BFS traversal and $inDegree[] array for tracking (O(V+E)). Uses associative array for O(1) lookup instead of in_array(). Preserved existing diagnostic error handling for unresolvable dependencies. Pre-validation via existing detectCircularDependencies() DFS retained.
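
For reference, Kahn's algorithm in a compact Python form. This is a generic sketch of the technique, not the `WorkflowPlanner` code; it assumes `deps` maps each step to a list of distinct steps it depends on, and that every dependency is itself a key.

```python
from collections import deque


def execution_order(deps: dict) -> list:
    """Topologically sort steps in O(V + E) using Kahn's algorithm."""
    in_degree = {step: len(requires) for step, requires in deps.items()}
    dependents = {step: [] for step in deps}
    for step, requires in deps.items():
        for required in requires:
            dependents[required].append(step)

    # Start with every step that has no unmet dependencies.
    queue = deque(step for step, n in in_degree.items() if n == 0)
    order = []
    while queue:
        step = queue.popleft()
        order.append(step)
        for nxt in dependents[step]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(deps):
        # Steps left with in-degree > 0 are part of, or behind, a cycle.
        stuck = sorted(s for s, n in in_degree.items() if n > 0)
        raise ValueError(f"unresolvable dependencies: {stuck}")
    return order
```

Each step is enqueued once and each dependency edge is decremented once, which is where the O(V+E) bound comes from.
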
SCALE-003: Unbounded Memory in Multi-Step Workflows
File: AgentOrchestrator.php:237-309
Issue: $stepOutputs and $results arrays accumulate all capability outputs in memory. A 20-step image workflow could exceed PHP's 128MB limit.
Fix: Store step outputs in DB/cache, load on-demand for downstream dependencies only.

HIGH

SCALE-004: Oversized Response Bodies
File: AgentOrchestrator.php:465-484
Issue: getExecutionResults() returns entire step_results and generated_assets with no pagination.
Fix: Add pagination, or return summary with on-demand detail endpoints.
SCALE-005: N+1 Query Patterns
File: AgentController.php:43-47, 313-315
Issue: Agent lists load without pagination (->get() instead of ->paginate()). Execution results load full JSON blobs without field selection.
Fix: Add pagination, use ->select() for list endpoints.

Resilience Findings

CRITICAL

RES-001: No Circuit Breaker for Bedrock
File: BedrockRuntimeService.php
Issue: When Bedrock is down or throttled, every request fails independently. No consecutive-failure tracking, no circuit opening, no adaptive backoff, no fallback to alternative models.
Impact: Complete workflow unavailability during any Bedrock outage.
Fix: Implement circuit breaker with states (closed → open → half-open), track failure rate, exponential backoff on 429/503.
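
The closed → open → half-open state machine could look roughly like this Python sketch. It counts consecutive failures only; the thresholds, class names, and injectable `clock` are assumptions, and failure-rate tracking, backoff on 429/503, and model fallback are left out.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, call rejected")
            self.state = "half-open"  # allow one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, opens it.
            if (self.state == "half-open"
                    or self._failures >= self.failure_threshold):
                self.state = "open"
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = "closed"
        return result
```
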
RES-002: No Database Transactions on Step Execution
File: ProcessAgentWorkflowJob.php:67-128
Issue: Step results written to DB without transaction wrapping. Job crash leaves execution in ambiguous state with partial results.
Fix: Wrap step execution in DB::transaction(), implement idempotency keys.
RES-003: Retry Storm Risk
File: ProcessAgentWorkflowJob.php:27-32
Issue: 3 attempts (tries: 3) with no backoff configured. Every retry fires immediately, compounding AWS rate limiting.
Fix: Add backoff() method returning [30, 120, 300] (escalating delays).
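
How an escalating schedule maps onto retries, as a small Python sketch. The function name is hypothetical, and reusing the last delay once the schedule runs out is a choice of this sketch.

```python
BACKOFF_SECONDS = [30, 120, 300]


def delay_before_retry(retry_number: int, schedule=BACKOFF_SECONDS) -> int:
    """Delay in seconds before the given retry (1 = first retry)."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    # Past the end of the schedule, keep using the longest delay.
    return schedule[min(retry_number, len(schedule)) - 1]
```

With a maximum of 3 attempts there are only two retries, so only the 30-second and 120-second delays would be used; the 300-second entry matters only if the attempt limit is raised.
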

HIGH

RES-004: Silent Service Degradation
Files: AgentOrchestrator.php:33-40, AgentMemoryService.php:135-143
Issue: Multiple services fail silently at DEBUG/WARNING log levels. Users have no visibility into whether memory, dry-run, or other optional services are functioning.
Fix: Add execution-level services_status field, surface degradation in UI.

STRENGTHS (Keep)

Stuck Execution Recovery — ExecutionHealthMonitor with three-level escalation is excellent
Error Classification — ExecutionErrorAnalyzer categorizes errors with user-friendly guidance
Self-Healing Config — Configurable thresholds at global/team/user scope
Health Event Audit Trail — Full audit via ExecutionHealthEvent model

Observability Findings

MEDIUM

OBS-001: No Correlation IDs
Issue: Logs lack end-to-end request/execution correlation IDs. Tracing a failure across orchestrator → planner → Bedrock → capability requires manual log correlation.
Fix: Generate UUID at execution start, propagate through all service calls.
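
The proposed fix amounts to minting one ID per execution and attaching it to every log line. A Python sketch using the standard `logging.LoggerAdapter`; the function name and the `correlation_id` field name are assumptions.

```python
import logging
import uuid


def start_execution_logger(base: logging.Logger):
    """Create a correlation ID and a logger that stamps it on every record."""
    correlation_id = str(uuid.uuid4())
    # LoggerAdapter injects the extra dict into each log record it emits.
    adapter = logging.LoggerAdapter(base, {"correlation_id": correlation_id})
    return correlation_id, adapter
```

A formatter such as `logging.Formatter("%(correlation_id)s %(message)s")` then prints the ID on each line, and the same value can be passed along to downstream services so that their logs can be joined on it.
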
OBS-002: Incomplete Alerting — PARTIALLY ADDRESSED
Issue: ExecutionHealthMonitor sends in-app notifications but has no integration with CloudWatch, PagerDuty, Slack, or other ops monitoring.
Fix applied (infrastructure layer): Added CloudWatch metric filters for agent_execution_failed and Bedrock throttles. Added alarms: AgentFailureRateAlarm (>10 failures/5-min) and BedrockThrottleAlarm (>20 throttles/5-min) with SNS integration. Agent endpoint WAF rate limiting alarm added.
Remaining: Application-level CloudWatch metric emission, Slack/PagerDuty integration.
OBS-003: No Distributed Tracing — PARTIALLY ADDRESSED
Issue: No OpenTelemetry or X-Ray integration. Cannot trace latency across Bedrock API calls.
Fix applied (infrastructure layer): Added X-Ray sampling rule for AgentCore executions at 100% (Priority 50) for */agent* paths. Dedicated log group /vell/{env}/agentcore with 365-day retention. Lambda sync function has TracingConfig: Active.
Remaining: Application-level OTEL instrumentation in AgentOrchestrator and BedrockRuntimeService.

STRENGTHS (Keep)

Analytics Service — AgentAnalyticsService with P50/P95/P99 percentiles and health scoring
Detailed Logging — 17+ structured log calls in AgentOrchestrator alone
Usage Tracking — Token/credit metering per step for billing

Failed Job Analysis

Current Infrastructure

Your application has failure tracking infrastructure in place:

Component	Status	Details
`failed_jobs` table	Exists	Standard Laravel DLQ — captures job payload + exception trace
`agent_executions.status`	Exists	Tracks: pending, planning, executing, completed, failed, cancelled, archived
`agent_executions.error_message`	Exists	Stores failure details
Self-healing fields	Exists	`retry_count`, `max_retries`, `retry_history` (JSON), `is_stuck`, `health_status`
`execution_health_events`	Exists	Full audit trail: stuck_detected, auto_restart, recovery, rate_limited, max_retries_reached
`MonitorExecutionHealth` command	Runs every minute	Detects stuck, auto-restarts, rate-limits
`MonitorAgentWorker` command	Available	Detects orphaned/stale jobs, can rescue

What's Missing

Gap	Impact	Priority
No admin dashboard for failed_jobs	Failures are visible only in the DB and logs, with no UI that surfaces them	HIGH
No failure rate SLA tracking	No alerting when failure rate exceeds threshold	HIGH
No Slack/PagerDuty alerting	Admin notifications are in-app only	MEDIUM
No failed_jobs API endpoint	Cannot query/retry failed jobs from UI	MEDIUM
No poison pill detection	Repeated failures from same input not detected	LOW
No DLQ depth monitoring	Queue backlog invisible until users complain	MEDIUM

Job Retry Configuration

ProcessAgentWorkflowJob:
  tries: 3           (max attempts)
  timeout: 600       (10 minutes per attempt)
  backoff: none      (immediate retry — PROBLEM)

Self-Healing:
  stuck_warning: 5 min
  stuck_critical: 30 min
  auto_restart: 60 min
  max_auto_retries: 3
  exponential_backoff: true (for auto-restarts only)
  max_executions_per_hour: 100
  max_executions_per_day: 1000

Bedrock AgentCore Service Comparison

Service-by-Service Mapping

Bedrock AgentCore Service	Your Current Implementation	Gap Analysis
Runtime (serverless agent hosting, microVM isolation, 8hr execution windows)	`ProcessAgentWorkflowJob` on Laravel queue + Redis	Your queue has no session isolation, 10min timeout, no microVM sandboxing
Gateway (MCP tool registry, API→MCP transform, semantic tool selection, auth management)	`CapabilityRegistry` + inline handler loading + DynamoDB tool registry + S3 MCP schemas (Phase 0C)	~~Unsafe handler loading~~ fixed (SEC-002). Gateway foundation deployed: DynamoDB registry, S3 MCP schemas, Lambda sync, EventBridge pipeline. Remaining: No semantic discovery, no auth management, no live Gateway registration (Phase 2). Gap: `FundingApplicationWriterService` operates as a standalone service outside `CapabilityRegistry` — not registered in DynamoDB tool registry, not discoverable via Gateway, not available to agent workflows.
Memory (short-term + long-term + episodic memory)	`AgentMemoryService` wrapping Bedrock agent memory	Already using Bedrock memory partially; local fallback is basic session tracking
Identity (agent identity, OAuth flows, credential management)	Mix of Passport, Cognito, ~~inline API key checks~~ standardized policy (SEC-007)	Fragmented; no unified agent identity; ~~static credential cache~~ fixed with TTL (SEC-008)
Observability (CloudWatch dashboards, OTEL, latency/error/token metrics)	`AgentAnalyticsService` + `ExecutionHealthMonitor` + custom logging + CloudWatch alarms/metrics (Phase 0B)	Good analytics. ~~No CloudWatch integration~~ — metric filters, alarms, X-Ray sampling added. Remaining: No OTEL instrumentation in application code, no distributed tracing end-to-end
Policy (Cedar-based tool-call interception, natural language policy rules)	`AgentPolicy` + inline auth checks	Minimal; no tool-call-level policy enforcement
Evaluations (13 built-in evaluators, custom scoring, continuous monitoring)	None	No quality evaluation system at all
Browser / Code Interpreter (secure browser runtime, code execution sandbox)	None	Not applicable to current use case

Bedrock AgentCore Pricing Reference

Service	Metric	Price
Runtime	vCPU-hour	$0.0895
Runtime	GB-hour (memory)	$0.00945
Gateway	Per 1,000 tool invocations	$0.005
Gateway	Per 1,000 search queries	$0.025
Gateway	Per 100 tools indexed/month	$0.02
Memory (short-term)	Per 1,000 events	$0.25
Memory (long-term)	Per 1,000 memories stored	$0.75
Memory (retrieval)	Per 1,000 retrievals	$0.50
Identity	Per 1,000 requests	$0.01 (free via Runtime/Gateway)
Policy	Per 1,000 auth requests	~$0.025
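
As a worked example of the Gateway rows above, the Python sketch below turns monthly volumes into an estimated cost. The volumes are hypothetical, and billing indexed tools pro rata (rather than in blocks of 100) is an assumption of this sketch.

```python
# Prices taken from the Gateway rows of the table above (USD).
PRICE_PER_1K_INVOCATIONS = 0.005
PRICE_PER_1K_SEARCHES = 0.025
PRICE_PER_100_TOOLS_INDEXED = 0.02


def gateway_monthly_cost(invocations: int, searches: int,
                         tools_indexed: int) -> float:
    """Estimated monthly Gateway cost in USD for the given volumes."""
    return (
        invocations / 1000 * PRICE_PER_1K_INVOCATIONS
        + searches / 1000 * PRICE_PER_1K_SEARCHES
        + tools_indexed / 100 * PRICE_PER_100_TOOLS_INDEXED  # pro rata (assumed)
    )
```

For example, 1,000,000 tool invocations, 100,000 search queries, and 37 indexed tools come to $5.00 + $2.50 + $0.0074, or about $7.51 per month under these assumptions.
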

Offload vs Keep Recommendations

OFFLOAD to Bedrock AgentCore

Component	Why Offload	Bedrock Service	Resolves Issues
Agent Runtime / Execution	Your queue has no isolation, 10min timeout, no backpressure. Bedrock provides microVM isolation, 8hr windows, auto-scaling, and consumption-based billing (only charged for active CPU).	AgentCore Runtime	RES-002, RES-003, SCALE-001
Tool Registry & Authentication	~~`CapabilityRegistry` has unsafe handler loading (SEC-002)~~ — fixed. ~~Static credential cache (SEC-008)~~ — fixed with TTL. ~~Inconsistent auth (SEC-007)~~ — fixed with standardized policy. DynamoDB tool registry + S3 MCP schemas deployed (Phase 0C). Gateway provides managed tool registry with semantic discovery, inbound/outbound auth, and 1-click connectors (Salesforce, Slack, Jira, etc).	AgentCore Gateway	~~SEC-002~~, ~~SEC-008~~, ~~SEC-007~~
Memory Management	`AgentMemoryService` already wraps Bedrock memory but with basic local fallback. Native AgentCore Memory adds episodic memory, managed short-term/long-term storage, and eliminates the need for your fallback code.	AgentCore Memory	RES-004 (memory degradation)
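Observability & Monitoring	No application-level OTEL instrumentation, no correlation IDs, and no end-to-end distributed tracing. CloudWatch metric filters, alarms, and X-Ray sampling were added at the infrastructure layer (Phase 0B). AgentCore Observability provides turnkey dashboards, OTEL compatibility, and integration with Datadog/Dynatrace/LangSmith.	AgentCore Observability	OBS-001, OBS-002, OBS-003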
Agent Identity & Auth	Fragmented auth (Passport + Cognito + ~~inline checks~~ standardized policy). ~~Static credential cache (SEC-008)~~ — fixed with TTL. AgentCore Identity provides unified agent identity with OAuth flows, token management, and multi-tenant support.	AgentCore Identity	~~SEC-007~~, ~~SEC-008~~
Tool-Call Policy Enforcement	No tool-call-level authorization. AgentCore Policy intercepts every tool call in real time and evaluates it against Cedar policies, which can be authored from natural-language rules.	AgentCore Policy	SEC-007

KEEP In-House (Competitive Advantages)

Component	Why Keep	Current Quality	Notes
WorkflowPlanner (Claude-powered planning)	Core IP — your GTM workflow templates, brand-aware planning prompts, and capability-aware step generation are unique. Bedrock AgentCore has no equivalent planning service.	Good — ~~SEC-003, SCALE-002~~ fixed	Prompt injection fixed (XML boundaries + input sanitization), O(n³) sort replaced with Kahn's algorithm. Planning logic retained.
BrandVoiceContextBuilder	Product differentiator — enriches every workflow with company tone, audience, industry context. No managed equivalent exists.	Excellent	Keep and enhance
ExecutionErrorAnalyzer	User-facing error classification with actionable recommendations. AgentCore Observability doesn't provide this UX layer.	Excellent	Keep as UI layer on top of AgentCore Observability
ExecutionHealthMonitor (self-healing)	Sophisticated stuck detection with three-level escalation. While AgentCore Runtime handles infrastructure-level health, your business-logic-level health monitoring (stuck workflows, rate limiting) is valuable.	Excellent	Adapt to monitor AgentCore Runtime sessions instead of queue jobs
SelfHealingConfig (per-user thresholds)	Unique multi-scope configuration (global/team/user). No managed equivalent.	Good	Keep for business-level config
AgentAnalyticsService	Business-level metrics (success rate, P50/P95/P99, cost analysis, capability breakdown, health scoring). AgentCore Observability provides infrastructure metrics but not GTM-specific analytics.	Good	Keep, feed data from AgentCore Observability
FundingApplicationWriter (5 AWS programs)	Core IP — program-specific AI generation with KB grounding from official AWS docs, dry-run evaluation with reviewer personas, Partner Central Benefits API integration. No managed equivalent. Only standalone AI funding assistant for AWS Partner Funding programs.	Excellent	Keep — register as AgentCore capability (currently standalone, not in CapabilityRegistry). Wire outcome tracking for approval rate intelligence.
GTM Workflow Templates (10 pre-built)	Core product value — co-sell partner workflows, marketplace optimization, content generation pipelines.	Good	Keep and expand
Marketplace Metering	AWS Marketplace billing integration (credit/token accounting per step). Specific to your ISV business model.	Good	Keep; wire into AgentCore Runtime session metrics
Content Tag System	Content organization and tracking for GTM workflows. No managed equivalent.	Good	Keep

IP Impact Analysis

What You Own vs What You Rent

Offloading to Bedrock AgentCore shifts ownership of infrastructure but does NOT transfer your business logic IP. The key question is: which components contain defensible IP, and which are commodity infrastructure you're maintaining at cost?

IP Value Map

Component	Lines of Custom Logic	Replication Effort	IP Value	Migration Risk
Marketplace SEO Score (MSS)	1,070	6-8 weeks	VERY HIGH	NONE — stays in-house
Capability Registry (37 registered + 5 unregistered capabilities)	736	4-6 weeks	VERY HIGH	~~MEDIUM~~ LOW — Handler namespace allowlist (SEC-002) + DynamoDB local registry (IP-002/003) ensure handler IP stays in-house. Gateway hosts only schemas/descriptions.
FundingApplicationWriter (5 AWS programs, KB-grounded generation, dry-run evaluation)	~800+	4-6 weeks	VERY HIGH	NONE — stays in-house
Co-Sell Matching (ICP overlap, partner intelligence)	584	3-4 weeks	HIGH	NONE — stays in-house
WorkflowPlanner (dependency graph, retry heuristics)	572	3-4 weeks	VERY HIGH	NONE — stays in-house
Deal Influence Tracking (multi-touch attribution)	400+	4-5 weeks	VERY HIGH	NONE — stays in-house
BrandVoiceContextBuilder	349	2-3 weeks	HIGH	NONE — stays in-house
AgentAnalyticsService (P50/P95/P99, health scoring)	770	3 weeks	MEDIUM	LOW — sits on top of AgentCore Observability
Co-Sell Analytics	453	2 weeks	MEDIUM	NONE — stays in-house
Marketplace Metering	229	2-3 days	LOW	NONE — commodity wrapper

Total custom business logic: ~5,963+ lines across 10 components
Full system replication effort: 6-12 months (including integration testing, data migration, domain expertise)

What Makes Your IP Defensible

1. Marketplace SEO Score (MSS) Algorithm (patent-worthy)
- 3-component scoring: 40% Listing Quality, 25% Backlink Authority, 35% AI Visibility
- AI Visibility scoring (tracking LLM mentions of listings) is genuinely novel
- Bedrock fallback proxy when DataForSEO API is unavailable (clever engineering)
- Category benchmark medians improve with every customer scored (network effect)
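
The 40/25/35 weighting can be read as a weighted sum. The Python sketch below assumes each component is already scored on a 0-100 scale and that the combination is linear; neither is confirmed by the source, and the actual MSS algorithm may normalize or combine components differently.

```python
# Weights from the 3-component description above.
MSS_WEIGHTS = {
    "listing_quality": 0.40,
    "backlink_authority": 0.25,
    "ai_visibility": 0.35,
}


def marketplace_seo_score(components: dict) -> float:
    """Weighted sum of component scores (each assumed to be 0-100)."""
    for name in MSS_WEIGHTS:
        if not 0 <= components[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(weight * components[name]
               for name, weight in MSS_WEIGHTS.items())
```

Under these assumptions, component scores of 80, 60, and 40 combine to 32 + 15 + 14 = 61.
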

2. WorkflowPlanner Intelligence
- Temperature reduction on retry (0.3 → 0.1) to produce more deterministic replans
- Context truncation strategy (2KB limit) prevents prompt bloat while preserving semantics
- Capability-aware hints: detects available capabilities and adjusts planning prompts dynamically
- Learns from execution errors to improve subsequent planning

3. AWS Partner Network-Specific Capabilities
- ACE Opportunity Sync (auto-generates pre-filled ACE briefs)
- CPPO Proposal Generator (pricing proposals for AWS Marketplace)
- AWS Clean Rooms integration (privacy-preserving partner overlap analysis)
- Partner Intelligence scoring (relationship strength + warm intro paths)
- These require AWS Partner Network access; competitors can't replicate them without partnership agreements

4. AI-Powered Funding Application Intelligence
- Program-specific prompt engineering for 5 AWS Partner Funding programs (Innovation Sandbox, POC, ISV WMP, MDF, MAP)
- Knowledge Base grounding from official AWS documentation: responses cite real program requirements, not generic AI output
- Dry-run evaluation with AI personas simulating AWS funding reviewers (funding_reviewer, funding_technical_reviewer)
- Company profile enrichment pre-fills applications with existing brand/product data
- Partner Central Benefits API integration creates a submission-to-outcome feedback loop (coming soon)
- Approval rate data accumulates: more submissions → better program-specific guidance → higher success rates (network effect)

5. Deal Influence Attribution
- 6 correlation input types: UTM, private offers, metering, CRM stages, email/calendar, content engagement
- Multi-touch attribution modeling (first-touch vs last-touch)
- Content-to-conversion lag time tracking (improves with more data)

IP Risk from Migration

Migration Phase	IP Risk	Mitigation
Phase 1: Observability	ZERO	Only adds monitoring layer
Phase 2: Gateway	~~MEDIUM~~ LOW	Your 37 capability handlers become Gateway tools. Handler logic stays yours. Tool metadata mitigated: DynamoDB local registry is source of truth (IP-002), MCP schemas versioned in S3 (IP-003), handler namespace allowlist prevents unauthorized loading (SEC-002).
Phase 3: Memory	LOW	Memory content moves to AWS-managed storage. Session metadata stays in your DB. You lose direct access to raw memory vectors.
Phase 4: Runtime	LOW	Your orchestrator code runs on AgentCore compute but remains your code. Similar to deploying on EC2 — AWS runs it, you own it.
Phase 5: Identity	LOW	Auth config moves to AgentCore Identity. Credential mapping is operational, not IP.

Key IP Protection Actions

IP-001: Document MSS algorithm separately — consider provisional patent filing
IP-002: Keep capability handler source code in your repo (Gateway only hosts tool schemas/descriptions) — ADDRESSED: Handler source stays in App\Extensions\ContentManager\System\Services\Capabilities\. DynamoDB registry + S3 schemas store only metadata/descriptions, never handler code. ALLOWED_HANDLER_NAMESPACES constant enforces this boundary.
IP-003: Maintain local copies of all tool metadata registered with Gateway — ADDRESSED: DynamoDB vell-{env}-gateway-tool-registry is the local source of truth. php artisan agentcore:sync-tools seeds from application DB. Versioned snapshots in DynamoDB + versioned S3 schemas provide full history. Lambda sync is event-driven, not Gateway-dependent.
IP-004: Export Bedrock Agent Memory sessions periodically to your own S3 bucket
IP-005: Ensure all AgentCore Policy rules are version-controlled in your repo, not only in AWS console

Data Moat & Lock-In Assessment

Data That Accumulates Value Over Time

Data Category	Stickiness	Portability	Network Effect	Flywheel
Marketplace SEO benchmark data	VERY HIGH	Low (AWS-specific)	VERY STRONG	More listings → better category medians → better recommendations
Deal influence correlation data	VERY HIGH	Medium	STRONG	More customers → better attribution models → more predictive
Execution retry/health patterns	VERY HIGH	High (your DB)	MEDIUM	More executions → better self-healing → fewer failures
Keyword gap intelligence	VERY HIGH	Low (AWS-specific)	VERY STRONG	More listings → better competitive landscape maps
Funding application outcomes	VERY HIGH	Medium (your DB)	VERY STRONG	More submissions → approval/rejection patterns → better program-specific guidance → higher success rates
Funding program intelligence	HIGH	Low (AWS-specific)	STRONG	KB-grounded program requirements + reviewer persona feedback accumulate institutional knowledge of what gets approved
Brand voice profiles	HIGH	High (JSON export)	MEDIUM	Every execution refines "what context works"
Knowledge base content	HIGH	Medium (S3 docs portable, embeddings not)	HIGH	Quality improves with document volume
Compliance history	VERY HIGH	Medium	HIGH	Full audit trail creates switching cost
Partner matching patterns	HIGH	Medium	STRONG	More partners matched → better success predictors
Agent memory (Bedrock)	MEDIUM	LOW (Bedrock-specific)	HIGH	Conversation history compounds
Team configuration	MEDIUM	High (portable)	LOW	Organizational inertia

What You LOSE Control Of With AgentCore Migration

Fully Lost (Bedrock-hosted, no direct access):
1. Active Bedrock Agent Memory content (session summaries, semantic facts)
2. Fine-tuned model weights (bedrock_model_arn is AWS-hosted)
3. Knowledge Base embeddings (Bedrock-specific vectors)

Stays in Your Database (Portable):
1. All execution history, retry patterns, health events
2. Brand voice profiles, GTM goals, personas
3. Compliance reports and rule history
4. Marketplace metrics, SEO scores, keyword gaps
5. Agent definitions, capability configurations
6. Memory session metadata (just not the memory content itself)
7. All integration credentials (encrypted)
8. Deal influence correlation data
9. Team/org configuration

Net Impact: You retain ~90% of your data moat. The 10% you lose (active memory, embeddings) is operational state, not strategic data.

Customer Switching Costs Created

Feature	What Customer Loses by Leaving	Lock-In Strength
Marketplace SEO Score trends	Historical score trajectory, benchmark comparisons	HIGH
Deal influence models	Years of content-to-conversion correlation data	VERY HIGH
Compliance audit trail	Full validation history, rule evolution	HIGH
Brand voice configuration	Tuned personas, GTM positioning, competitive differentiators	MEDIUM
Workflow execution history	What worked, what failed, optimization patterns	MEDIUM
Knowledge bases	Curated document corpus with per-capability tuning	MEDIUM
Partner matching history	Relationship strength scores, ICP overlap data	MEDIUM
Funding application history	Approval/rejection patterns, program-specific guidance, reviewer feedback, reusable application templates	HIGH

Opportunities¶

Opportunity 1: Marketplace Intelligence as a Standalone Product¶

Your MSS algorithm, keyword gap analysis, and competitive benchmarking could be offered as a standalone analytics product for AWS Marketplace ISVs — even those not using your full GTM platform.

Market size: 3,000+ ISV listings on AWS Marketplace
Moat: Benchmark data improves with every customer (network effect)
Revenue model: Tiered pricing by listing count
AgentCore relevance: Gateway enables this as a standalone MCP tool that other agents can invoke

Opportunity 2: AgentCore Gateway as Distribution Channel¶

By registering your 37 capabilities as Gateway tools with semantic discovery, your capabilities become discoverable by any agent connected to Gateway — not just your own UI.

Your co-sell matching, SEO scoring, content generation, and funding application generation become tools other frameworks (CrewAI, LangGraph, LlamaIndex) can invoke
Gateway's 1-click connectors (Salesforce, Slack, Jira) replace your custom integration code
This shifts your business model from "app you log into" to "capabilities any agent can call"
Funding-specific opportunity: A partner's agent discovers your funding_application_writer tool via Gateway, generates a joint POC funding application combining both partners' data, and submits via Partner Central Benefits API — fully automated, agent-to-agent co-funding
Phase 0C progress: DynamoDB tool registry seeded (37 capabilities, 9 listings), 37 MCP schemas in S3, EventBridge sync pipeline deployed, agentcore:sync-tools runs automatically on every deploy. Phase 2 will register tools with live Gateway. Action needed: Register FundingApplicationWriterService as capability #38 before Phase 2 Gateway registration.

Opportunity 3: A2A Protocol for Multi-Agent GTM¶

AgentCore Runtime supports the Agent-to-Agent (A2A) protocol. Your specialized agents (SEO analyzer, co-sell matcher, content generator) could communicate with:

Customer's own internal agents
Partner agents (ISV-to-ISV collaboration)
AWS first-party agents (Marketplace listing optimizer)

This enables agent-mediated co-sell — a partner's agent negotiates joint GTM campaigns with your agent automatically.

Opportunity 4: AgentCore Evaluations for Quality Differentiation¶

You currently have zero quality evaluation for agent outputs. AgentCore Evaluations provides 13 built-in evaluators. Adding quality scoring to every workflow creates:

Customer-visible quality grades (trust signal)
Continuous quality monitoring (catch regressions)
A/B testing of prompt strategies
Data for fine-tuning (reward signal)

High-value use case: funding application quality scoring. Your FundingApplicationWriter already has dry-run evaluation with funding_reviewer and funding_technical_reviewer personas. AgentCore Evaluations could formalize this into continuous scoring: grade applications before submission, track score correlation with actual approval outcomes, and use approval data as a reward signal to improve generation quality over time. This is a natural fit: you already have the persona simulation infrastructure, and Evaluations adds the scoring framework and regression detection.

Opportunity 5: Convert Self-Healing Into a Feature¶

Your ExecutionHealthMonitor + SelfHealingConfig + ExecutionHealthEvent stack is genuinely sophisticated. Most SaaS apps don't expose this.

Surface health dashboards to customers ("Your agents are 94% healthy")
Let customers tune their own self-healing thresholds
Create "reliability SLAs" as a premium tier feature
Market as "Enterprise-grade agent reliability" differentiator

Opportunity 7: AI-Powered Funding Intelligence Platform¶

Your FundingApplicationWriterService already generates applications for 5 AWS Partner Funding programs with KB grounding and dry-run evaluation. This is a significant capability that's not yet integrated into the AgentCore orchestration system — it exists as a standalone service outside the CapabilityRegistry.

Immediate integration gap: The funding writer is not registered as one of the 37 CapabilityRegistry capabilities, meaning it's not in the DynamoDB tool registry, not discoverable via Gateway, and not available for agent-orchestrated workflows. Registering it unlocks:

Agent-orchestrated funding workflows: WorkflowPlanner can chain funding applications with co-sell matching, deal intelligence, and marketplace optimization into multi-step GTM sequences (e.g., "identify best co-sell partner → generate joint POC funding application → create co-branded content")
Outcome-driven intelligence: Once Partner Central Benefits API submission goes live, track approval/rejection outcomes per program. This creates a feedback loop: more submissions → pattern recognition on what gets approved → better program-specific guidance → higher success rates. This data is a defensible moat.
Standalone product potential: Similar to Opportunity 1 (Marketplace Intelligence), funding application assistance could be offered standalone to the 3,000+ AWS Marketplace ISVs who need help navigating AWS Partner Funding programs but don't use your full platform
Revenue model: Tiered by program complexity — free Innovation Sandbox applications (lead gen), paid for POC/MAP/MDF applications with outcome tracking
AgentCore relevance: Register as Gateway tool so other agents (partner agents, AWS first-party agents) can request funding applications via A2A protocol — enables agent-mediated joint funding applications between ISV partners

Action items:

- [ ] Register funding_application_writer as a CapabilityRegistry capability (adds to DynamoDB tool registry + S3 MCP schema)
- [ ] Complete Partner Central Benefits API direct submission (currently "coming soon")
- [ ] Add funding_applications table to track submissions, outcomes, and program-specific patterns
- [ ] Build approval rate analytics that feed back into generation prompts (reward signal)
- [ ] Add funding_reviewer dry-run personas to AgentCore Evaluations (Opportunity 4) when adopted

Opportunity 6: Patent Filing¶

Four components are novel enough for provisional patent applications:

MSS Algorithm — 3-component marketplace SEO scoring with AI Visibility tracking and Bedrock fallback proxy
Self-Healing Workflow Orchestration — Three-tier stuck detection with exponential backoff auto-restart and per-scope configuration
Capability-Aware Workflow Planning — Dynamic hint injection based on available capability set with temperature-reducing retry strategy
AI-Powered Funding Application Intelligence — Program-specific KB-grounded application generation with simulated reviewer evaluation and outcome-driven feedback loop

Migration Roadmap¶

Phase 0: Critical Fixes + Gateway Foundation (Week 1-3) — COMPLETE & DEPLOYED¶

Done regardless of migration decision. Prepares infrastructure for Phase 2 Gateway migration. Deployed: 2026-02-20 — All 4 stacks healthy in us-east-1 (account 253265132499).

Phase 0A: Application Security Fixes — COMPLETE¶

Critical (SEC-001–005):

- [x] SEC-004: Sort order validation whitelist in AgentController
- [x] SEC-001: IDOR ownership checks (defense-in-depth across AgentPolicy, AgentOrchestrator, AgentController)
- [x] SEC-002: Handler namespace allowlist + CapabilityInterface validation in CapabilityRegistry
- [x] SEC-005: SSRF protection via App\Services\UrlValidator (RFC 1918, metadata endpoint, loopback blocking)
- [x] SEC-003: Prompt injection boundary markers + input sanitization + output validation in WorkflowPlanner
- [x] SCALE-002: Kahn's algorithm topological sort replacing O(n³) dependency resolution in WorkflowPlanner
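The SCALE-002 fix names Kahn's algorithm. As an illustration only, here is a minimal Python sketch of that technique; the production code lives in the PHP WorkflowPlanner and is not shown in this audit, so the function name and the input shape (a map from step index to the indices it depends on) are assumptions.

```python
from collections import deque


def topological_order(dependencies: dict[int, list[int]]) -> list[int]:
    """Order steps so each one comes after everything it depends on.

    `dependencies` maps a step index to the step indices it depends on
    (hypothetical shape, chosen for this sketch).
    Raises ValueError on an unknown dependency or a cycle.
    """
    in_degree = {step: 0 for step in dependencies}
    dependents: dict[int, list[int]] = {step: [] for step in dependencies}

    for step, deps in dependencies.items():
        for dep in deps:
            if dep not in dependencies:
                raise ValueError(f"step {step} depends on unknown step {dep}")
            in_degree[step] += 1
            dependents[dep].append(step)

    # Start with every step that has no unmet dependencies.
    ready = deque(sorted(s for s, degree in in_degree.items() if degree == 0))
    order: list[int] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in dependents[step]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    # Steps left unprocessed can only be part of a cycle.
    if len(order) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return order
```

Each step and each dependency edge is visited once, so the work grows linearly with the number of steps plus dependencies, and a cycle surfaces as an explicit error instead of a stalled plan.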

High (SEC-006–009):

- [x] SEC-006: Explicit field whitelist ($request->only()) replacing $validated mass assignment in AgentController::update()
- [x] SEC-007: Standardized policy-based auth (6 new AgentPolicy methods, 9 inline checks replaced with $this->authorize() in AgentController)
- [x] SEC-008: TTL-based credential cache (15-min expiry) + resetClientCache() method in BedrockRuntimeService
- [x] SEC-009: JSON schema validation in WorkflowPlanner::extractJsonFromResponse() (decode depth limit, step key whitelist, capability slug regex, dependency index validation, parameter depth cap)
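To make the SEC-009 checks concrete, the sketch below shows what a step key whitelist, a capability slug regex, dependency index validation, and a parameter depth cap might look like in Python. The key names, the regex, the depth limit, and the rule that a step may only depend on earlier steps are all assumptions for illustration; the actual rules in WorkflowPlanner::extractJsonFromResponse() are not reproduced in this audit. The decode depth limit is omitted because Python's json.loads has no depth argument.

```python
import json
import re

# All three constants are hypothetical values chosen for this sketch.
ALLOWED_STEP_KEYS = {"capability", "parameters", "depends_on"}
SLUG_PATTERN = re.compile(r"^[a-z][a-z0-9_]{0,63}$")
MAX_PARAM_DEPTH = 3


def nesting_depth(value) -> int:
    """Count how many container levels (dict or list) a value nests."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


def validate_plan(raw: str) -> list[dict]:
    """Parse a planner response and reject anything outside the expected shape."""
    steps = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON array of steps")

    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {index} is not an object")
        unknown = set(step) - ALLOWED_STEP_KEYS
        if unknown:
            raise ValueError(f"step {index} has unexpected keys: {sorted(unknown)}")

        slug = step.get("capability")
        if not isinstance(slug, str) or not SLUG_PATTERN.match(slug):
            raise ValueError(f"step {index} has an invalid capability slug")

        deps = step.get("depends_on", [])
        if not isinstance(deps, list):
            raise ValueError(f"step {index} depends_on must be a list")
        for dep in deps:
            # Assumed rule: a step may only reference earlier steps.
            if type(dep) is not int or not 0 <= dep < index:
                raise ValueError(f"step {index} has an invalid dependency: {dep!r}")

        if nesting_depth(step.get("parameters", {})) > MAX_PARAM_DEPTH:
            raise ValueError(f"step {index} parameters are nested too deeply")
    return steps
```

Rejecting unknown keys outright, rather than silently dropping them, keeps a manipulated model response from smuggling extra instructions into later processing.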

Phase 0B: CloudFormation Hardening — COMPLETE & DEPLOYED¶

WAF: Agent endpoint rate limiting (100 req/5-min per IP) for /agent, /execute, /workflow paths — vell-api-waf.yaml
Stack: prod-api-waf — UPDATE_COMPLETE (2026-02-20)
Observability: AgentCore log group (/vell/{env}/agentcore), metric filters (execution failures, Bedrock throttles), alarms (>10 failures/5-min, >20 throttles/5-min), X-Ray 100% sampling for agent paths — vell-observability.yaml
Stack: prod-observability — CREATE_COMPLETE (2026-02-20)
Bedrock Guardrails: PROMPT_ATTACK filter enabled on Prod (HIGH/NONE) and Enterprise (HIGH/NONE) tiers — bedrock-guardrails.yml (Note: Bedrock requires OutputStrength=NONE for PROMPT_ATTACK filter type)
Stack: vellocity-bedrock-guardrails — UPDATE_COMPLETE (2026-02-20)
IAM Role: Added AgentCore Gateway (ListTools, GetTool, InvokeTool, SearchTools), Observability (GetAgentCoreMetrics, ListAgentCoreTraces), and Memory (GetMemory, PutMemory, DeleteMemory) permissions — vell-agentcore-bedrock-role.yaml
Customer-facing template — not deployed in Vell's account (customers deploy in their own accounts)

Phase 0C: AgentCore Gateway Foundation — COMPLETE & DEPLOYED¶

New CFT: cloudformation/application/vell-agentcore-gateway.yaml
Stack: prod-agentcore-gateway — CREATE_COMPLETE (2026-02-20)
DynamoDB table vell-{env}-gateway-tool-registry with category-index and marketplace-listing-index GSIs, PITR in prod
S3 bucket vell-{env}-gateway-tool-schemas with versioning, Glacier lifecycle, HTTPS-only policy
SQS sync queue + DLQ (3 retries, 14-day DLQ retention)
EventBridge rule for CapabilityRegistryChanged, ToolSchemaUpdated, MarketplaceListingChanged events
Lambda sync function (Python 3.11) with SQS event source mapping
IAM role with DynamoDB, S3, Bedrock Gateway, SQS, X-Ray, and CloudWatch permissions
CloudWatch dashboard (Lambda invocations/errors/duration, DynamoDB read/write, SQS queue depth, S3 operations)
7 SSM parameters for cross-stack resource discovery
CloudWatch alarms for sync errors and DLQ depth
Artisan command: php artisan agentcore:sync-tools — SyncGatewayToolsCommand.php
Seeds 37 capabilities from CapabilityRegistry::bootstrapDefaults() into DynamoDB
Generates MCP-compatible tool schemas to S3 for each capability
Flags: --dry-run, --force, --capability={slug}, --listing={id}
Marketplace listing mapping for 8 future listings (bundle-first strategy)
Versioned snapshots in DynamoDB for tool metadata history
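For readers unfamiliar with the schema format, an MCP tool definition carries a `name`, a `description`, and an `inputSchema` expressed as JSON Schema. The sketch below builds one in Python; the helper name and the example parameters for funding_application_writer are hypothetical, and the real schemas are generated by the PHP agentcore:sync-tools command.

```python
import json


def build_mcp_tool_schema(slug: str, description: str,
                          properties: dict, required: tuple = ()) -> dict:
    """Assemble an MCP-style tool definition for one capability."""
    return {
        "name": slug,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": list(required),
        },
    }


# Illustrative only: these parameter names are not taken from the real capability.
example_schema = build_mcp_tool_schema(
    slug="funding_application_writer",
    description="Generate a draft AWS Partner Funding application.",
    properties={
        "program": {"type": "string", "description": "Funding program identifier"},
        "dry_run": {"type": "boolean", "description": "Evaluate without submitting"},
    },
    required=("program",),
)

if __name__ == "__main__":
    print(json.dumps(example_schema, indent=2))
```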

Phase 0 Files Changed/Created¶

Created:

- app/Services/UrlValidator.php: Reusable SSRF protection utility
- cloudformation/application/vell-agentcore-gateway.yaml: Gateway foundation CFT (DynamoDB, S3, SQS, EventBridge, Lambda, IAM, CloudWatch)
- app/Extensions/ContentManager/System/Console/Commands/SyncGatewayToolsCommand.php: Artisan agentcore:sync-tools command
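The SSRF checks attributed to UrlValidator (RFC 1918 ranges, the metadata endpoint, loopback) can be sketched with Python's standard `ipaddress` module. This is an illustration under assumptions, not a port of the PHP class: the function names are hypothetical, and the injectable `resolve` parameter exists only so the logic can be exercised without DNS.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(address: str) -> bool:
    """True when an IP falls in a range an outbound fetch should never reach."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return True  # unparseable addresses are treated as unsafe
    return (
        ip.is_private          # covers the RFC 1918 ranges
        or ip.is_loopback      # 127.0.0.0/8 and ::1
        or ip.is_link_local    # 169.254.0.0/16, including 169.254.169.254
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_safe_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only http(s) URLs whose every resolved address is public."""
    try:
        parsed = urlparse(url)
        port = parsed.port
    except ValueError:
        return False
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    default_port = 443 if parsed.scheme == "https" else 80
    try:
        infos = resolve(parsed.hostname, port or default_port)
    except OSError:
        return False
    # getaddrinfo entries end with a sockaddr tuple whose first item is the IP.
    return bool(infos) and all(not is_blocked_address(info[4][0]) for info in infos)
```

A check like this is only as strong as its coupling to the actual request: if the hostname is resolved again at fetch time, the address can change between validation and connection.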

Modified:

- app/Extensions/ContentManager/System/Policies/AgentPolicy.php: Added execute(), viewExecution(), cancel() methods (SEC-001); added create(), deleteExecution(), rerunExecution(), restartExecution(), archiveExecution(), toggleNotification() methods (SEC-007)
- app/Extensions/ContentManager/System/Services/AgentCore/AgentOrchestrator.php: Ownership guards on 3 methods
- app/Extensions/ContentManager/System/Services/AgentCore/CapabilityRegistry.php: Handler namespace allowlist + interface validation
- app/Extensions/ContentManager/System/Services/AgentCore/WorkflowPlanner.php: Prompt injection protection + Kahn's algorithm (SEC-003/SCALE-002); JSON schema validation with step key whitelist, capability slug regex, dependency index validation, parameter depth cap, decode depth limit (SEC-009)
- app/Extensions/ContentManager/System/Http/Controllers/AgentController.php: Sort order validation, SSRF protection, policy method updates (SEC-001/004/005); explicit field whitelist for mass assignment (SEC-006); 9 inline auth checks replaced with policy-based $this->authorize() (SEC-007)
- app/Services/Bedrock/BedrockRuntimeService.php: TTL-based credential cache (15-min expiry), resetClientCache() static method (SEC-008)
- cloudformation/application/vell-api-waf.yaml: Agent endpoint rate limiting rule + alarm
- cloudformation/application/vell-observability.yaml: AgentCore log group, metric filters, alarms, X-Ray sampling
- cloudformation/application/bedrock-guardrails.yml: PROMPT_ATTACK filters on Prod/Enterprise
- app/CustomExtensions/CloudMarketplace/resources/cloudformation/vell-agentcore-bedrock-role.yaml: Gateway, Observability, Memory IAM permissions
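The SEC-008 change to BedrockRuntimeService (a credential cache with a 15-minute expiry plus an explicit reset) follows a common pattern, sketched here in Python. The class and method names are hypothetical stand-ins, and the injectable clock is an addition made so the expiry can be tested deterministically.

```python
import time
from typing import Any, Callable


class TtlClientCache:
    """Cache one client object and rebuild it once the TTL has elapsed."""

    def __init__(self, factory: Callable[[], Any],
                 ttl_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._ttl = ttl_seconds
        self._clock = clock
        self._client: Any = None
        self._expires_at = 0.0

    def get(self) -> Any:
        """Return the cached client, creating a fresh one if missing or expired."""
        now = self._clock()
        if self._client is None or now >= self._expires_at:
            self._client = self._factory()
            self._expires_at = now + self._ttl
        return self._client

    def reset(self) -> None:
        """Drop the cached client so the next get() rebuilds it."""
        self._client = None
        self._expires_at = 0.0
```

The explicit reset matters as much as the TTL: after a credential rotation or an auth failure, callers can force a rebuild immediately rather than waiting up to 15 minutes.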

Modified (Infrastructure Health Check + Tool Registry Seeding 2026-02-20):

- app/Extensions/ContentManager/System/ContentManagerServiceProvider.php: Registered SyncGatewayToolsCommand in registerCommands() (was missing; root cause of "no commands defined in agentcore namespace" error)
- cloudformation/stacks/prod/prod-security.yml: Added logs:TagResource to app-ec2-perms policy (AWS requirement for CreateLogGroup with tags). Added dynamodb:PutItem/GetItem/Query on vell-{env}-gateway-tool-registry and s3:PutObject/GetObject on vell-{env}-gateway-tool-schemas-* for agentcore:sync-tools
- vell/codedeploy/after-install.sh: Added agentcore:sync-tools to CodeDeploy AfterInstall hook (runs as $WEB_USER, non-fatal on failure)
- app/Extensions/ContentManager/System/Console/Commands/SyncGatewayToolsCommand.php: Replaced app(DynamoDbClient::class) / app(S3Client::class) with direct instantiation using region config (no service container binding existed for AWS SDK clients)

Phase 0 Deployment Log (2026-02-20)¶

Deployment fixes applied during launch (3 template issues discovered and resolved):

bedrock-guardrails.yml — Enterprise PROMPT_ATTACK OutputStrength constraint
Issue: Enterprise tier had OutputStrength: MEDIUM for PROMPT_ATTACK filter
Error: Bedrock API rejected with "PROMPT ATTACK content filter strength for response must be NONE"
Fix: Changed to OutputStrength: NONE, added clarifying comment
Root cause: Bedrock enforces OutputStrength: NONE for PROMPT_ATTACK filter type (output-side prompt attack detection is not supported)
vell-agentcore-gateway.yaml — Lambda ZipFile size limit
Issue: Lambda inline code was 4,826 bytes, exceeding CloudFormation's 4,096-byte ZipFile limit
Error: PropertyValidation hook rejected changeset
Fix: Minified Lambda code from 4,826 to 2,843 bytes while preserving all functionality (handler(), sync_cap(), upd_schema(), upd_listing())
vell-agentcore-gateway.yaml — S3 lifecycle property name
Issue: Used NoncurrentVersionTransition (singular) with a list value
Error: PropertyValidation hook rejected changeset (cfn-lint: E3012)
Fix: Changed to NoncurrentVersionTransitions (plural) for list form

Final stack states:

Stack	Operation	Status	Timestamp (UTC)
`vellocity-bedrock-guardrails`	UPDATE	UPDATE_COMPLETE	2026-02-20T01:23:24
`prod-api-waf`	UPDATE	UPDATE_COMPLETE	2026-02-20T01:32:44
`prod-observability`	CREATE	CREATE_COMPLETE	2026-02-20T01:37:17
`prod-agentcore-gateway`	CREATE	CREATE_COMPLETE	2026-02-20T01:41:01

Post-deployment action: Re-subscribed ops@vell.ai to SNS vell-prod-critical-alerts topic (subscription pending email confirmation).

Tool Registry Seeding (2026-02-20):

4 blockers discovered and resolved across 3 CodeDeploy deployments:

Deployment	Commit	Fix	Result
`d-0WIFWHIXH`	`96a9d2f53`	(pre-fix baseline)	AfterInstall succeeded but `agentcore:sync-tools` not in hook
`d-7X2THSIXH`	`f42f46f44`	Service provider registration + IAM permissions + AfterInstall hook + SSM env vars (v49)	Command found but `app(DynamoDbClient::class)` → `BindingResolutionException`
`d-UK7H68JXH`	`ff5a7cc69`	Direct AWS SDK client instantiation (`new DynamoDbClient()`/`new S3Client()`)	SUCCESS — 37 capabilities synced

Final tool registry state:

- DynamoDB: 111 items (37 latest + 74 versioned snapshots)
- S3: 37 MCP tool schemas
- 9 marketplace listings: brand-knowledge (5), competitive-intelligence (1), content-generation (5), cosell-partner-intelligence (6), deal-intelligence (3), gtm-planning (4), marketplace-intelligence (6), seo-intelligence (6), vell-platform (1)
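The registry figures above can be cross-checked with a few lines of arithmetic. The per-listing counts are copied from the list; the assumption that every capability has exactly three DynamoDB items (one latest record plus two snapshots) comes from the 37 + 74 breakdown.

```python
# Capability counts per marketplace listing, as reported in the registry state.
LISTING_COUNTS = {
    "brand-knowledge": 5,
    "competitive-intelligence": 1,
    "content-generation": 5,
    "cosell-partner-intelligence": 6,
    "deal-intelligence": 3,
    "gtm-planning": 4,
    "marketplace-intelligence": 6,
    "seo-intelligence": 6,
    "vell-platform": 1,
}

ITEMS_PER_CAPABILITY = 3  # 1 latest record + 2 versioned snapshots


def total_capabilities() -> int:
    """Sum the per-listing capability counts."""
    return sum(LISTING_COUNTS.values())


def expected_dynamodb_items() -> int:
    """Items expected in the registry table if every capability has 3 records."""
    return total_capabilities() * ITEMS_PER_CAPABILITY


if __name__ == "__main__":
    print(len(LISTING_COUNTS), total_capabilities(), expected_dynamodb_items())
```

The counts reconcile: nine listings, 37 capabilities, and 111 expected items.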

Infrastructure Health Check (2026-02-19)¶

Verified all Phase 0 infrastructure from live AWS account 253265132499 in us-east-1.

Stack Status:

Stack	Status	Last Updated (UTC)
`prod-agentcore-gateway`	CREATE_COMPLETE	2026-02-20T01:41:01
`prod-observability`	CREATE_COMPLETE	2026-02-20T01:37:17
`prod-api-waf`	UPDATE_COMPLETE	2026-02-20T01:32:44
`vellocity-bedrock-guardrails`	UPDATE_COMPLETE	2026-02-20T01:23:24

Resource Verification:

Resource	Status	Details
DynamoDB `vell-prod-gateway-tool-registry`	ACTIVE	PITR enabled (35-day recovery). 2 GSIs active (`category-index`, `marketplace-listing-index`). 111 items — 37 capabilities × 3 (latest + 2 versioned snapshots from rolling deploy across 2 instances).
S3 `vell-prod-gateway-tool-schemas-253265132499`	ACTIVE	Versioning enabled. 37 MCP tool schemas uploaded across 9 marketplace listings.
Lambda `vell-prod-gateway-tool-sync`	Active	Python 3.11, 256MB, 60s timeout, X-Ray Active. SQS event source mapping enabled. 0 invocations since deployment.
SQS `vell-prod-gateway-sync`	ACTIVE	0 messages in-flight. Queue healthy.
SQS DLQ `vell-prod-gateway-sync-dlq`	ACTIVE	0 messages. No failed sync events.
EventBridge `vell-prod-gateway-tool-sync`	ENABLED	Listening for `CapabilityRegistryChanged`, `ToolSchemaUpdated`, `MarketplaceListingChanged` from `vell.agentcore`.
WAF `vell-prod-api-waf`	ACTIVE	7 rules (3 AWS managed + 3 rate limits + 1 count-mode burst detector). Associated with API Gateway stage `prod` (`qkxjis5iel`).
Bedrock Guardrails	ACTIVE	3 tiers: Dev (`uvishu7ijb29`), Prod (`5bf7khsguf6i`), Enterprise (`7s64nv00v8t5`).
CloudWatch Alarms	ALL OK	22 alarms verified — all in OK state including gateway-specific `vell-prod-gateway-sync-errors` and `vell-prod-gateway-sync-dlq-depth`.

Issues Found (ACTION REQUIRED):

Issue	Severity	Details	Action
INFRA-001: SNS subscriptions not confirmed	HIGH	Both `admin@vell.ai` and `ops@vell.ai` were `PendingConfirmation` on `vell-prod-critical-alerts`. CloudWatch alarms fire but no one receives alerts.	RESOLVED (2026-02-19). Both subscriptions confirmed: `ops@vell.ai` (`ccc72725-6796-4e86-95c3-cb5a6e822543`), `admin@vell.ai` (`7ce4e5c2-d039-44bd-ae1b-9d40071e1585`).
INFRA-002: Tool registry not seeded	MEDIUM	~~DynamoDB table has 0 items.~~	RESOLVED (2026-02-20). 4 blockers found and fixed across 3 deployments: (1) `SyncGatewayToolsCommand` not registered in `ContentManagerServiceProvider::registerCommands()`, (2) EC2 instance role missing `logs:TagResource` + DynamoDB/S3 permissions — `prod-security.yml` updated (stack UPDATE_COMPLETE 2026-02-20T03:05:26 UTC), (3) `AGENTCORE_GATEWAY_TABLE` and `AGENTCORE_GATEWAY_SCHEMA_BUCKET` env vars missing from SSM `/prod/app/.env` — added (version 49), (4) AWS SDK clients resolved via `app()` with no container binding — replaced with direct `new DynamoDbClient()`/`new S3Client()` instantiation. Result: 37 capabilities synced, 111 DynamoDB items (37 latest + 74 versioned snapshots), 37 MCP schemas in S3 across 9 marketplace listings. `agentcore:sync-tools` now runs automatically on every CodeDeploy AfterInstall.
INFRA-003: Sync pipeline untested in prod	LOW	Lambda has 0 invocations. The EventBridge → SQS → Lambda pipeline has never been triggered. It is expected to work on the first real event, but no smoke test has validated the end-to-end flow.	Publish a test `CapabilityRegistryChanged` event via EventBridge to validate the pipeline: `aws events put-events --entries '[{"Source":"vell.agentcore","DetailType":"CapabilityRegistryChanged","Detail":"{\"capability_slug\":\"test\",\"action\":\"test\"}"}]'`
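The INFRA-003 smoke test could also be scripted rather than run by hand. The sketch below mirrors the CLI command using boto3's EventBridge `put_events` call; the function names are hypothetical, the region matches the account details reported in this audit, and boto3 is imported lazily so the event builder can be checked without AWS credentials.

```python
import json


def build_smoke_test_entry(capability_slug: str = "test", action: str = "test") -> dict:
    """Build the same event the CLI command in INFRA-003 publishes."""
    return {
        "Source": "vell.agentcore",
        "DetailType": "CapabilityRegistryChanged",
        "Detail": json.dumps({"capability_slug": capability_slug, "action": action}),
    }


def publish_smoke_test(client=None) -> dict:
    """Publish the test event and fail loudly if EventBridge rejects it."""
    if client is None:
        import boto3  # lazy import: only needed when actually publishing
        client = boto3.client("events", region_name="us-east-1")
    response = client.put_events(Entries=[build_smoke_test_entry()])
    if response.get("FailedEntryCount", 0) > 0:
        raise RuntimeError(f"EventBridge rejected the test event: {response['Entries']}")
    return response
```

A successful `put_events` response only confirms that EventBridge accepted the event. Validating the full pipeline still means checking that the Lambda invocation count rises above zero and the DLQ stays empty.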

Phase 1: Adopt AgentCore Observability (Week 4-5)¶

Lowest risk, highest immediate value. OBS-002/OBS-003 infrastructure layer done in Phase 0B; this phase adds application-level instrumentation.

Enable AgentCore Observability for existing Bedrock API calls
Add OTEL instrumentation to AgentOrchestrator and BedrockRuntimeService
Create CloudWatch dashboards for workflow health metrics
Set up CloudWatch Alarms → SNS → Slack/PagerDuty for failure rate thresholds
Add correlation IDs (execution UUID) to all service calls
Build admin dashboard view combining failed_jobs + agent_executions + AgentCore metrics
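The correlation ID item is the piece that ties the other Phase 1 signals together. As a rough illustration of the pattern, the Python sketch below stamps every log line with the current execution UUID using `contextvars` and a logging filter. The logger name, the log format, and the `run_execution` helper are assumptions; the production services are PHP and would use their own logging stack.

```python
import contextvars
import logging
import uuid

# Holds the execution UUID for whatever workflow is currently running.
execution_id = contextvars.ContextVar("execution_id", default="-")


class ExecutionIdFilter(logging.Filter):
    """Attach the current execution UUID to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.execution_id = execution_id.get()
        return True


def build_logger(name: str = "agentcore", stream=None) -> logging.Logger:
    """Create a logger whose output always includes the correlation ID."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s execution=%(execution_id)s %(message)s")
    )
    handler.addFilter(ExecutionIdFilter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def run_execution(logger: logging.Logger, steps: list[str]) -> str:
    """Run a workflow under a fresh execution UUID and return that UUID."""
    token = execution_id.set(str(uuid.uuid4()))
    try:
        for step in steps:
            logger.info("step=%s started", step)
        return execution_id.get()
    finally:
        execution_id.reset(token)  # never leak the ID into the next execution
```

With the ID in every line, a single search on one execution UUID can pull together queue failures, execution records, and Bedrock call logs for the combined admin view.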

Phase 2: Migrate to AgentCore Gateway (Week 6-9)¶

Tool registry security fixed (SEC-002). DynamoDB registry + S3 MCP schemas deployed (Phase 0C). This phase registers tools with live Gateway.

Register capabilities with live AgentCore Gateway using MCP schemas from S3
Enable semantic tool selection for capability discovery
Migrate outbound auth (HubSpot, LinkedIn, Slack, etc.) to Gateway auth management
Implement Gateway interceptors for tool-call-level authorization
Transition CapabilityRegistry to read from Gateway (keep local DynamoDB as fallback)
Enable AgentCore Policy for tool-call enforcement
Configure marketplace listing unbundling (8 listings in DynamoDB, activate per listing)
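The "read from Gateway, keep local DynamoDB as fallback" transition can be expressed as a small lookup wrapper. This Python sketch shows one possible shape; the function name, the lookup callables, and the choice of which exceptions trigger the fallback are all assumptions.

```python
from typing import Callable, Optional

# A lookup takes a capability slug and returns its definition, or None if absent.
Lookup = Callable[[str], Optional[dict]]


def resolve_tool(slug: str, gateway_lookup: Lookup, local_lookup: Lookup) -> tuple[dict, str]:
    """Prefer the Gateway definition; fall back to the local registry.

    Returns the tool definition plus the source it came from, so callers
    can log or alert on how often the fallback path is used.
    """
    try:
        tool = gateway_lookup(slug)
    except (ConnectionError, TimeoutError):
        tool = None  # Gateway unreachable: fall through to the local registry
    if tool is not None:
        return tool, "gateway"

    tool = local_lookup(slug)
    if tool is None:
        raise KeyError(f"capability {slug!r} not found in Gateway or local registry")
    return tool, "local"
```

Returning the source alongside the definition makes fallback usage measurable, which is useful evidence when deciding whether the local registry can eventually be retired.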

Phase 3: Adopt AgentCore Memory (Week 9-10)¶

Replaces fragile fallback code

Migrate AgentMemoryService to native AgentCore Memory API
Enable episodic memory for GTM workflow learning
Remove local session fallback code
Add memory health status to execution UI

Phase 4: Migrate to AgentCore Runtime (Week 11-16)¶

Largest change — replaces queue-based execution

Package AgentOrchestrator as AgentCore Runtime-compatible agent
Implement session-based execution (replace ProcessAgentWorkflowJob)
Enable microVM isolation per execution
Extend timeout from 10min → up to 8hrs for complex workflows
Migrate ExecutionHealthMonitor to monitor Runtime sessions instead of queue jobs
Implement A2A protocol for multi-agent GTM workflows
Decommission ProcessAgentWorkflowJob and self-healing queue infrastructure

Phase 5: Adopt AgentCore Identity (Week 17-18)¶

Unifies fragmented auth

Configure AgentCore Identity with existing Cognito user pool
Migrate per-user Bedrock credentials to Identity-managed access
Enable custom claims for multi-tenant team isolation
Remove static credential cache from BedrockRuntimeService

Cost Estimation¶

Current Infrastructure Costs (Estimated)¶

Component	Cost Driver	Monthly Estimate
Queue workers (EC2/ECS)	Always-on compute for job processing	$200-500
Redis (ElastiCache)	Queue + cache	$100-300
Health monitoring (scheduler)	EC2 compute for cron	Included above
Developer time (maintenance)	Bug fixes, monitoring, on-call	40-80 hrs/month

Projected AgentCore Costs (at 10,000 executions/month)¶

Service	Usage	Monthly Cost
Runtime	10K sessions × ~18s active CPU × 1 vCPU + 60s × 2GB memory	~$8
Gateway	10K sessions × 5 tool calls = 50K invocations	~$0.25
Memory	50K short-term events + 5K long-term + 10K retrievals	~$21
Identity	Free via Runtime/Gateway	$0
Observability	CloudWatch log storage (~5GB)	~$3
Policy	50K tool calls × 1 policy check	~$1.25
Total AgentCore		~$33/month
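As a quick check on the projection, the line items can be summed directly and compared with the current compute ranges from the earlier table. The Python sketch below uses only figures from the two tables; the variable and function names are illustrative.

```python
# Monthly AgentCore line items (USD) from the projection table.
AGENTCORE_MONTHLY_USD = {
    "Runtime": 8.00,
    "Gateway": 0.25,
    "Memory": 21.00,
    "Identity": 0.00,
    "Observability": 3.00,
    "Policy": 1.25,
}

# Current monthly compute ranges (low, high) in USD from the current-cost table.
CURRENT_COMPUTE_USD = {
    "Queue workers (EC2/ECS)": (200, 500),
    "Redis (ElastiCache)": (100, 300),
}


def agentcore_total() -> float:
    """Sum of the projected AgentCore line items."""
    return sum(AGENTCORE_MONTHLY_USD.values())


def compute_savings_range() -> tuple[float, float]:
    """Low and high monthly compute savings after migration."""
    total = agentcore_total()
    low = sum(low for low, _ in CURRENT_COMPUTE_USD.values()) - total
    high = sum(high for _, high in CURRENT_COMPUTE_USD.values()) - total
    return low, high


if __name__ == "__main__":
    print(f"AgentCore total: ${agentcore_total():.2f}/month")
    print("Compute savings: ${:.2f} to ${:.2f}/month".format(*compute_savings_range()))
```

The line items sum to $33.50, which the table rounds to ~$33. Against the $300-800 current compute range, that leaves roughly $267-767 per month.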

Savings Analysis¶

Category	Before	After	Savings
Infrastructure compute	$300-800/mo	~$33/mo	$267-767/mo
Developer maintenance	40-80 hrs/mo	10-20 hrs/mo	30-60 hrs/mo
Security risk exposure	~~5 critical~~ 0 critical + ~~4 high~~ 0 high vulns (Phase 0A)	Addressed by managed services	Reduced attack surface
Incident response	Reactive (no alerting)	Proactive (CloudWatch alarms)	Faster MTTR

Note: Bedrock model inference costs (Claude tokens) remain the same regardless of migration. These estimates cover only the orchestration infrastructure layer.
