VPC Endpoint Cleanup - May 2026

Why

The April 2026 invoice audit found VpcEndpoint-Hours running at $496.80/mo: 49,680 endpoint-hours from 23 Interface Endpoints across the prod-vpc, dev-vpc, demo-vpc, and network-hardening stacks. Combined with 5 NAT Gateways at $162/mo, these VPCs were paying for two parallel egress paths to AWS APIs.
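The invoice figures can be sanity-checked with a few lines of arithmetic. This is a sketch under assumptions: the $0.01 per endpoint-hour and $0.045 per NAT-hour rates are back-derived from the totals above, 3 AZs per endpoint is inferred from 49,680 / 23 / 720 = 3 (the audit does not state it), and April is taken as 720 billable hours. The function names are illustrative only.

```shell
# Reproduce the April 2026 hourly charges from the audit figures.
# Assumptions (not stated in the audit): 3 AZs per endpoint, 720 billable hours,
# $0.01 per endpoint-hour, $0.045 per NAT-gateway-hour.
endpoint_hours() {  # args: endpoints azs_per_endpoint hours_in_month
  echo $(( $1 * $2 * $3 ))
}

monthly_cost() {    # args: billable_hours rate_per_hour
  awk -v h="$1" -v r="$2" 'BEGIN { printf "%.2f\n", h * r }'
}

hours=$(endpoint_hours 23 3 720)
echo "endpoint-hours: $hours"
echo "interface endpoints: \$$(monthly_cost "$hours" 0.01)/mo"
echo "NAT gateways: \$$(monthly_cost $((5 * 720)) 0.045)/mo"
```

Under those assumptions the script reproduces the 49,680 endpoint-hours, $496.80/mo, and $162.00/mo figures quoted above.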

This cleanup removes only endpoints with verified zero traffic in the preceding 30 days. The original audit recommended a wider 9-endpoint cut on the assumption that Secrets Manager / KMS / CloudWatch Logs payloads were KB-scale. CloudWatch AWS/PrivateLinkEndpoints metrics over 2026-04-04 → 2026-05-04 disproved that assumption — see "What we kept and why" below.

Scope

Stack	Removed	Kept	Approx save
`network-hardening`	`EpSsmContacts`, `EpIncidentManager`	`EpEcrApi`, `EpEcrDkr`, `EpEc2Api`	~$44/mo
`dev-vpc` (via `env-vpc.yml`)	`KmsEndpoint`	SSM trio, SecretsEndpoint, LogsEndpoint, Gateway endpoints	~$22/mo
`demo-vpc` (via `env-vpc.yml`)	`KmsEndpoint`	same as dev	~$22/mo
`prod-vpc`	(unchanged)	all 6	$0
Total	4 endpoints removed across 3 resource blocks		~$88/mo

Why these specific endpoints (verified zero usage)

30-day CloudWatch metrics (BytesProcessed + NewConnections, summed across all subnet ENIs per endpoint, window 2026-04-04 → 2026-05-04):

Endpoint	Bytes	Connections	Action
`prod-ssm-contacts` (`vpce-0bb935982984ce5ee`)	0	0	Remove
`prod-ssm-incidents` (`vpce-025391667e496086a`)	0	0	Remove
`dev-kms` (`vpce-0604244fbd2c77bd9`)	0	0	Remove
`demo-kms` (`vpce-02d28a469f6867a8e`)	0	0	Remove

Incident Manager features (contacts, incidents) were never configured, so those two endpoints carry nothing. The KMS endpoints receive no direct SDK traffic: Aurora/S3/Secrets Manager server-side encryption calls KMS inside AWS and does not transit the endpoint.
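One way to reproduce the 30-day totals and apply the zero-traffic rule is sketched below. It is not necessarily how the audit pulled its numbers. The function names are hypothetical, and the dimension names (for example "VPC Endpoint Id" and "Subnet Id") are assumed from the AWS PrivateLink metrics documentation, so confirm the exact dimension sets with `aws cloudwatch list-metrics --namespace AWS/PrivateLinkEndpoints` before relying on the result.

```shell
# Sum one AWS/PrivateLinkEndpoints metric over the audit window.
# Usage: endpoint_metric_sum METRIC DIMENSION...
# Each DIMENSION is "Name=...,Value=..." and together they must match a
# dimension set that CloudWatch publishes for the endpoint (assumption: check
# with list-metrics first). Call once per subnet ENI and add the results.
endpoint_metric_sum() {
  metric=$1; shift
  aws cloudwatch get-metric-statistics \
    --profile vell-prod-admin --region us-east-1 \
    --namespace AWS/PrivateLinkEndpoints \
    --metric-name "$metric" \
    --dimensions "$@" \
    --start-time 2026-04-04T00:00:00Z --end-time 2026-05-04T00:00:00Z \
    --period 86400 --statistics Sum \
    --query 'sum(Datapoints[].Sum)' --output text
}

# Cleanup rule from this document: remove only when both totals are zero.
endpoint_action() {  # args: total_bytes total_connections
  awk -v b="$1" -v c="$2" \
    'BEGIN { print ((b + 0 == 0 && c + 0 == 0) ? "Remove" : "Keep") }'
}
```

For example, `endpoint_action 0 0` yields Remove, while the prod-ec2 totals (`endpoint_action 3800000 300`) yield Keep.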

What we kept and why

Endpoint	Bytes (30d)	Connections (30d)	Reason kept
`prod-ec2` (`vpce-0292c6c6ceb26fdb3`)	3.8 MB	300	Light but non-zero. Likely SSM agent metadata, CodeDeploy `describe-instances` hooks, or Image Builder pipeline. Trace caller via CloudTrail before considering removal.
`dev-secretsmanager`	512 MB	52K	Heavy use — `refresh-secrets.sh` cron at `*/5` × 3 secrets. NAT path works functionally but adds latency on hot path.
`demo-secretsmanager`	495 MB	49K	Same pattern as dev.
`dev-logs`	7.0 GB	481K	CloudWatch Logs agent + Laravel `LOG_CHANNEL=cloudwatch` ship through this.
`demo-logs`	3.6 GB	231K	Same pattern as dev.

Future consideration: removing the 5 "kept" endpoints

The dollar math still favors removal: at $0.045/GB NAT processing, the ~12 GB/mo of dev+demo log+secrets traffic costs ~$0.54/mo via NAT, vs ~$108/mo in hourly charges for the 5 endpoints. But three operational concerns warrant deferring:

Latency: Interface endpoints answer on private ENI addresses inside the VPC (~1-2ms). The NAT path leaves through an Elastic IP to the service's public endpoint (~5-15ms added). Not user-visible per request, but it accumulates across hundreds of thousands of cron + log-shipping calls.
Posture: The hardening stack is named that for a reason. Removing endpoints means Laravel log shipping + Secrets Manager calls leave the VPC through the NAT's public IP and hit public service endpoints. AWS documents that this traffic stays on the AWS network rather than the open internet, and TLS still applies, but the private-addressed path is gone. That reverses the original reason these were added.
NAT SPOF interaction: Workstream #7 in the cost audit wants OnePerAZ → SingleNat for $65/mo. Combined with this, all dev/demo egress would route through one NAT in one AZ. If pursuing #7, keep these endpoints to retain a private path for AWS API calls.

Decision: revisit if/when workstream #7 is resolved.
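The dollar comparison in this section can be reproduced from the "What we kept" table. This sketch assumes decimal units (512 MB = 0.512 GB), 3 AZs per endpoint, 720 hours per month, and $0.01 per endpoint-hour; only the $0.045/GB NAT rate comes from the text. It ignores the negligible prod-ec2 traffic and any per-GB endpoint processing charge.

```shell
# NAT data-processing cost for the dev+demo traffic currently on endpoints.
nat_cost=$(awk 'BEGIN {
  gb = 0.512 + 0.495 + 7.0 + 3.6   # dev/demo secrets + dev/demo logs, GB per 30 days
  printf "%.2f", gb * 0.045        # NAT processing rate in USD per GB (from the text)
}')

# Hourly charge for the 5 kept endpoints:
# 5 endpoints x 3 AZs (assumed) x 720 h x $0.01 per endpoint-hour (assumed).
endpoint_cost=$(awk 'BEGIN { printf "%.2f", 5 * 3 * 720 * 0.01 }')

echo "via NAT: \$${nat_cost}/mo   via endpoints: \$${endpoint_cost}/mo"
```

With the unrounded table values (about 11.6 GB) the NAT figure comes out at $0.52/mo, slightly under the rounded number in the text, against $108.00/mo for the endpoints.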

Deploy sequence

Step 0 - pre-flight verification

Pre-drift baseline is clean as of 2026-05-04. Commit 47a047644 (refresh-secrets cron + chmod fix) is in main and was deployed via prod-pipeline (parent b00aca35d6). The functional fix ships through CodeDeploy after-install.sh, so currently-running instances have the fix without an instance refresh.
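If the baseline needs re-checking closer to deploy time, CloudFormation drift detection is one option. This document does not say how the 2026-05-04 baseline was verified, so treat the following as a sketch; the helper name `stack_drift_status` is hypothetical.

```shell
# Run drift detection on one stack and print the final drift status
# (for example IN_SYNC or DRIFTED) once detection completes.
stack_drift_status() {  # arg: stack name
  id=$(aws cloudformation detect-stack-drift \
    --profile vell-prod-admin --region us-east-1 \
    --stack-name "$1" \
    --query StackDriftDetectionId --output text) || return 1
  while :; do
    status=$(aws cloudformation describe-stack-drift-detection-status \
      --profile vell-prod-admin --region us-east-1 \
      --stack-drift-detection-id "$id" \
      --query '[DetectionStatus,StackDriftStatus]' --output text) || return 1
    case $status in
      DETECTION_IN_PROGRESS*) sleep 5 ;;                              # still running, poll again
      DETECTION_COMPLETE*)    echo "$status" | awk '{ print $2 }'; return 0 ;;
      *)                      echo "drift detection failed: $status" >&2; return 1 ;;
    esac
  done
}
```

Running `stack_drift_status network-hardening` and `stack_drift_status dev-vpc` before Step 1 and Step 3 would confirm that each change set will contain only the intended removals.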

Step 1 - preview network-hardening with a change-set

CS_NAME="endpoint-cleanup-$(date +%s)"

aws cloudformation create-change-set \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name network-hardening \
  --change-set-name "$CS_NAME" \
  --change-set-type UPDATE \
  --template-body file://cloudformation/hardening/network-hardening.yml \
  --parameters ParameterKey=EnvironmentName,UsePreviousValue=true \
               ParameterKey=VpcId,UsePreviousValue=true \
               ParameterKey=EndpointSecurityGroupId,UsePreviousValue=true \
               ParameterKey=FlowLogRetentionDays,UsePreviousValue=true \
               ParameterKey=FlowLogsRoleName,UsePreviousValue=true \
               ParameterKey=FlowLogsLogGroupName,UsePreviousValue=true \
  --capabilities CAPABILITY_NAMED_IAM

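# Block until the change set is fully computed; describing it earlier can show an empty Changes list.
aws cloudformation wait change-set-create-complete \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name network-hardening --change-set-name "$CS_NAME"

aws cloudformation describe-change-set \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name network-hardening --change-set-name "$CS_NAME" \
  --query 'Changes[].ResourceChange.[Action,LogicalResourceId,ResourceType]' \
  --output table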

Expected output: exactly 2 Remove rows (EpSsmContacts, EpIncidentManager). Anything else → stop, investigate.
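The "exactly N Remove rows" check can be scripted so the deploy stops on anything unexpected. A minimal sketch, assuming the change set is small enough to come back in one page; `assert_only_removes` is a hypothetical helper, not part of the existing tooling.

```shell
# Succeed only if the change set contains exactly the expected Remove actions.
# Usage: assert_only_removes STACK CHANGE_SET LOGICAL_ID...
assert_only_removes() {
  stack=$1; change_set=$2; shift 2
  # Build the expected "Remove<TAB>LogicalId" lines in sorted order.
  expected=$(for id in "$@"; do printf 'Remove\t%s\n' "$id"; done | sort)
  actual=$(aws cloudformation describe-change-set \
    --profile vell-prod-admin --region us-east-1 \
    --stack-name "$stack" --change-set-name "$change_set" \
    --query 'Changes[].ResourceChange.[Action,LogicalResourceId]' \
    --output text | sort)
  if [ "$actual" = "$expected" ]; then
    echo "OK: change set matches expectation"
  else
    echo "STOP: unexpected changes in $change_set:" >&2
    echo "$actual" >&2
    return 1
  fi
}
```

For this step the call would be `assert_only_removes network-hardening "$CS_NAME" EpSsmContacts EpIncidentManager`, and for dev-vpc or demo-vpc the only logical ID is `KmsEndpoint`.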

Step 2 - execute network-hardening

aws cloudformation execute-change-set \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name network-hardening --change-set-name "$CS_NAME"

aws cloudformation wait stack-update-complete \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name network-hardening

Step 3 - dev-vpc

CS_NAME="endpoint-cleanup-$(date +%s)"

aws cloudformation create-change-set \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name dev-vpc \
  --change-set-name "$CS_NAME" \
  --change-set-type UPDATE \
  --template-body file://cloudformation/stacks/env/env-vpc.yml \
  --parameters ParameterKey=EnvironmentName,UsePreviousValue=true \
               ParameterKey=VpcCidr,UsePreviousValue=true \
               ParameterKey=NatStrategy,UsePreviousValue=true \
  --capabilities CAPABILITY_IAM

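# Block until the change set is fully computed; describing it earlier can show an empty Changes list.
aws cloudformation wait change-set-create-complete \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name dev-vpc --change-set-name "$CS_NAME"

aws cloudformation describe-change-set \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name dev-vpc --change-set-name "$CS_NAME" \
  --query 'Changes[].ResourceChange.[Action,LogicalResourceId,ResourceType]' \
  --output table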

Expected output: exactly 1 Remove row (KmsEndpoint).

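aws cloudformation execute-change-set \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name dev-vpc --change-set-name "$CS_NAME"

# Wait for the update to finish before starting the soak period.
aws cloudformation wait stack-update-complete \
  --profile vell-prod-admin --region us-east-1 \
  --stack-name dev-vpc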

Step 4 - soak verification on dev (24-48h)

/var/log/vell-boot.log shows secrets refresh continuing (we kept SecretsEndpoint)
CloudWatch log groups (/dev/laravel, etc.) continue receiving events (we kept LogsEndpoint)
Session Manager still works on dev-web instances (we kept SSM trio)
App health checks pass; ALB target group reports healthy
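Two of the checks above can be spot-checked from the CLI. This is a sketch only: the helper names are hypothetical, the target group ARN must be supplied by the operator because it is not given in this document, and `lastIngestionTime` is eventually consistent (it can lag ingestion by up to roughly an hour), so use a generous freshness threshold.

```shell
# Print the age in seconds of the most recent ingestion into a log group.
log_group_age_seconds() {  # arg: log group name, e.g. /dev/laravel
  last_ms=$(aws logs describe-log-streams \
    --profile vell-prod-admin --region us-east-1 \
    --log-group-name "$1" \
    --order-by LastEventTime --descending --limit 1 \
    --query 'logStreams[0].lastIngestionTime' --output text) || return 1
  # Bail out if no stream exists or the value is not a millisecond timestamp.
  case $last_ms in
    ''|*[!0-9]*) echo "no ingestion timestamp for $1" >&2; return 1 ;;
  esac
  echo $(( $(date +%s) - last_ms / 1000 ))
}

# Print how many targets in a target group are not healthy (expect 0).
unhealthy_target_count() {  # arg: target group ARN (operator-supplied)
  aws elbv2 describe-target-health \
    --profile vell-prod-admin --region us-east-1 \
    --target-group-arn "$1" \
    --query "length(TargetHealthDescriptions[?TargetHealth.State!='healthy'])" \
    --output text
}
```

During the soak, `log_group_age_seconds /dev/laravel` should stay well under the shipping interval plus the consistency lag, and `unhealthy_target_count` should print 0.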

Step 5 - demo-vpc

Repeat Step 3 with --stack-name demo-vpc.

Rollback

If verification fails on dev, roll back the template change and re-deploy:

git revert <commit-sha>
aws cloudformation deploy --profile vell-prod-admin --region us-east-1 \
  --stack-name dev-vpc \
  --template-file cloudformation/stacks/env/env-vpc.yml \
  --capabilities CAPABILITY_IAM --no-fail-on-empty-changeset

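CloudFormation will recreate the removed Interface Endpoints with new endpoint IDs and new endpoint-specific DNS names. Clients are unaffected because they use the standard AWS service hostnames, which PrivateDnsEnabled: true maps back to the new endpoint's private IPs.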

Same procedure for network-hardening if needed.
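After a rollback, the recreated endpoint can be confirmed from the CLI. A minimal sketch: `list_service_endpoints` is a hypothetical helper, and the VPC ID is left as an operator-supplied argument because the real IDs are not listed in this document.

```shell
# List the endpoints for one AWS service in a VPC, with state and private-DNS flag.
list_service_endpoints() {  # args: vpc-id service-name
  aws ec2 describe-vpc-endpoints \
    --profile vell-prod-admin --region us-east-1 \
    --filters "Name=vpc-id,Values=$1" "Name=service-name,Values=$2" \
    --query 'VpcEndpoints[].[VpcEndpointId,State,PrivateDnsEnabled]' \
    --output text
}

# Example (VPC_ID must be set by the operator):
#   list_service_endpoints "$VPC_ID" com.amazonaws.us-east-1.kms
```

A successful rollback should show one row per recreated endpoint in the available state with private DNS enabled. The same call returning no rows is a quick confirmation that a removal took effect.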

Out of scope

prod-vpc interface endpoints — kept as-is for operational hygiene
prod-ec2 endpoint (light usage) — trace caller before considering removal
dev/demo Secrets Manager + CloudWatch Logs endpoints — actively used; defer pending NAT consolidation decision (workstream #7)
NAT Gateway consolidation (5 → 3) — separate workstream
Bedrock Knowledge Base migration off OpenSearch Serverless — separate
Aurora Serverless v2 ACU tuning — separate
AWS Support tier downgrade (console-only, no CFN)

Cross-reference

Audit findings: AWS April 2026 invoice review (cost-reduction-audit, 2026-05-01). Verification methodology: AWS/PrivateLinkEndpoints BytesProcessed + NewConnections summed across subnet ENIs over a 30-day window.