Complete implementation guide for all AWS AI, automation, and intelligent operations services
The Trinity Beast Infrastructure (TBI) employs a multi-layered, proactive AI and automation system that monitors, defends, heals, and reports on the infrastructure autonomously. Rather than waiting for problems to escalate, the system detects anomalies in real time, correlates threat signals using AI, and takes corrective action before a human needs to intervene.
Core Principle: Self-heal first, notify second. If the system can fix it, fix it and tell Cory after. Escalate fast on unknowns. Never suppress critical alerts. Bedrock is advisory — AI suggests, Step Functions decide, Lambda acts.
| Service | Role in TBI | Layer |
|---|---|---|
| Amazon EventBridge | Event routing — connects CloudWatch alarms, scheduled triggers, and custom events to automated workflows | Automation |
| AWS Step Functions | Workflow orchestration — multi-step heal/verify/notify sequences with retry logic and branching | Orchestration |
| Amazon Bedrock (Claude Sonnet 4.6) | AI threat analysis — pattern correlation, severity assessment, plain-English reports, auto-action recommendations | Intelligence |
| AWS Lambda (7 Go functions) | Action execution — self-heal, WAF management, notifications, honeypot processing, AI analysis, support automation, operational digests | Execution |
| AWS WAF | Automated IP blocking — honeypot-triggered and AI-recommended blocks applied to the API WAF | Defense |
| Amazon SNS | Operational notifications — severity-tagged alerts delivered to email with full context | Communication |
| Amazon Translate | Multi-lingual content — 384 translated documents, real-time support correspondence translation | Localization |
| Amazon CloudWatch | Metrics, alarms, anomaly detection — feeds events into EventBridge for automated response | Observability |
| Amazon GuardDuty | Threat detection — VPC flow logs, CloudTrail, DNS analysis for automated security response | Security |
| Amazon SES | Transactional email — receipts, notifications, and AI-drafted correspondence in 12 languages | Communication |
| Amazon SQS | Decoupled processing — usage log pipeline, honeypot auto-block queue | Messaging |
| ElastiCache (Valkey) | Operational state — threat reports, action logs, self-heal counters, honeypot data, search indexes | State |
The complete AutoOps pipeline showing all 7 Lambda functions, AI services, EventBridge rules, and data flows — with per-Lambda binary sizes and roles. See the legend below the diagram for common Lambda configuration shared across all functions.
flowchart TB
subgraph Sources["Event Sources"]
CWA["☁️ CloudWatch Alarms\n5xx spike · Health fail\nACU ceiling · Anomaly"]
GD["🛡️ GuardDuty\nVPC flow · CloudTrail\nDNS · Severity ≥ 7"]
HP["🪤 Honeypot Traps\n12 decoy endpoints\n2-hit auto-block threshold"]
SCHED["⏱️ Scheduled Events\nEvery 5 min · Daily 6AM\nMonday 7AM EST"]
end
subgraph EB["EventBridge (6 Rules)"]
R1["tbi-ops-alarm-trigger\nCW alarm → Step Function"]
R2["tbi-ops-honeypot-queue\nEvery 5 min → honeypot-processor"]
R3["tbi-ops-bedrock-analyze-schedule\nEvery 5 min → bedrock-analyze"]
R4["tbi-ops-guardduty-high-finding\nGuardDuty severity ≥ 7 → bedrock-analyze"]
R5["tbi-ops-daily-digest\ncron 0 11 * * ? * → digest"]
R6["tbi-ops-weekly-digest\ncron 0 12 ? * MON * → digest"]
end
subgraph SF["Step Functions"]
SFN["tbi-ops-health-check-heal\nCheck → Wait 60s → Recheck\n→ Force-Deploy → Verify → Notify"]
end
subgraph LambdaLayer["Lambda Functions — Go · provided.al2023 · 1 vCPU · 1770 MB · 60s timeout · Not in VPC"]
L1["tbi-ops-self-heal\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB\n🔧 ECS force-deploy\nrestart-task · check-health\nLogs to Valkey"]
L2["tbi-ops-waf-action\n━━━━━━━━━━━━━━━━━━\n📦 8.2 MB\n🔧 block-ips · unblock-ips\nlist-blocked\nManages WAF IP set"]
L3["tbi-ops-notify\n━━━━━━━━━━━━━━━━━━\n📦 8.3 MB\n🔧 Branded HTML email\nSES delivery · SNS fallback\n[INFO/WARN/CRIT/HEALED]"]
L4["tbi-ops-honeypot-processor\n━━━━━━━━━━━━━━━━━━\n📦 8.2 MB\n🔧 Drain autoblock queue\nDeduplicate IPs\nInvoke waf-action + notify"]
L5["tbi-ops-bedrock-analyze\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB\n🔧 Gather threat signals\nQuiet check → Bedrock AI\nAuto-act · Store · Notify"]
L6["tbi-raima-support\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB\n🔧 Raima Support Assistant\nAI draft response · Knowledge gaps\nPre-fetch diagnostics · Notify"]
L7["tbi-ops-digest\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB · ⏱️ 180s timeout\n🔧 Daily 300-word summary\nWeekly newsletter pipeline\nBedrock narrative · Notify"]
end
subgraph AI["Amazon Bedrock"]
Claude["Claude Sonnet 4.6\nus.anthropic.claude-sonnet-4-6\nInference Profile · No setup\nPattern correlation\nThreat assessment\nSupport drafting\nOperational narratives"]
end
subgraph WAFDef["WAF Defense"]
IPSet["tbi-autoops-blocked-ips\nWAF IP Set\nID: 8d55de25-8ba5-4982-8c41-f4316c9bd50d"]
WRule["AutoOps-BlockedIPs\nPriority 7\ntrinity-beast-api-waf"]
end
subgraph NotifyLayer["Notifications"]
SNS["tbi-ops-notifications\nSNS Topic\nARN: arn:aws:sns:us-east-2:211998422884:..."]
SES["Amazon SES\nBranded HTML email\nNo-Reply@CPMP-Site.org → Cory"]
end
subgraph ValkeyState["Valkey State (ElastiCache)"]
VK["autoops:threats:daily\nautoops:actions:log\nautoops:self-heal:count\nautoops:digest:daily · weekly\nsupport:ticket:{id}\nhoneypot:autoblock_queue\nreport:text:YYYY-MM-DD"]
end
subgraph Anomaly["Layer 3 — Anomaly Detection"]
AD["4 CloudWatch ML Models\nRequestCount · TargetResponseTime\nHTTPCode_5XX · CacheHitRate\nBand width: 2σ · 3 eval periods"]
end
%% Event sources → EventBridge
CWA -->|"alarm state change"| R1
CWA -->|"alarm state change"| R3
GD -->|"severity ≥ 7"| R4
HP -->|"hit logged to queue"| R2
SCHED --> R2
SCHED --> R3
SCHED --> R5
SCHED --> R6
AD -->|"anomaly alarm"| R1
%% EventBridge → targets
R1 --> SFN
R2 --> L4
R3 --> L5
R4 --> L5
R5 --> L7
R6 --> L7
%% Step Function → Lambdas
SFN -->|"check-health / force-deploy"| L1
SFN -->|"notify result"| L3
%% Lambda → Lambda
L4 -->|"invoke sync"| L2
L4 -->|"invoke async"| L3
L5 -->|"invoke async"| L3
L5 -->|"auto-block sync"| L2
L6 -->|"invoke async"| L3
L7 -->|"invoke async"| L3
%% Lambda → AI
L5 -->|"InvokeModel"| Claude
L6 -->|"InvokeModel"| Claude
L7 -->|"InvokeModel"| Claude
%% Lambda → WAF
L2 --> IPSet
IPSet --> WRule
%% Lambda → Notifications
L3 --> SNS
SNS --> SES
%% Lambda → Valkey
L1 -->|"log action"| VK
L4 -->|"drain queue"| VK
L5 -->|"store threat report"| VK
L6 -->|"store ticket analysis"| VK
L7 -->|"store digest"| VK
%% Styling
style Sources fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style EB fill:#1e293b,stroke:#FF9900,color:#e2e8f0
style SF fill:#1e293b,stroke:#60a5fa,color:#e2e8f0
style LambdaLayer fill:#1e293b,stroke:#10b981,color:#e2e8f0
style AI fill:#1e293b,stroke:#a78bfa,color:#e2e8f0
style WAFDef fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style NotifyLayer fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style ValkeyState fill:#1e293b,stroke:#06b6d4,color:#e2e8f0
style Anomaly fill:#1e293b,stroke:#818cf8,color:#e2e8f0

Common Lambda configuration:
- Runtime: provided.al2023
- Timeout: 60s (tbi-ops-digest: 180s)
- IAM role: tbi-autonomous-ops-role (shared)
- Build: GOOS=linux GOARCH=amd64
- Deploy: bash scripts/kcc.sh deploy-lambda autoops <name>

Amazon EventBridge is the central nervous system of AutoOps. It receives events from across the infrastructure and routes them to the appropriate automated response.
Pattern: Event Source → EventBridge Rule (pattern match) → Target (Step Function or Lambda)
Every CloudWatch alarm state change, every GuardDuty finding, and every scheduled interval generates an event. EventBridge rules match these events by pattern and invoke the correct automated response — no polling, no delays.
graph LR
A["CloudWatch Alarm\nState: ALARM"] -->|"Event Pattern"| EB["EventBridge\nEvent Bus"]
B["Scheduled\nrate(5 minutes)"] -->|"Schedule"| EB
EB -->|"tbi-ops-alarm-trigger"| SF["Step Function\nhealth-check-heal"]
EB -->|"tbi-ops-honeypot-queue"| L["Lambda\nhoneypot-processor"]
style EB fill:#1e293b,stroke:#FF9900,color:#e2e8f0
style SF fill:#1e293b,stroke:#60a5fa,color:#e2e8f0
style L fill:#1e293b,stroke:#10b981,color:#e2e8f0
| Rule Name | Event Pattern | Target | Purpose |
|---|---|---|---|
| tbi-ops-alarm-trigger | CloudWatch alarm state change → ALARM | Step Function: tbi-ops-health-check-heal | Triggers automated health check and self-healing workflow when any alarm fires |
| tbi-ops-honeypot-queue-processor | Scheduled: every 5 minutes | Lambda: tbi-ops-honeypot-processor | Drains the honeypot auto-block queue and applies WAF rules for repeat offenders |
| tbi-ops-bedrock-analyze-schedule | Scheduled: every 5 minutes | Lambda: tbi-ops-bedrock-analyze | Periodic AI threat analysis; skips Bedrock when infrastructure is quiet (cost-conscious) |
| tbi-ops-guardduty-high-finding | GuardDuty Finding, severity ≥ 7 | Lambda: tbi-ops-bedrock-analyze | Immediate AI threat analysis on HIGH/CRITICAL GuardDuty findings |
| trinity-beast-nightly-sync | Scheduled: cron(0 6 * * ? *) (1 AM EST) | ECS Task: trinity-beast-sync-job | Nightly Aurora → Valkey sync + search index rebuild |
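The two event-driven rules above rely on EventBridge pattern matching. The following is a minimal sketch of what those patterns might look like, built in Go and rendered as JSON. The `source` and `detail-type` values are the standard ones emitted by CloudWatch and GuardDuty; the helper names (`alarmPattern`, `guardDutyPattern`, `patternJSON`) are hypothetical, and the actual patterns deployed in TBI may filter on additional fields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// alarmPattern matches CloudWatch alarms that transition into the ALARM state.
func alarmPattern() map[string]any {
	return map[string]any{
		"source":      []string{"aws.cloudwatch"},
		"detail-type": []string{"CloudWatch Alarm State Change"},
		"detail": map[string]any{
			"state": map[string]any{"value": []string{"ALARM"}},
		},
	}
}

// guardDutyPattern matches GuardDuty findings at or above minSeverity,
// using EventBridge numeric matching.
func guardDutyPattern(minSeverity float64) map[string]any {
	return map[string]any{
		"source":      []string{"aws.guardduty"},
		"detail-type": []string{"GuardDuty Finding"},
		"detail": map[string]any{
			"severity": []any{
				map[string]any{"numeric": []any{">=", minSeverity}},
			},
		},
	}
}

// patternJSON renders a pattern as the JSON string an EventBridge rule expects.
func patternJSON(p map[string]any) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep ">=" readable instead of "\u003e="
	if err := enc.Encode(p); err != nil {
		return "", err
	}
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	for _, p := range []map[string]any{alarmPattern(), guardDutyPattern(7)} {
		s, err := patternJSON(p)
		if err != nil {
			fmt.Println("encode error:", err)
			return
		}
		fmt.Println(s)
	}
}
```

Scheduled rules need no pattern; they carry a `rate(...)` or `cron(...)` expression instead.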
AWS Step Functions orchestrate multi-step automated responses that require sequencing, branching, and retry logic. A single Lambda can restart a task — but a Step Function can check health, wait, recheck, decide whether to restart or escalate, verify recovery, and notify.
Triggered by CloudWatch alarm via EventBridge. Orchestrates the full self-healing cycle.
| Step | Action | On Success | On Failure |
|---|---|---|---|
| 1. Check Health | Invoke tbi-ops-self-heal with action: check-health | If all healthy → notify INFO and exit | Continue to step 2 |
| 2. Wait 60s | Built-in Wait state — allow transient issues to resolve | Continue to step 3 | — |
| 3. Recheck Health | Invoke tbi-ops-self-heal again | If recovered → notify SELF-HEALED and exit | Continue to step 4 |
| 4. Force Deploy | Invoke tbi-ops-self-heal with action: force-deploy | Continue to step 5 | Notify CRITICAL + escalate |
| 5. Verify Recovery | Wait 90s, then recheck health | Notify SELF-HEALED | Notify CRITICAL — requires attention |
Design Decision: The 60-second wait before action prevents false positives from transient network blips or rolling deployments. Most alarms self-resolve within this window. Only persistent failures trigger the force-deploy.
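The branching in the table above can be sketched as plain Go control flow. This is an illustration of the decision logic only, not the state machine definition: the `Deps` struct and its function fields are hypothetical stand-ins for the Lambda invocations and Wait states that Step Functions performs in the real workflow.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Outcome is the notification severity the workflow ends with.
type Outcome string

const (
	Info       Outcome = "INFO"
	SelfHealed Outcome = "SELF-HEALED"
	Critical   Outcome = "CRITICAL"
)

// Deps holds the side effects of the workflow (hypothetical stand-ins for
// the self-heal Lambda invocations and the Step Functions Wait states).
type Deps struct {
	CheckHealth func() bool
	ForceDeploy func() error
	Wait        func(time.Duration)
}

// RunHealWorkflow follows the five steps of the health-check-heal table.
func RunHealWorkflow(d Deps) Outcome {
	if d.CheckHealth() { // 1. Check Health
		return Info
	}
	d.Wait(60 * time.Second) // 2. Wait 60s for transient issues to clear
	if d.CheckHealth() {     // 3. Recheck Health
		return SelfHealed
	}
	if err := d.ForceDeploy(); err != nil { // 4. Force Deploy
		return Critical
	}
	d.Wait(90 * time.Second) // 5. Verify Recovery
	if d.CheckHealth() {
		return SelfHealed
	}
	return Critical
}

// Test helpers: a scripted health check and no-op side effects.
func scripted(results ...bool) func() bool {
	i := 0
	return func() bool {
		r := results[i]
		if i < len(results)-1 {
			i++
		}
		return r
	}
}

func noWait(time.Duration) {}
func okDeploy() error      { return nil }
func failDeploy() error    { return errors.New("deploy failed") }

func main() {
	recovered := Deps{CheckHealth: scripted(false, false, true), ForceDeploy: okDeploy, Wait: noWait}
	fmt.Println(RunHealWorkflow(recovered))

	failed := Deps{CheckHealth: scripted(false, false), ForceDeploy: failDeploy, Wait: noWait}
	fmt.Println(RunHealWorkflow(failed))
}
```

Injecting the waits and health checks keeps the decision logic testable without touching ECS.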
Amazon Bedrock provides the intelligence layer — correlating multiple weak signals into a coherent threat picture that simple threshold-based alarms would miss.
Inference profile: us.anthropic.claude-sonnet-4-6
graph TB
subgraph Gather["Signal Gathering"]
S1["WAF blocks (24h)"]
S2["Honeypot hits"]
S3["Alarm state"]
S4["GuardDuty findings"]
S5["Self-heal count"]
S6["Recent IPs"]
end
subgraph Decide["Quiet Check"]
QC{"All signals\nbelow threshold?"}
end
subgraph Analyze["Bedrock Analysis"]
BR["Claude Sonnet 4.6\nSystem: Security Analyst\nInput: Threat signals JSON\nOutput: Structured report"]
end
subgraph Act["Response Actions"]
A1["Store report in Valkey"]
A2["Notify if HIGH/CRITICAL"]
A3["Execute safe auto-actions"]
end
S1 --> QC
S2 --> QC
S3 --> QC
S4 --> QC
S5 --> QC
S6 --> QC
QC -->|"Yes"| Skip["Log quiet · Skip Bedrock"]
QC -->|"No"| BR
BR --> A1
BR --> A2
BR --> A3
style Gather fill:#1e293b,stroke:#ef4444,color:#e2e8f0
style Decide fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
style Analyze fill:#1e293b,stroke:#a78bfa,color:#e2e8f0
style Act fill:#1e293b,stroke:#10b981,color:#e2e8f0
| Signal | Source | What It Tells Us |
|---|---|---|
| WAF blocks (24h) | CloudWatch / /public/infrastructure | Volume of blocked malicious requests — baseline vs. spike |
| Honeypot hits | Valkey honeypot:log | Active scanning/enumeration attempts against decoy endpoints |
| Auto-blocked IPs | Valkey honeypot:blocked_ips | How many IPs have been automatically banned |
| Pending queue | Valkey honeypot:autoblock_queue | IPs awaiting WAF block — indicates active attack if growing |
| Alarms firing | CloudWatch | Infrastructure health issues (5xx, health check failures) |
| GuardDuty findings | GuardDuty | Network-level threats (port scanning, unusual API calls) |
| Self-heal count | Valkey autoops:self-heal:count | Frequency of automated recoveries — high count may indicate underlying issue |
| Recent honeypot IPs | Valkey honeypot:log (top 10) | Specific attacker IPs for pattern analysis |
Bedrock returns structured JSON:
Bedrock is only invoked when signals exceed quiet thresholds (WAF blocks > 50, honeypot hits > 10, any alarm firing, any GuardDuty finding). During quiet periods, the Lambda logs quiet status and exits without calling Bedrock — keeping costs near zero.
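The quiet check described above reduces to a small predicate. A minimal sketch, assuming the gathered signals are available as integer counts; the `Signals` struct and its field names are hypothetical, while the thresholds are the ones stated in the text.

```go
package main

import "fmt"

// Signals is a hypothetical snapshot of the gathered threat signals.
type Signals struct {
	WAFBlocks24h      int
	HoneypotHits      int
	AlarmsFiring      int
	GuardDutyFindings int
}

// Quiet thresholds from the text: Bedrock is invoked only when a signal
// exceeds one of these, or when any alarm or GuardDuty finding is present.
const (
	wafBlocksQuietMax    = 50
	honeypotHitsQuietMax = 10
)

// IsQuiet reports whether the Bedrock call can be skipped.
func IsQuiet(s Signals) bool {
	return s.WAFBlocks24h <= wafBlocksQuietMax &&
		s.HoneypotHits <= honeypotHitsQuietMax &&
		s.AlarmsFiring == 0 &&
		s.GuardDutyFindings == 0
}

func main() {
	calm := Signals{WAFBlocks24h: 12, HoneypotHits: 3}
	noisy := Signals{WAFBlocks24h: 12, HoneypotHits: 3, GuardDutyFindings: 1}

	fmt.Println("calm quiet?", IsQuiet(calm))
	fmt.Println("noisy quiet?", IsQuiet(noisy))
}
```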
Seven Go Lambda functions form the action layer of AutoOps. Each is a standalone binary, cross-compiled for Linux, running on the provided.al2023 runtime with 1770 MB of memory. That setting sits just above the 1,769 MB point at which Lambda allocates one full vCPU, so execution is fast and cold starts stay short.
All functions share the same base configuration: Go · provided.al2023 · 1 vCPU · 1770 MB · Not in VPC · IAM role: tbi-autonomous-ops-role.
| Function | Binary Size | Timeout | Layer | Actions |
|---|---|---|---|---|
| tbi-ops-notify | 8.3 MB | 60s | 1 | Branded HTML email via SES · SNS fallback · Severity badges [INFO/WARNING/CRITICAL/SELF-HEALED] |
| tbi-ops-self-heal | 8.5 MB | 60s | 1 | force-deploy, restart-task, check-health · Logs actions to Valkey |
| tbi-ops-waf-action | 8.2 MB | 60s | 1 | block-ips, unblock-ips, list-blocked · Manages tbi-autoops-blocked-ips WAF IP set |
| tbi-ops-honeypot-processor | 8.2 MB | 60s | 1 | Drain honeypot:autoblock_queue → deduplicate IPs → invoke waf-action (sync) → notify (async) |
| tbi-ops-bedrock-analyze | 8.5 MB | 60s | 2 | Gather 8 threat signals → quiet check → Claude Sonnet 4.6 analysis → store in Valkey → notify if HIGH/CRITICAL → execute safe auto-actions |
| tbi-raima-support | 8.5 MB | 60s | 4 | Auto-categorize ticket (8 categories) → Claude Sonnet 4.6 draft response in preferred_lang → pre-fetch diagnostics → notify Cory with full context |
| tbi-ops-digest | 8.5 MB | 180s | 5 | Daily 300-word summary + weekly newsletter pipeline · Reads report:text:YYYY-MM-DD from Valkey · Claude Sonnet 4.6 narrative · Auto-translate 11 languages · Send via notify |
AutoOps Lambdas invoke each other directly via the AWS Lambda Invoke API:

- honeypot-processor → invokes waf-action (synchronous) and notify (async)
- bedrock-analyze → invokes notify (async) and waf-action (synchronous for auto-blocks)
- raima-support → invokes notify (async) with ticket analysis and draft response
- digest → invokes notify (async) with daily/weekly operational summary
- Step Functions → invokes self-heal and notify as workflow steps

All 7 functions share tbi-autonomous-ops-role with permissions for: ECS (update-service, stop-task), WAF (update-ip-set), CloudWatch (get-metrics, describe-alarms), Bedrock (invoke-model), SES (send-email), SNS (publish), SQS (receive/delete), Secrets Manager (get-secret), and Lambda (invoke-function on tbi-ops-*).
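The sync/async split in the invocation pattern above can be illustrated with the honeypot-processor flow. This is a sketch under stated assumptions: `Invoker` is a hypothetical seam over the Lambda Invoke API (whose real invocation types are `RequestResponse` for synchronous calls and `Event` for asynchronous ones), and the payload field names are illustrative rather than the actual event schema.

```go
package main

import (
	"fmt"
	"sort"
)

// Invocation types accepted by the Lambda Invoke API.
const (
	Sync  = "RequestResponse"
	Async = "Event"
)

// Invoker is a hypothetical seam over the Lambda Invoke API.
type Invoker interface {
	Invoke(function, invocationType string, payload any) error
}

// Recorder is a fake Invoker that records calls instead of hitting AWS.
type Recorder struct{ Calls []string }

func (r *Recorder) Invoke(function, invocationType string, _ any) error {
	r.Calls = append(r.Calls, function+":"+invocationType)
	return nil
}

// dedupe returns the unique IPs in sorted order.
func dedupe(ips []string) []string {
	seen := make(map[string]bool, len(ips))
	var out []string
	for _, ip := range ips {
		if !seen[ip] {
			seen[ip] = true
			out = append(out, ip)
		}
	}
	sort.Strings(out)
	return out
}

// ProcessQueue blocks the drained IPs synchronously (the block must succeed
// before anyone is told about it), then fires the notification asynchronously.
func ProcessQueue(inv Invoker, queued []string) ([]string, error) {
	ips := dedupe(queued)
	if len(ips) == 0 {
		return nil, nil
	}
	block := map[string]any{"action": "block-ips", "ips": ips} // illustrative payload
	if err := inv.Invoke("tbi-ops-waf-action", Sync, block); err != nil {
		return nil, fmt.Errorf("waf-action: %w", err)
	}
	note := map[string]any{"severity": "SELF-HEALED", "title": "Honeypot auto-block", "ips": ips}
	if err := inv.Invoke("tbi-ops-notify", Async, note); err != nil {
		return ips, fmt.Errorf("notify: %w", err)
	}
	return ips, nil
}

func main() {
	rec := &Recorder{}
	ips, err := ProcessQueue(rec, []string{"203.0.113.7", "198.51.100.2", "203.0.113.7"})
	fmt.Println(ips, err)
	fmt.Println(rec.Calls)
}
```

The synchronous call lets the caller confirm the WAF update before reporting it; the notification is fire-and-forget so a slow email path never holds up blocking.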
12 decoy endpoints that no legitimate user would ever access. Any hit reveals a scanner, bot, or attacker probing for vulnerabilities.
Trap paths: /wp-admin, /.env, /phpmyadmin, /.git/config, /wp-login.php, /administrator, /xmlrpc.php, /backup.sql, /config.php, /debug, /actuator, /server-status

| Step | Action | Storage |
|---|---|---|
| 1. Hit detected | Log IP, user-agent, path, timestamp | Valkey: honeypot:ip:{ip} hash + honeypot:log sorted set |
| 2. Tarpit | 2-second delay response (wastes scanner time) | — |
| 3. Threshold check | If same IP hits 2+ traps → queue for auto-block | Valkey: honeypot:autoblock_queue |
| 4. Queue drain (5 min) | EventBridge → honeypot-processor Lambda | — |
| 5. WAF block | IPs added to tbi-autoops-blocked-ips WAF IP set | Valkey: honeypot:blocked_ips set |
| 6. Notify | SELF-HEALED notification with blocked IP list | — |
Table-driven design: Honeypot paths are stored in a Valkey set (honeypot:paths). Add or remove trap endpoints without redeploying — just update the set.
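The threshold check and the table-driven path set can be sketched with in-memory maps standing in for the Valkey keys. The `Tracker` type is hypothetical, and counting total hits per IP (rather than distinct trap paths) is an assumption about how the 2-hit threshold is applied.

```go
package main

import "fmt"

// Tracker is an in-memory stand-in for the Valkey-backed honeypot state.
type Tracker struct {
	threshold int
	paths     map[string]bool // stand-in for the honeypot:paths set
	hits      map[string]int  // trap hits per IP
	Queue     []string        // stand-in for honeypot:autoblock_queue
}

// NewTracker creates a tracker with the given threshold and trap paths.
func NewTracker(threshold int, paths ...string) *Tracker {
	t := &Tracker{
		threshold: threshold,
		paths:     make(map[string]bool),
		hits:      make(map[string]int),
	}
	for _, p := range paths {
		t.paths[p] = true
	}
	return t
}

// AddPath registers a new trap at runtime, mirroring an update to the set
// without a redeploy.
func (t *Tracker) AddPath(p string) { t.paths[p] = true }

// RecordHit reports whether path is a trap. An IP is queued for auto-block
// exactly once, at the moment its hit count reaches the threshold.
func (t *Tracker) RecordHit(ip, path string) bool {
	if !t.paths[path] {
		return false
	}
	t.hits[ip]++
	if t.hits[ip] == t.threshold {
		t.Queue = append(t.Queue, ip)
	}
	return true
}

func main() {
	t := NewTracker(2, "/wp-admin", "/.env")

	t.RecordHit("203.0.113.7", "/wp-admin")
	fmt.Println("queued after first hit:", t.Queue)

	t.RecordHit("203.0.113.7", "/.env")
	fmt.Println("queued after second hit:", t.Queue)

	t.AddPath("/backup.sql")
	fmt.Println("new trap active:", t.RecordHit("198.51.100.2", "/backup.sql"))
}
```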
The WAF layer provides both static protection (managed rule groups) and dynamic, AI-driven blocking via the AutoOps IP set.
| WAF | Protects | Rules |
|---|---|---|
| trinity-beast-api-waf | ALB (API) | IP Reputation, Common Rules, Known Bad Inputs, SQL Injection, Rate Limit Global (2000/5min), Rate Limit Admin (100/5min), AutoOps-BlockedIPs (priority 7) |
| CreatedByCloudFront-449feaa5 | CloudFront (Website) | Anti-DDoS, IP Reputation, Common Rules, Known Bad Inputs |
ID: 8d55de25-8ba5-4982-8c41-f4316c9bd50d
Managed exclusively by AutoOps Lambdas. IPs are added by:
- tbi-ops-honeypot-processor: repeat honeypot offenders (2+ hits)
- tbi-ops-bedrock-analyze: AI-recommended blocks (safe auto-actions)
- tbi-ops-waf-action: manual blocks via KCC commands

All operational notifications flow through a branded email pipeline. No raw text, no ugly AWS formatting: every alert arrives as a professionally designed HTML email via Amazon SES.
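The tbi-ops-notify Lambda serves as the single gateway for all notifications. It accepts events from two paths:
- SNS subscription: CloudWatch alarm state changes are published to the tbi-ops-notifications topic, which invokes the Lambda
- Direct invocation: AutoOps Lambdas (bedrock-analyze, self-heal, digest, honeypot-processor) invoke it directly with structured event payloads
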
graph LR
subgraph Sources["Event Sources"]
CW["CloudWatch Alarms"]
BA["tbi-ops-bedrock-analyze"]
SH["tbi-ops-self-heal"]
DG["tbi-ops-digest"]
HP["tbi-ops-honeypot-processor"]
SP["tbi-raima-support"]
end
subgraph Router["SNS Topic"]
SNS["tbi-ops-notifications"]
end
subgraph Notify["Notification Lambda"]
NL["tbi-ops-notify"]
Parse["Parse Event Type"]
Template["Build HTML Email"]
end
subgraph Delivery["Email Delivery"]
SES["Amazon SES"]
Email["Branded HTML Email\nDark theme · Structured sections\nSeverity badge · Action cards"]
end
subgraph Fallback["Fallback Path"]
SNSFB["SNS Plain Text\n(only if SES fails)"]
end
CW -->|"Alarm state change"| SNS
SNS -->|"Lambda subscription"| NL
BA -->|"Direct invoke"| NL
SH -->|"Direct invoke"| NL
DG -->|"Direct invoke"| NL
HP -->|"Direct invoke"| NL
SP -->|"Direct invoke"| NL
NL --> Parse
Parse --> Template
Template --> SES
SES --> Email
Template -.->|"SES failure"| SNSFB
style Sources fill:#1e293b,stroke:#60a5fa,color:#e2e8f0
style Router fill:#1e293b,stroke:#FF9900,color:#e2e8f0
style Notify fill:#1e293b,stroke:#10b981,color:#e2e8f0
style Delivery fill:#1e293b,stroke:#a855f7,color:#e2e8f0
style Fallback fill:#1e293b,stroke:#64748b,color:#94a3b8
linkStyle 0 stroke:#FF9900
linkStyle 1 stroke:#FF9900
linkStyle 2 stroke:#10b981
linkStyle 3 stroke:#10b981
linkStyle 4 stroke:#10b981
linkStyle 5 stroke:#10b981
linkStyle 6 stroke:#10b981
linkStyle 7 stroke:#60a5fa
linkStyle 8 stroke:#60a5fa
linkStyle 9 stroke:#a855f7
linkStyle 10 stroke:#a855f7
linkStyle 11 stroke:#64748b

- SNS topic: tbi-ops-notifications
- ARN: arn:aws:sns:us-east-2:211998422884:tbi-ops-notifications
- Subscriber: tbi-ops-notify Lambda
- From: CPMP Mission <No-Reply@CPMP-Site.org>
- To: CoryDeanKalani@Gmail.com

Gmail compatibility notes:
- bgcolor HTML attributes on every <td> (Gmail strips CSS background properties)
- No rgba() values (Gmail strips them entirely)
- <ul> lists

The Lambda auto-detects the incoming event format and normalizes it:
| Source Format | Detection | Parsing |
|---|---|---|
| CloudWatch Alarm (via SNS) | Records[].Sns.Message contains AlarmName | Extracts alarm name, state transition, metric, reason → structured sections |
| AutoOps Lambda (direct) | severity + title fields present | Used directly — already in NotifyEvent format |
| Generic SNS message | Has Records array but no alarm fields | Subject becomes title, message body becomes detail |
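The detection rules in the table above can be sketched as a single decode followed by a few checks. The `Records[].Sns.Message` and `Subject` fields follow the standard SNS-to-Lambda event shape; the `Format` type, the `Detect` function, and the exact check order are assumptions, not the Lambda's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Format names the three source formats from the table (plus unknown).
type Format string

const (
	FormatAlarm      Format = "cloudwatch-alarm"
	FormatDirect     Format = "autoops-direct"
	FormatGenericSNS Format = "generic-sns"
	FormatUnknown    Format = "unknown"
)

// envelope captures only the fields needed to tell the formats apart.
type envelope struct {
	Severity string `json:"severity"`
	Title    string `json:"title"`
	Records  []struct {
		Sns struct {
			Subject string `json:"Subject"`
			Message string `json:"Message"`
		} `json:"Sns"`
	} `json:"Records"`
}

// Detect classifies a raw Lambda event payload.
func Detect(raw []byte) Format {
	var e envelope
	if err := json.Unmarshal(raw, &e); err != nil {
		return FormatUnknown
	}
	if len(e.Records) > 0 {
		// CloudWatch alarms arrive as a JSON string inside Sns.Message.
		if strings.Contains(e.Records[0].Sns.Message, "AlarmName") {
			return FormatAlarm
		}
		return FormatGenericSNS
	}
	if e.Severity != "" && e.Title != "" {
		return FormatDirect
	}
	return FormatUnknown
}

func main() {
	alarm := `{"Records":[{"Sns":{"Subject":"ALARM","Message":"{\"AlarmName\":\"api-5xx\"}"}}]}`
	direct := `{"severity":"CRITICAL","title":"Self-heal failed"}`
	generic := `{"Records":[{"Sns":{"Subject":"Hello","Message":"plain text"}}]}`

	for _, payload := range []string{alarm, direct, generic} {
		fmt.Println(Detect([]byte(payload)))
	}
}
```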
| Level | Subject Prefix | When Used | Action Required |
|---|---|---|---|
| INFO | [INFO] | Health check passed, digest generated, routine status | None — informational only |
| WARNING | [WARNING] | HIGH threat detected, anomaly alarm, elevated activity | Review when convenient |
| CRITICAL | [CRITICAL] | Self-heal failed, IAM changes detected, EC2 launch detected | Immediate attention required |
| SELF-HEALED | [SELF-HEALED] | Problem detected AND fixed automatically, alarm returned to OK | None — system handled it |
A custom Bedrock-powered translation engine replaces Amazon Translate for all document translation. The engine uses sentinel preprocessing, multi-layer validation, and Step Function orchestration to produce structurally correct translations that preserve code blocks, diagrams, and brand terminology.
| Component | Type | Purpose |
|---|---|---|
| POST /admin/translate | Admin API | Job submission, monitoring, control (9 endpoints) |
| trinity-beast-translation-queue | SQS | Decouple submission from execution |
| tbi-translate-pipe | EventBridge Pipe | SQS → Step Function (no glue Lambda) |
| tbi-translation-orchestrator | Step Functions | Fan-out: docs serial, languages parallel ×11 |
| tbi-translate-worker | ECS Fargate Task | Bedrock translation + sentinel preprocessing + validation |
| tbi-translate-deploy | Lambda (Go) | CloudFront invalidation per document |
| tbi-translate-finalize | Lambda (Go) | Search index rebuild + notification |
ecs:runTask.sync.

Principle: Translate the understanding, preserve the execution. Code blocks, Mermaid diagrams, and technical identifiers are protected by the sentinel system. A developer in any language reads the explanation in their native tongue, then copy-pastes the code and has it work on the first try.
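The idea behind sentinel protection can be sketched as a protect/restore pair around the translation call. This is an illustration only: the `@@SENTINEL_n@@` token format, the function names, and the restriction to fenced code blocks are all assumptions, and the real engine's sentinel types and validators are described in its own reference.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fence is the three-backtick code fence marker.
var fence = strings.Repeat("`", 3)

// fenced matches a fenced block (code or Mermaid), newlines included.
var fenced = regexp.MustCompile("(?s)" + fence + ".*?" + fence)

// token returns the placeholder for block i (hypothetical format).
func token(i int) string { return fmt.Sprintf("@@SENTINEL_%d@@", i) }

// Protect swaps every fenced block for a sentinel token so the translator
// never sees the code, and returns the saved blocks in order.
func Protect(doc string) (string, []string) {
	var saved []string
	out := fenced.ReplaceAllStringFunc(doc, func(block string) string {
		saved = append(saved, block)
		return token(len(saved) - 1)
	})
	return out, saved
}

// Restore validates that each sentinel survived translation exactly once,
// then puts the original blocks back byte for byte.
func Restore(translated string, saved []string) (string, error) {
	for i := range saved {
		if n := strings.Count(translated, token(i)); n != 1 {
			return "", fmt.Errorf("sentinel %d appears %d times, want 1", i, n)
		}
	}
	for i, block := range saved {
		translated = strings.Replace(translated, token(i), block, 1)
	}
	return translated, nil
}

func main() {
	doc := "Run this:\n" + fence + "go\nx := 1\n" + fence + "\nDone."
	protected, saved := Protect(doc)
	fmt.Println(protected)

	// Stand-in for the Bedrock translation step.
	translated := strings.NewReplacer("Run this:", "Ejecuta esto:", "Done.", "Listo.").Replace(protected)

	restored, err := Restore(translated, saved)
	fmt.Println(restored, err)
}
```

Failing the restore when a sentinel is missing or duplicated is one simple form of the structural validation the engine relies on.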
Full documentation: See AutoOps Translation Engine (AutoOps-Translation-Engine.html) for the complete technical reference: sentinel types, validator system, Step Function orchestration, admin API, cost protection, and operations guide.
The monitoring layer feeds events into AutoOps and provides the data that Bedrock analyzes.
| Alarm | Metric | Threshold | AutoOps Response |
|---|---|---|---|
| WAF High Block Rate | WAF BlockedRequests | > 1000/5min | Triggers Bedrock analysis |
| API 5xx Spike | ALB 5xx count | > 10/min | Health check → self-heal workflow |
| API 4xx Spike | ALB 4xx count | > 100/min | Bedrock analysis for abuse patterns |
| GuardDuty Finding | GuardDuty event | Any HIGH/CRITICAL | Immediate notification + Bedrock analysis |
| ECS CPU High | ECS CPUUtilization | > 80% | Alert (scaling is manual) |
| Service Count Low | ECS RunningTaskCount | < 1 | Health check → force-deploy |
| Aurora CPU High | RDS CPUUtilization | > 80% | Alert + ACU review |
Detector: 18ceef6f8dddcf6082473cc7016ee458
Sources: VPC flow logs, CloudTrail API calls, DNS query logs
Integration: Findings generate EventBridge events → AutoOps response
All monitoring data is exposed via GET /public/infrastructure — a single unauthenticated endpoint that powers the Infrastructure Live page. Cached 60s in Valkey. Includes CloudFront, WAF, SQS, Lambda, Sync, GuardDuty, Alarms, Honeypot, and AutoOps stats.
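The 60-second caching of the endpoint response can be sketched as a read-through cache. Here an in-memory struct stands in for Valkey, and the `Cache` type, `NewCache` constructor, and `FakeClock` helper are hypothetical; the production handler presumably stores the payload under a Valkey key with a 60-second TTL instead.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Cache is a read-through cache for a single expensive payload.
type Cache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time
	fetch   func() (string, error)
	value   string
	expires time.Time
}

// NewCache builds a cache with a TTL in seconds, an injectable clock, and
// the function that gathers fresh stats.
func NewCache(ttlSeconds int, now func() time.Time, fetch func() (string, error)) *Cache {
	return &Cache{ttl: time.Duration(ttlSeconds) * time.Second, now: now, fetch: fetch}
}

// Get returns the cached payload while it is fresh and refetches otherwise.
func (c *Cache) Get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.now().Before(c.expires) {
		return c.value, nil
	}
	v, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.value, c.expires = v, c.now().Add(c.ttl)
	return v, nil
}

// FakeClock is a manually advanced clock for tests.
type FakeClock struct{ t time.Time }

func (f *FakeClock) Now() time.Time       { return f.t }
func (f *FakeClock) AdvanceSeconds(n int) { f.t = f.t.Add(time.Duration(n) * time.Second) }

func main() {
	c := NewCache(60, time.Now, func() (string, error) {
		return `{"status":"ok"}`, nil // hypothetical payload
	})
	v, err := c.Get()
	fmt.Println(v, err)
}
```

With a 60-second TTL, a burst of page loads triggers at most one round of metric collection per minute.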
The entire AI and automation layer runs for approximately $10–15/month — a remarkably low cost for a fully autonomous, AI-powered operations system running 24/7.
| Service | Usage Pattern | Monthly Cost |
|---|---|---|
| EventBridge | ~8,640 events/month (alarm changes + scheduled) | ~$0.50 |
| Step Functions | ~100 executions/month (only on alarm) | ~$1.00 |
| Lambda (7 AutoOps) | ~10,000 invocations/month, 1770 MB, <5s avg | ~$2–5 |
| Amazon Bedrock | ~50–200 invocations/month (only when signals active) | ~$2–5 |
| SNS | ~100 notifications/month | ~$0.10 |
| Amazon Bedrock (Translation) | ~$1.65 per doc-language pair, usage-dependent | ~$2–10 |
| Total AI & Automation | | ~$10–15/mo |
Cost-conscious by design: Bedrock is only called when threat signals exceed quiet thresholds. During normal operations (most of the time), the Lambda checks signals, finds everything quiet, and exits without invoking the AI model. This keeps Bedrock costs near zero during peaceful periods.
All 5 layers of the AutoOps system are fully deployed and operational.
| Layer | Name | Status | Purpose |
|---|---|---|---|
| 1 | Autonomous Operations | LIVE | Event-driven self-healing, WAF automation, notifications |
| 2 | Intelligent Threat Response | LIVE | AI-powered threat analysis via Bedrock (every 5 min + GuardDuty HIGH) |
| 3 | Predictive Operations | LIVE | CloudWatch Anomaly Detection — 4 ML models learning traffic patterns |
| 4 | Customer Engagement | LIVE | AI-assisted support — auto-categorize, auto-reply as AutoOps, human escalation option, multi-lingual |
| 5 | Self-Documenting Infrastructure | LIVE | Bedrock-generated daily + weekly operational digests |
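
Layer 3 (Predictive Operations):
- Anomaly alarms: TrinityBeast-Anomaly-RequestRate, TrinityBeast-Anomaly-Latency, TrinityBeast-Anomaly-ErrorRate, TrinityBeast-Anomaly-CacheHitRate
- Routing: tbi-ops-alarm-trigger → Step Function health-check-heal

Layer 4 (Customer Engagement):
- Lambda: tbi-raima-support
- Draft responses in preferred_lang via Bedrock
- Ticket analysis stored in Valkey (support:ticket:{id})

Layer 5 (Self-Documenting Infrastructure):
- Lambda: tbi-ops-digest
- EventBridge rules: tbi-ops-daily-digest + tbi-ops-weekly-digest
- Input: /public/infrastructure endpoint
- Digests stored in Valkey (autoops:digest:daily, autoops:digest:weekly)
- Delivery: tbi-ops-notify (email digest to Cory)

The weekly newsletter uses daily operations reports as its primary source of truth, not just a point-in-time snapshot: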
- ~/daily-reports/ and uploads to s3://trinity-beast-website-east2/daily-reports/
- report:text:YYYY-MM-DD (30-day TTL). Backfills all dates from manifest that don't already exist.
- report:text:* from Valkey (up to 12 KB per day)

Key design decision: The newsletter reads REAL operational data from daily reports: actual WAF block counts, actual session accomplishments, actual security events. Bedrock rewrites this for subscribers but cannot invent or speculate. If a metric is unavailable, it is omitted rather than explained away.