The Trinity Beast – AI AutoOps Implementation

Complete implementation guide for all AWS AI, automation, and intelligent operations services

Region: us-east-2 (Ohio) · Version: v2.2 · May 17, 2026

1. Overview

The Trinity Beast Infrastructure (TBI) employs a multi-layered, proactive AI and automation system that monitors, defends, heals, and reports on the infrastructure autonomously. Rather than waiting for problems to escalate, the system detects anomalies in real time, correlates threat signals using AI, and takes corrective action, all before a human needs to intervene.

AI Model: Claude Sonnet 4.6 (Bedrock)
AutoOps Lambdas: 7 Functions
Event Rules: 6 EventBridge
Workflows: 1 Step Function
Anomaly Detectors: 4 ML Models
Honeypot Traps: 12 Endpoints
Translation: 12 Languages
Monthly Cost: ~$10–15/mo
Design Philosophy: Self-heal first
All 5 Layers: LIVE ✅

Core Principle: Self-heal first, notify second. If the system can fix it, fix it and tell Cory after. Escalate fast on unknowns. Never suppress critical alerts. Bedrock is advisory — AI suggests, Step Functions decide, Lambda acts.

AWS Services Utilized

Service | Role in TBI | Layer
Amazon EventBridge | Event routing: connects CloudWatch alarms, scheduled triggers, and custom events to automated workflows | Automation
AWS Step Functions | Workflow orchestration: multi-step heal/verify/notify sequences with retry logic and branching | Orchestration
Amazon Bedrock (Claude Sonnet 4.6) | AI threat analysis: pattern correlation, severity assessment, plain-English reports, auto-action recommendations | Intelligence
AWS Lambda (7 Go functions) | Action execution: self-heal, WAF management, notifications, honeypot processing, AI analysis, support automation, operational digests | Execution
AWS WAF | Automated IP blocking: honeypot-triggered and AI-recommended blocks applied to the API WAF | Defense
Amazon SNS | Operational notifications: severity-tagged alerts delivered to email with full context | Communication
Amazon Translate | Multilingual content: 384 translated documents, real-time support correspondence translation | Localization
Amazon CloudWatch | Metrics, alarms, anomaly detection: feeds events into EventBridge for automated response | Observability
Amazon GuardDuty | Threat detection: VPC flow logs, CloudTrail, DNS analysis for automated security response | Security
Amazon SES | Transactional email: receipts, notifications, and AI-drafted correspondence in 12 languages | Communication
Amazon SQS | Decoupled processing: usage log pipeline, honeypot auto-block queue | Messaging
ElastiCache (Valkey) | Operational state: threat reports, action logs, self-heal counters, honeypot data, search indexes | State

2. Full System Diagram

The complete AutoOps pipeline showing all 7 Lambda functions, AI services, EventBridge rules, and data flows — with per-Lambda binary sizes and roles. See the legend below the diagram for common Lambda configuration shared across all functions.

Diagram 2.1 — AutoOps Lambda Pipeline — All Components & Configuration
flowchart TB
    subgraph Sources["Event Sources"]
        CWA["☁️ CloudWatch Alarms\n5xx spike · Health fail\nACU ceiling · Anomaly"]
        GD["🛡️ GuardDuty\nVPC flow · CloudTrail\nDNS · Severity ≥ 7"]
        HP["🪤 Honeypot Traps\n12 decoy endpoints\n2-hit auto-block threshold"]
        SCHED["⏱️ Scheduled Events\nEvery 5 min · Daily 6AM\nMonday 7AM EST"]
    end

    subgraph EB["EventBridge (6 Rules)"]
        R1["tbi-ops-alarm-trigger\nCW alarm → Step Function"]
        R2["tbi-ops-honeypot-queue\nEvery 5 min → honeypot-processor"]
        R3["tbi-ops-bedrock-analyze-schedule\nEvery 5 min → bedrock-analyze"]
        R4["tbi-ops-guardduty-high-finding\nGuardDuty severity ≥ 7 → bedrock-analyze"]
        R5["tbi-ops-daily-digest\ncron 0 11 * * ? * → digest"]
        R6["tbi-ops-weekly-digest\ncron 0 12 ? * MON * → digest"]
    end

    subgraph SF["Step Functions"]
        SFN["tbi-ops-health-check-heal\nCheck → Wait 60s → Recheck\n→ Force-Deploy → Verify → Notify"]
    end

    subgraph LambdaLayer["Lambda Functions — Go · provided.al2023 · 1 vCPU · 1770 MB · 60s timeout · Not in VPC"]
        L1["tbi-ops-self-heal\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB\n🔧 ECS force-deploy\nrestart-task · check-health\nLogs to Valkey"]
        L2["tbi-ops-waf-action\n━━━━━━━━━━━━━━━━━━\n📦 8.2 MB\n🔧 block-ips · unblock-ips\nlist-blocked\nManages WAF IP set"]
        L3["tbi-ops-notify\n━━━━━━━━━━━━━━━━━━\n📦 8.3 MB\n🔧 Branded HTML email\nSES delivery · SNS fallback\n[INFO/WARN/CRIT/HEALED]"]
        L4["tbi-ops-honeypot-processor\n━━━━━━━━━━━━━━━━━━\n📦 8.2 MB\n🔧 Drain autoblock queue\nDeduplicate IPs\nInvoke waf-action + notify"]
        L5["tbi-ops-bedrock-analyze\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB\n🔧 Gather threat signals\nQuiet check → Bedrock AI\nAuto-act · Store · Notify"]
        L6["tbi-raima-support\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB\n🔧 Raima Support Assistant\nAI draft response · Knowledge gaps\nPre-fetch diagnostics · Notify"]
        L7["tbi-ops-digest\n━━━━━━━━━━━━━━━━━━\n📦 8.5 MB · ⏱️ 180s timeout\n🔧 Daily 300-word summary\nWeekly newsletter pipeline\nBedrock narrative · Notify"]
    end

    subgraph AI["Amazon Bedrock"]
        Claude["Claude Sonnet 4.6\nus.anthropic.claude-sonnet-4-6\nInference Profile · No setup\nPattern correlation\nThreat assessment\nSupport drafting\nOperational narratives"]
    end

    subgraph WAFDef["WAF Defense"]
        IPSet["tbi-autoops-blocked-ips\nWAF IP Set\nID: 8d55de25-8ba5-4982-8c41-f4316c9bd50d"]
        WRule["AutoOps-BlockedIPs\nPriority 7\ntrinity-beast-api-waf"]
    end

    subgraph NotifyLayer["Notifications"]
        SNS["tbi-ops-notifications\nSNS Topic\nARN: arn:aws:sns:us-east-2:211998422884:..."]
        SES["Amazon SES\nBranded HTML email\nNo-Reply@CPMP-Site.org → Cory"]
    end

    subgraph ValkeyState["Valkey State (ElastiCache)"]
        VK["autoops:threats:daily\nautoops:actions:log\nautoops:self-heal:count\nautoops:digest:daily · weekly\nsupport:ticket:{id}\nhoneypot:autoblock_queue\nreport:text:YYYY-MM-DD"]
    end

    subgraph Anomaly["Layer 3 — Anomaly Detection"]
        AD["4 CloudWatch ML Models\nRequestCount · TargetResponseTime\nHTTPCode_5XX · CacheHitRate\nBand width: 2σ · 3 eval periods"]
    end

    %% Event sources → EventBridge
    CWA -->|"alarm state change"| R1
    CWA -->|"alarm state change"| R3
    GD -->|"severity ≥ 7"| R4
    HP -->|"hit logged to queue"| R2
    SCHED --> R2
    SCHED --> R3
    SCHED --> R5
    SCHED --> R6
    AD -->|"anomaly alarm"| R1

    %% EventBridge → targets
    R1 --> SFN
    R2 --> L4
    R3 --> L5
    R4 --> L5
    R5 --> L7
    R6 --> L7

    %% Step Function → Lambdas
    SFN -->|"check-health / force-deploy"| L1
    SFN -->|"notify result"| L3

    %% Lambda → Lambda
    L4 -->|"invoke sync"| L2
    L4 -->|"invoke async"| L3
    L5 -->|"invoke async"| L3
    L5 -->|"auto-block sync"| L2
    L6 -->|"invoke async"| L3
    L7 -->|"invoke async"| L3

    %% Lambda → AI
    L5 -->|"InvokeModel"| Claude
    L6 -->|"InvokeModel"| Claude
    L7 -->|"InvokeModel"| Claude

    %% Lambda → WAF
    L2 --> IPSet
    IPSet --> WRule

    %% Lambda → Notifications
    L3 --> SNS
    SNS --> SES

    %% Lambda → Valkey
    L1 -->|"log action"| VK
    L4 -->|"drain queue"| VK
    L5 -->|"store threat report"| VK
    L6 -->|"store ticket analysis"| VK
    L7 -->|"store digest"| VK

    %% Styling
    style Sources fill:#1e293b,stroke:#ef4444,color:#e2e8f0
    style EB fill:#1e293b,stroke:#FF9900,color:#e2e8f0
    style SF fill:#1e293b,stroke:#60a5fa,color:#e2e8f0
    style LambdaLayer fill:#1e293b,stroke:#10b981,color:#e2e8f0
    style AI fill:#1e293b,stroke:#a78bfa,color:#e2e8f0
    style WAFDef fill:#1e293b,stroke:#ef4444,color:#e2e8f0
    style NotifyLayer fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style ValkeyState fill:#1e293b,stroke:#06b6d4,color:#e2e8f0
    style Anomaly fill:#1e293b,stroke:#818cf8,color:#e2e8f0
        

⚙️ Lambda Common Configuration (All 7 Functions)

Runtime: Go · provided.al2023
CPU: 1 vCPU (1770 MB is just above the 1,769 MB at which Lambda allocates the equivalent of one full vCPU)
Memory: 1770 MB (chosen for fast execution and short cold starts)
Timeout: 60s (except tbi-ops-digest: 180s)
VPC: Not in VPC — uses public API endpoints + AWS service APIs
IAM Role: tbi-autonomous-ops-role (shared)
Architecture: Linux x86-64 (ELF) — cross-compiled with GOOS=linux GOARCH=amd64
Deploy: bash scripts/kcc.sh deploy-lambda autoops <name>

📦 Binary Sizes (Deployed)

tbi-ops-notify: 8.3 MB
tbi-ops-self-heal: 8.5 MB
tbi-ops-waf-action: 8.2 MB
tbi-ops-honeypot-processor: 8.2 MB
tbi-ops-bedrock-analyze: 8.5 MB
tbi-raima-support: 8.5 MB
tbi-ops-digest: 8.5 MB

🗺️ Diagram Color Key

Event Sources & WAF Defense
EventBridge — event routing & scheduling
Step Functions — workflow orchestration
Lambda Functions — Go action execution
Amazon Bedrock — AI analysis & generation
Notifications — SNS → SES email pipeline
Valkey State — operational data & action logs
Anomaly Detection — ML-based predictive alarms

3. Event-Driven Automation (EventBridge)

Amazon EventBridge is the central nervous system of AutoOps. It receives events from across the infrastructure and routes them to the appropriate automated response.

How It Works

Pattern: Event Source → EventBridge Rule (pattern match) → Target (Step Function or Lambda)

Every CloudWatch alarm state change, every GuardDuty finding, and every scheduled interval generates an event. EventBridge rules match these events by pattern and invoke the correct automated response — no polling, no delays.
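The pattern-matching step can be illustrated with a short Go sketch. This is not how EventBridge is implemented, and the actual pattern attached to tbi-ops-alarm-trigger is not shown in this guide; `alarmPattern` and the `matchesAlarmRule` helper are assumptions that mirror what a rule matching "CloudWatch alarm state change → ALARM" would typically check.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alarmPattern is an assumed EventBridge event pattern for a rule that fires
// only when a CloudWatch alarm transitions into the ALARM state.
const alarmPattern = `{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "detail": {"state": {"value": ["ALARM"]}}
}`

// alarmEvent holds the subset of a CloudWatch Alarm State Change event
// that this sketch inspects.
type alarmEvent struct {
	Source     string `json:"source"`
	DetailType string `json:"detail-type"`
	Detail     struct {
		AlarmName string `json:"alarmName"`
		State     struct {
			Value string `json:"value"`
		} `json:"state"`
	} `json:"detail"`
}

// matchesAlarmRule (hypothetical helper) applies the same three checks as
// alarmPattern to a raw event and reports whether the rule would fire.
func matchesAlarmRule(raw []byte) bool {
	var ev alarmEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return false
	}
	return ev.Source == "aws.cloudwatch" &&
		ev.DetailType == "CloudWatch Alarm State Change" &&
		ev.Detail.State.Value == "ALARM"
}

func main() {
	sample := []byte(`{"source":"aws.cloudwatch","detail-type":"CloudWatch Alarm State Change","detail":{"alarmName":"example-5xx","state":{"value":"ALARM"}}}`)
	fmt.Println("rule fires:", matchesAlarmRule(sample))
}
```

Events that return to OK do not match, so under this assumed pattern recovery notifications would have to come from the workflow itself rather than from a second rule.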

Diagram 3.1 — EventBridge Event Flow
graph LR
    A["CloudWatch Alarm\nState: ALARM"] -->|"Event Pattern"| EB["EventBridge\nEvent Bus"]
    B["Scheduled\nrate(5 minutes)"] -->|"Schedule"| EB
    EB -->|"tbi-ops-alarm-trigger"| SF["Step Function\nhealth-check-heal"]
    EB -->|"tbi-ops-honeypot-queue"| L["Lambda\nhoneypot-processor"]

    style EB fill:#1e293b,stroke:#FF9900,color:#e2e8f0
    style SF fill:#1e293b,stroke:#60a5fa,color:#e2e8f0
    style L fill:#1e293b,stroke:#10b981,color:#e2e8f0

Active Rules

Rule Name | Event Pattern | Target | Purpose
tbi-ops-alarm-trigger | CloudWatch alarm state change → ALARM | Step Function: tbi-ops-health-check-heal | Triggers automated health check and self-healing workflow when any alarm fires
tbi-ops-honeypot-queue-processor | Scheduled: every 5 minutes | Lambda: tbi-ops-honeypot-processor | Drains the honeypot auto-block queue and applies WAF rules for repeat offenders
tbi-ops-bedrock-analyze-schedule | Scheduled: every 5 minutes | Lambda: tbi-ops-bedrock-analyze | Periodic AI threat analysis; skips Bedrock when infrastructure is quiet (cost-conscious)
tbi-ops-guardduty-high-finding | GuardDuty Finding, severity ≥ 7 | Lambda: tbi-ops-bedrock-analyze | Immediate AI threat analysis on HIGH/CRITICAL GuardDuty findings
trinity-beast-nightly-sync | Scheduled: cron(0 6 * * ? *) (1 AM EST) | ECS Task: trinity-beast-sync-job | Nightly Aurora/Valkey sync + search index rebuild

Future Rules (Planned)

4. Workflow Orchestration (Step Functions)

AWS Step Functions orchestrate multi-step automated responses that require sequencing, branching, and retry logic. A single Lambda can restart a task — but a Step Function can check health, wait, recheck, decide whether to restart or escalate, verify recovery, and notify.

Active State Machine

tbi-ops-health-check-heal Step Function

Triggered by CloudWatch alarm via EventBridge. Orchestrates the full self-healing cycle.

Workflow Steps

Step | Action | On Success | On Failure
1. Check Health | Invoke tbi-ops-self-heal with action: check-health | If all healthy → notify INFO and exit | Continue to step 2
2. Wait 60s | Built-in Wait state: allow transient issues to resolve | Continue to step 3 | n/a
3. Recheck Health | Invoke tbi-ops-self-heal again | If recovered → notify SELF-HEALED and exit | Continue to step 4
4. Force Deploy | Invoke tbi-ops-self-heal with action: force-deploy | Continue to step 5 | Notify CRITICAL + escalate
5. Verify Recovery | Wait 90s, then recheck health | Notify SELF-HEALED | Notify CRITICAL (requires attention)

Design Decision: The 60-second wait before action prevents false positives from transient network blips or rolling deployments. Most alarms self-resolve within this window. Only persistent failures trigger the force-deploy.
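The branching in the workflow steps can be sketched in plain Go. The real orchestration lives in the Step Functions state machine definition, which this guide does not reproduce; the `healer` interface, `runHealWorkflow`, and the `scriptedHealer` test double below are hypothetical names used only to make the control flow concrete.

```go
package main

import (
	"fmt"
	"time"
)

// healer abstracts the two tbi-ops-self-heal actions the workflow uses.
type healer interface {
	CheckHealth() bool
	ForceDeploy() error
}

// runHealWorkflow walks the five steps and returns the severity that the
// final notification would carry.
func runHealWorkflow(h healer, sleep func(time.Duration)) string {
	if h.CheckHealth() { // 1. Check Health
		return "INFO"
	}
	sleep(60 * time.Second) // 2. Wait 60s so transient issues can resolve
	if h.CheckHealth() {    // 3. Recheck Health
		return "SELF-HEALED"
	}
	if err := h.ForceDeploy(); err != nil { // 4. Force Deploy
		return "CRITICAL"
	}
	sleep(90 * time.Second) // 5. Verify Recovery
	if h.CheckHealth() {
		return "SELF-HEALED"
	}
	return "CRITICAL"
}

// scriptedHealer replays a fixed list of health results; once the list is
// exhausted it reports unhealthy.
type scriptedHealer struct {
	results  []bool
	deployed bool
}

func (s *scriptedHealer) CheckHealth() bool {
	if len(s.results) == 0 {
		return false
	}
	r := s.results[0]
	s.results = s.results[1:]
	return r
}

func (s *scriptedHealer) ForceDeploy() error {
	s.deployed = true
	return nil
}

// noSleep lets the sketch run instantly instead of waiting 60s and 90s.
func noSleep(time.Duration) {}

func main() {
	h := &scriptedHealer{results: []bool{false, false, true}}
	fmt.Println(runHealWorkflow(h, noSleep), "deployed:", h.deployed)
}
```

Passing the sleep function as a parameter is a choice made for this sketch so the waits can be skipped in tests; in the state machine the waits are built-in Wait states.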

5. AI Threat Analysis (Amazon Bedrock)

Amazon Bedrock provides the intelligence layer — correlating multiple weak signals into a coherent threat picture that simple threshold-based alarms would miss.

Model Configuration

Model: Claude Sonnet 4.6
Model ID: us.anthropic.claude-sonnet-4-6
Access Method: Inference Profile
Max Tokens: 1,024
Diagram 5.1 — Bedrock Threat Analysis Pipeline
graph TB
    subgraph Gather["Signal Gathering"]
        S1["WAF blocks (24h)"]
        S2["Honeypot hits"]
        S3["Alarm state"]
        S4["GuardDuty findings"]
        S5["Self-heal count"]
        S6["Recent IPs"]
    end

    subgraph Decide["Quiet Check"]
        QC{"All signals\nbelow threshold?"}
    end

    subgraph Analyze["Bedrock Analysis"]
        BR["Claude Sonnet 4.6\nSystem: Security Analyst\nInput: Threat signals JSON\nOutput: Structured report"]
    end

    subgraph Act["Response Actions"]
        A1["Store report in Valkey"]
        A2["Notify if HIGH/CRITICAL"]
        A3["Execute safe auto-actions"]
    end

    S1 --> QC
    S2 --> QC
    S3 --> QC
    S4 --> QC
    S5 --> QC
    S6 --> QC
    QC -->|"Yes"| Skip["Log quiet · Skip Bedrock"]
    QC -->|"No"| BR
    BR --> A1
    BR --> A2
    BR --> A3

    style Gather fill:#1e293b,stroke:#ef4444,color:#e2e8f0
    style Decide fill:#1e293b,stroke:#f59e0b,color:#e2e8f0
    style Analyze fill:#1e293b,stroke:#a78bfa,color:#e2e8f0
    style Act fill:#1e293b,stroke:#10b981,color:#e2e8f0

What Bedrock Analyzes

Signal | Source | What It Tells Us
WAF blocks (24h) | CloudWatch / /public/infrastructure | Volume of blocked malicious requests: baseline vs. spike
Honeypot hits | Valkey honeypot:log | Active scanning/enumeration attempts against decoy endpoints
Auto-blocked IPs | Valkey honeypot:blocked_ips | How many IPs have been automatically banned
Pending queue | Valkey honeypot:autoblock_queue | IPs awaiting WAF block; indicates active attack if growing
Alarms firing | CloudWatch | Infrastructure health issues (5xx, health check failures)
GuardDuty findings | GuardDuty | Network-level threats (port scanning, unusual API calls)
Self-heal count | Valkey autoops:self-heal:count | Frequency of automated recoveries; a high count may indicate an underlying issue
Recent honeypot IPs | Valkey honeypot:log (top 10) | Specific attacker IPs for pattern analysis

Output Format

Bedrock returns structured JSON:

  • severity: NONE | LOW | MEDIUM | HIGH | CRITICAL
  • summary: Plain English description of the threat landscape
  • patterns: Identified attack types (port scanning, credential stuffing, DDoS, enumeration)
  • recommendations: Human-actionable suggestions for Cory
  • auto_actions: Safe, reversible actions the system can take automatically
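A minimal Go sketch of decoding and validating a report with these five fields follows. The field names and severity values come from the list above; the element types are an assumption (plain string arrays), since the exact schema is not shown here, and `threatReport`, `parseReport`, and `needsNotification` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// threatReport mirrors the five documented fields. Treating patterns,
// recommendations, and auto_actions as string arrays is an assumption.
type threatReport struct {
	Severity        string   `json:"severity"`
	Summary         string   `json:"summary"`
	Patterns        []string `json:"patterns"`
	Recommendations []string `json:"recommendations"`
	AutoActions     []string `json:"auto_actions"`
}

// validSeverity lists the documented severity values.
var validSeverity = map[string]bool{
	"NONE": true, "LOW": true, "MEDIUM": true, "HIGH": true, "CRITICAL": true,
}

// parseReport decodes the model output and rejects anything whose severity
// is outside the documented set, so malformed output is never acted on.
func parseReport(raw []byte) (threatReport, error) {
	var r threatReport
	if err := json.Unmarshal(raw, &r); err != nil {
		return r, fmt.Errorf("decode report: %w", err)
	}
	if !validSeverity[r.Severity] {
		return r, fmt.Errorf("unexpected severity %q", r.Severity)
	}
	return r, nil
}

// needsNotification reflects the documented rule: notify on HIGH or CRITICAL.
func (r threatReport) needsNotification() bool {
	return r.Severity == "HIGH" || r.Severity == "CRITICAL"
}

func main() {
	raw := []byte(`{"severity":"HIGH","summary":"Enumeration burst","patterns":["enumeration"],"recommendations":["Review WAF logs"],"auto_actions":[]}`)
	r, err := parseReport(raw)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(r.Severity, "notify:", r.needsNotification())
}
```

Validating before acting fits the stated principle that Bedrock is advisory: output that does not parse cleanly is simply dropped.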

Cost Control

Bedrock is only invoked when signals exceed quiet thresholds (WAF blocks > 50, honeypot hits > 10, any alarm firing, any GuardDuty finding). During quiet periods, the Lambda logs quiet status and exits without calling Bedrock — keeping costs near zero.
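The quiet check reduces to a few comparisons. The sketch below uses the thresholds stated above; the `signals` struct and `isQuiet` function are hypothetical names, and the real Lambda gathers more signals than the four that gate the call.

```go
package main

import "fmt"

// signals holds the four inputs that decide whether Bedrock is invoked.
type signals struct {
	WAFBlocks24h      int
	HoneypotHits      int
	AlarmsFiring      int
	GuardDutyFindings int
}

// isQuiet reports whether every signal is at or below its quiet threshold:
// WAF blocks <= 50, honeypot hits <= 10, no alarms, no GuardDuty findings.
func isQuiet(s signals) bool {
	return s.WAFBlocks24h <= 50 &&
		s.HoneypotHits <= 10 &&
		s.AlarmsFiring == 0 &&
		s.GuardDutyFindings == 0
}

func main() {
	calm := signals{WAFBlocks24h: 12, HoneypotHits: 3}
	busy := signals{WAFBlocks24h: 12, HoneypotHits: 3, AlarmsFiring: 1}
	fmt.Println("calm quiet:", isQuiet(calm)) // skip Bedrock
	fmt.Println("busy quiet:", isQuiet(busy)) // call Bedrock
}
```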

6. Self-Healing Lambda Functions

Seven Go Lambda functions form the action layer of AutoOps. Each is a standalone binary, cross-compiled for Linux, running on the provided.al2023 runtime with 1770 MB of memory, just above the 1,769 MB at which Lambda allocates the equivalent of one full vCPU.

Function Inventory

All functions share the same base configuration: Go · provided.al2023 · 1 vCPU · 1770 MB · Not in VPC · IAM role: tbi-autonomous-ops-role.

Function | Binary Size | Timeout | Layer | Actions
tbi-ops-notify | 8.3 MB | 60s | 1 | Branded HTML email via SES · SNS fallback · Severity badges [INFO/WARNING/CRITICAL/SELF-HEALED]
tbi-ops-self-heal | 8.5 MB | 60s | 1 | force-deploy, restart-task, check-health · Logs actions to Valkey
tbi-ops-waf-action | 8.2 MB | 60s | 1 | block-ips, unblock-ips, list-blocked · Manages tbi-autoops-blocked-ips WAF IP set
tbi-ops-honeypot-processor | 8.2 MB | 60s | 1 | Drain honeypot:autoblock_queue → deduplicate IPs → invoke waf-action (sync) → notify (async)
tbi-ops-bedrock-analyze | 8.5 MB | 60s | 2 | Gather 8 threat signals → quiet check → Claude Sonnet 4.6 analysis → store in Valkey → notify if HIGH/CRITICAL → execute safe auto-actions
tbi-raima-support | 8.5 MB | 60s | 4 | Auto-categorize ticket (8 categories) → Claude Sonnet 4.6 draft response in preferred_lang → pre-fetch diagnostics → notify Cory with full context
tbi-ops-digest | 8.5 MB | 180s | 5 | Daily 300-word summary + weekly newsletter pipeline · Reads report:text:YYYY-MM-DD from Valkey · Claude Sonnet 4.6 narrative · Auto-translate 11 languages · Send via notify

Inter-Lambda Communication

AutoOps Lambdas invoke each other directly via the AWS Lambda Invoke API:

  • honeypot-processor → invokes waf-action (synchronous) and notify (async)
  • bedrock-analyze → invokes notify (async) and waf-action (synchronous for auto-blocks)
  • raima-support → invokes notify (async) with ticket analysis and draft response
  • digest → invokes notify (async) with daily/weekly operational summary
  • Step Function → invokes self-heal and notify as workflow steps
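The honeypot-processor chain (deduplicate, block synchronously, notify asynchronously) can be sketched as follows. To keep the example free of external dependencies, the `invoker` interface is a hypothetical stand-in for the Lambda Invoke API, where `sync=true` models the RequestResponse invocation type and `sync=false` models Event; the payload shape is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// invoker is a hypothetical stand-in for the Lambda Invoke API.
type invoker interface {
	Invoke(function string, ips []string, sync bool) error
}

// dedupe removes repeated IPs and returns them in sorted order.
func dedupe(ips []string) []string {
	seen := make(map[string]struct{}, len(ips))
	var out []string
	for _, ip := range ips {
		if _, ok := seen[ip]; ok {
			continue
		}
		seen[ip] = struct{}{}
		out = append(out, ip)
	}
	sort.Strings(out)
	return out
}

// processQueue blocks the drained IPs synchronously, then fires the
// notification without waiting on its result.
func processQueue(inv invoker, queued []string) ([]string, error) {
	ips := dedupe(queued)
	if len(ips) == 0 {
		return nil, nil
	}
	if err := inv.Invoke("tbi-ops-waf-action", ips, true); err != nil {
		return nil, fmt.Errorf("block ips: %w", err)
	}
	// Assumption: a failed notification should not fail the block itself.
	_ = inv.Invoke("tbi-ops-notify", ips, false)
	return ips, nil
}

// recorder is a test double that records each invocation.
type recorder struct{ calls []string }

func (r *recorder) Invoke(fn string, _ []string, sync bool) error {
	r.calls = append(r.calls, fmt.Sprintf("%s sync=%t", fn, sync))
	return nil
}

func main() {
	rec := &recorder{}
	ips, _ := processQueue(rec, []string{"203.0.113.7", "198.51.100.2", "203.0.113.7"})
	fmt.Println(ips, rec.calls)
}
```

Invoking waf-action synchronously lets the caller confirm the block succeeded before reporting it, while the asynchronous notify call keeps email latency out of the critical path.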

Shared IAM Role

All 7 functions share tbi-autonomous-ops-role with permissions for: ECS (update-service, stop-task), WAF (update-ip-set), CloudWatch (get-metrics, describe-alarms), Bedrock (invoke-model), SES (send-email), SNS (publish), SQS (receive/delete), Secrets Manager (get-secret), and Lambda (invoke-function on tbi-ops-*).

7. Honeypot Defense System

The system exposes 12 decoy endpoints that no legitimate user would ever access. Any hit reveals a scanner, bot, or attacker probing for vulnerabilities.

Trap Endpoints

  • /wp-admin
  • /.env
  • /phpmyadmin
  • /.git/config
  • /wp-login.php
  • /administrator
  • /xmlrpc.php
  • /backup.sql
  • /config.php
  • /debug
  • /actuator
  • /server-status

How It Works

Step | Action | Storage
1. Hit detected | Log IP, user-agent, path, timestamp | Valkey: honeypot:ip:{ip} hash + honeypot:log sorted set
2. Tarpit | 2-second delay response (wastes scanner time) | n/a
3. Threshold check | If same IP hits 2+ traps → queue for auto-block | Valkey: honeypot:autoblock_queue
4. Queue drain (5 min) | EventBridge triggers the honeypot-processor Lambda | n/a
5. WAF block | IPs added to tbi-autoops-blocked-ips WAF IP set | Valkey: honeypot:blocked_ips set
6. Notify | SELF-HEALED notification with blocked IP list | n/a

Table-driven design: Honeypot paths are stored in a Valkey set (honeypot:paths). Add or remove trap endpoints without redeploying — just update the set.
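The threshold check and the table-driven path set can be illustrated with an in-memory Go sketch. In-process maps stand in for the Valkey structures (honeypot:paths, per-IP hit data, honeypot:autoblock_queue), and the `tracker` type and its methods are hypothetical. The sketch counts hits per IP, which is one reading of the 2-hit threshold.

```go
package main

import "fmt"

// autoBlockThreshold is the documented number of trap hits that queues an IP.
const autoBlockThreshold = 2

// tracker keeps honeypot state in memory; each field stands in for a Valkey key.
type tracker struct {
	paths  map[string]struct{} // honeypot:paths
	hits   map[string]int      // per-IP hit counts
	queued map[string]bool     // IPs already placed on the queue
	queue  []string            // honeypot:autoblock_queue
}

func newTracker(paths ...string) *tracker {
	t := &tracker{
		paths:  make(map[string]struct{}),
		hits:   make(map[string]int),
		queued: make(map[string]bool),
	}
	for _, p := range paths {
		t.paths[p] = struct{}{}
	}
	return t
}

// record registers a request. It returns true when the path is a trap and
// queues the IP once it reaches the threshold.
func (t *tracker) record(ip, path string) bool {
	if _, trap := t.paths[path]; !trap {
		return false
	}
	t.hits[ip]++
	if t.hits[ip] >= autoBlockThreshold && !t.queued[ip] {
		t.queued[ip] = true
		t.queue = append(t.queue, ip)
	}
	return true
}

func main() {
	t := newTracker("/wp-admin", "/.env")
	t.record("203.0.113.7", "/wp-admin")
	t.record("203.0.113.7", "/.env")
	t.paths["/debug"] = struct{}{} // add a trap at runtime, no redeploy
	fmt.Println(t.queue, t.record("198.51.100.2", "/debug"))
}
```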

8. WAF Automation

The WAF layer provides both static protection (managed rule groups) and dynamic, AI-driven blocking via the AutoOps IP set.

WAF Configuration

WAF | Protects | Rules
trinity-beast-api-waf | ALB (API) | IP Reputation, Common Rules, Known Bad Inputs, SQL Injection, Rate Limit Global (2000/5min), Rate Limit Admin (100/5min), AutoOps-BlockedIPs (priority 7)
CreatedByCloudFront-449feaa5 | CloudFront (Website) | Anti-DDoS, IP Reputation, Common Rules, Known Bad Inputs

AutoOps WAF Integration

tbi-autoops-blocked-ips WAF IP Set

ID: 8d55de25-8ba5-4982-8c41-f4316c9bd50d

Managed exclusively by AutoOps Lambdas. IPs are added by:

  • tbi-ops-honeypot-processor — repeat honeypot offenders (2+ hits)
  • tbi-ops-bedrock-analyze — AI-recommended blocks (safe auto-actions)
  • tbi-ops-waf-action — manual blocks via KCC commands
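WAF IP sets store addresses in CIDR notation, and an IP set update supplies the full address list rather than a delta, so a block-ips action plausibly has to normalize new IPs and merge them with the existing entries before writing. The Go sketch below shows only that preparation step; `toCIDR` and `mergeBlocked` are hypothetical helpers, and the WAF API call itself is omitted.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// toCIDR converts a bare IP into single-host CIDR form, for example
// 203.0.113.7 becomes 203.0.113.7/32.
func toCIDR(ip string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", ip, err)
	}
	addr = addr.Unmap() // treat IPv4-mapped IPv6 addresses as plain IPv4
	return netip.PrefixFrom(addr, addr.BitLen()).String(), nil
}

// mergeBlocked returns the sorted union of the existing CIDR entries and the
// newly blocked IPs. It fails if any new IP is malformed.
func mergeBlocked(existing, newIPs []string) ([]string, error) {
	set := make(map[string]struct{}, len(existing)+len(newIPs))
	for _, cidr := range existing {
		set[cidr] = struct{}{}
	}
	for _, ip := range newIPs {
		cidr, err := toCIDR(ip)
		if err != nil {
			return nil, err
		}
		set[cidr] = struct{}{}
	}
	merged := make([]string, 0, len(set))
	for cidr := range set {
		merged = append(merged, cidr)
	}
	sort.Strings(merged)
	return merged, nil
}

func main() {
	merged, err := mergeBlocked(
		[]string{"198.51.100.2/32"},
		[]string{"203.0.113.7", "198.51.100.2"},
	)
	fmt.Println(merged, err)
}
```

Rejecting the whole batch on one malformed address is a conservative choice for the sketch; a production handler might instead skip the bad entry and log it.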

9. Notification & Alerting

All operational notifications flow through a branded email pipeline. No raw text, no ugly AWS formatting — every alert arrives as a professionally designed HTML email via Amazon SES.

Notification Flow

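The tbi-ops-notify Lambda serves as the single gateway for all notifications. It accepts events from two paths: CloudWatch alarms delivered through the tbi-ops-notifications SNS topic, and direct invocations from the other AutoOps Lambdas.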

Diagram 9.1 — Branded Email Notification Pipeline
graph LR
    subgraph Sources["Event Sources"]
        CW["CloudWatch Alarms"]
        BA["tbi-ops-bedrock-analyze"]
        SH["tbi-ops-self-heal"]
        DG["tbi-ops-digest"]
        HP["tbi-ops-honeypot-processor"]
        SP["tbi-raima-support"]
    end

    subgraph Router["SNS Topic"]
        SNS["tbi-ops-notifications"]
    end

    subgraph Notify["Notification Lambda"]
        NL["tbi-ops-notify"]
        Parse["Parse Event Type"]
        Template["Build HTML Email"]
    end

    subgraph Delivery["Email Delivery"]
        SES["Amazon SES"]
        Email["Branded HTML Email\nDark theme · Structured sections\nSeverity badge · Action cards"]
    end

    subgraph Fallback["Fallback Path"]
        SNSFB["SNS Plain Text\n(only if SES fails)"]
    end

    CW -->|"Alarm state change"| SNS
    SNS -->|"Lambda subscription"| NL
    BA -->|"Direct invoke"| NL
    SH -->|"Direct invoke"| NL
    DG -->|"Direct invoke"| NL
    HP -->|"Direct invoke"| NL
    SP -->|"Direct invoke"| NL
    NL --> Parse
    Parse --> Template
    Template --> SES
    SES --> Email
    Template -.->|"SES failure"| SNSFB

    style Sources fill:#1e293b,stroke:#60a5fa,color:#e2e8f0
    style Router fill:#1e293b,stroke:#FF9900,color:#e2e8f0
    style Notify fill:#1e293b,stroke:#10b981,color:#e2e8f0
    style Delivery fill:#1e293b,stroke:#a855f7,color:#e2e8f0
    style Fallback fill:#1e293b,stroke:#64748b,color:#94a3b8
    linkStyle 0 stroke:#FF9900
    linkStyle 1 stroke:#FF9900
    linkStyle 2 stroke:#10b981
    linkStyle 3 stroke:#10b981
    linkStyle 4 stroke:#10b981
    linkStyle 5 stroke:#10b981
    linkStyle 6 stroke:#10b981
    linkStyle 7 stroke:#60a5fa
    linkStyle 8 stroke:#60a5fa
    linkStyle 9 stroke:#a855f7
    linkStyle 10 stroke:#a855f7
    linkStyle 11 stroke:#64748b

SNS Topic

Topic: tbi-ops-notifications
ARN: arn:aws:sns:us-east-2:211998422884:tbi-ops-notifications
Subscriber: tbi-ops-notify Lambda
Protocol: Lambda (no email subscriptions)

Email Delivery

Service: Amazon SES
From: CPMP Mission <No-Reply@CPMP-Site.org>
To: CoryDeanKalani@Gmail.com
Format: Branded HTML (dark theme, Gmail-compatible)

Email Template Features

Event Parsing

The Lambda auto-detects the incoming event format and normalizes it:

Source Format | Detection | Parsing
CloudWatch Alarm (via SNS) | Records[].Sns.Message contains AlarmName | Extracts alarm name, state transition, metric, reason → structured sections
AutoOps Lambda (direct) | severity + title fields present | Used directly; already in NotifyEvent format
Generic SNS message | Has Records array but no alarm fields | Subject becomes title, message body becomes detail
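The three detection rules can be sketched in Go as below. The `Records[].Sns.Message` and `Subject` fields follow the standard SNS-to-Lambda event shape; the `incoming` struct, the `detectFormat` function, and the returned labels are hypothetical, and the order in which the real Lambda applies its checks is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// incoming captures only the fields needed to tell the three formats apart.
type incoming struct {
	Severity string `json:"severity"`
	Title    string `json:"title"`
	Records  []struct {
		Sns struct {
			Subject string `json:"Subject"`
			Message string `json:"Message"`
		} `json:"Sns"`
	} `json:"Records"`
}

// detectFormat classifies a raw event as a direct AutoOps event, a CloudWatch
// alarm delivered via SNS, or a generic SNS message.
func detectFormat(raw []byte) string {
	var ev incoming
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "unknown"
	}
	if ev.Severity != "" && ev.Title != "" {
		return "autoops-direct"
	}
	if len(ev.Records) > 0 {
		if strings.Contains(ev.Records[0].Sns.Message, "AlarmName") {
			return "cloudwatch-alarm"
		}
		return "generic-sns"
	}
	return "unknown"
}

func main() {
	direct := []byte(`{"severity":"INFO","title":"Digest generated"}`)
	alarm := []byte(`{"Records":[{"Sns":{"Subject":"ALARM","Message":"{\"AlarmName\":\"api-5xx\"}"}}]}`)
	fmt.Println(detectFormat(direct), detectFormat(alarm))
}
```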

Severity Levels

Level | Subject Prefix | When Used | Action Required
INFO | [INFO] | Health check passed, digest generated, routine status | None (informational only)
WARNING | [WARNING] | HIGH threat detected, anomaly alarm, elevated activity | Review when convenient
CRITICAL | [CRITICAL] | Self-heal failed, IAM changes detected, EC2 launch detected | Immediate attention required
SELF-HEALED | [SELF-HEALED] | Problem detected AND fixed automatically, alarm returned to OK | None (system handled it)

Design Principles

10. Custom AI Translation Engine (Bedrock)

A custom Bedrock-powered translation engine replaces Amazon Translate for all document translation. The engine uses sentinel preprocessing, multi-layer validation, and Step Function orchestration to produce structurally correct translations that preserve code blocks, diagrams, and brand terminology.

Translation Coverage

Languages: 12 (incl. English)
Documents: 407+ HTML files
Engine: Custom Bedrock (Claude Sonnet 4.6)
Worker: ECS Fargate (Python 3.11)

Architecture

Component | Type | Purpose
POST /admin/translate | Admin API | Job submission, monitoring, control (9 endpoints)
trinity-beast-translation-queue | SQS | Decouple submission from execution
tbi-translate-pipe | EventBridge Pipe | SQS → Step Function (no glue Lambda)
tbi-translation-orchestrator | Step Functions | Fan-out: docs serial, languages parallel ×11
tbi-translate-worker | ECS Fargate Task | Bedrock translation + sentinel preprocessing + validation
tbi-translate-deploy | Lambda (Go) | CloudFront invalidation per document
tbi-translate-finalize | Lambda (Go) | Search index rebuild + notification

Key Innovations

Translation Protection

Principle: Translate the understanding, preserve the execution. Code blocks, Mermaid diagrams, and technical identifiers are protected by the sentinel system. A developer in any language reads the explanation in their native tongue, then copy-pastes the code and has it work on the first try.
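The general sentinel idea can be sketched in Go, even though the production worker is a Python task whose sentinel types and token format are documented elsewhere. Everything below is an assumption made for illustration: the `[[SENTINEL_n]]` token format, the choice to protect only `<pre>` blocks, and the `protect` and `restore` helpers.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// protectedRe matches <pre> blocks, including ones that span several lines.
var protectedRe = regexp.MustCompile(`(?s)<pre>.*?</pre>`)

// protect swaps each protected block for a numbered sentinel token and
// returns the masked text together with the saved blocks.
func protect(html string) (string, []string) {
	var saved []string
	masked := protectedRe.ReplaceAllStringFunc(html, func(block string) string {
		saved = append(saved, block)
		return fmt.Sprintf("[[SENTINEL_%d]]", len(saved)-1)
	})
	return masked, saved
}

// restore puts the saved blocks back. It fails when the translated text lost
// or duplicated a sentinel, which serves as a basic structural validation.
func restore(translated string, saved []string) (string, error) {
	for i, block := range saved {
		token := fmt.Sprintf("[[SENTINEL_%d]]", i)
		if strings.Count(translated, token) != 1 {
			return "", fmt.Errorf("sentinel %s missing or duplicated", token)
		}
		translated = strings.Replace(translated, token, block, 1)
	}
	return translated, nil
}

func main() {
	src := "<p>Build the binary:</p><pre>go build ./...</pre>"
	masked, saved := protect(src)
	// The masked text is what would be sent for translation.
	translated := strings.Replace(masked, "Build the binary:", "Compila el binario:", 1)
	out, err := restore(translated, saved)
	fmt.Println(out, err)
}
```

Because the model only ever sees the token, the code inside the block cannot be altered by translation, and a missing token is detected before anything is published.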

Full documentation: See AutoOps Translation Engine (AutoOps-Translation-Engine.html) for the complete technical reference: sentinel types, validator system, Step Function orchestration, admin API, cost protection, and operations guide.

11. Observability & Monitoring

The monitoring layer feeds events into AutoOps and provides the data that Bedrock analyzes.

CloudWatch Alarms

Alarm | Metric | Threshold | AutoOps Response
WAF High Block Rate | WAF BlockedRequests | > 1000/5min | Triggers Bedrock analysis
API 5xx Spike | ALB 5xx count | > 10/min | Health check → self-heal workflow
API 4xx Spike | ALB 4xx count | > 100/min | Bedrock analysis for abuse patterns
GuardDuty Finding | GuardDuty event | Any HIGH/CRITICAL | Immediate notification + Bedrock analysis
ECS CPU High | ECS CPUUtilization | > 80% | Alert (scaling is manual)
Service Count Low | ECS RunningTaskCount | < 1 | Health check → force-deploy
Aurora CPU High | RDS CPUUtilization | > 80% | Alert + ACU review

GuardDuty

Detector: 18ceef6f8dddcf6082473cc7016ee458
Sources: VPC flow logs, CloudTrail API calls, DNS query logs
Integration: Findings generate EventBridge events → AutoOps response

Public Infrastructure Endpoint

All monitoring data is exposed via GET /public/infrastructure — a single unauthenticated endpoint that powers the Infrastructure Live page. Cached 60s in Valkey. Includes CloudFront, WAF, SQS, Lambda, Sync, GuardDuty, Alarms, Honeypot, and AutoOps stats.

12. Cost Analysis

The entire AI and automation layer runs for approximately $10–15/month — a remarkably low cost for a fully autonomous, AI-powered operations system running 24/7.

Service | Usage Pattern | Monthly Cost
EventBridge | ~8,640 events/month (alarm changes + scheduled) | ~$0.50
Step Functions | ~100 executions/month (only on alarm) | ~$1.00
Lambda (7 AutoOps) | ~10,000 invocations/month, 1770 MB, <5s avg | ~$2–5
Amazon Bedrock | ~50–200 invocations/month (only when signals active) | ~$2–5
SNS | ~100 notifications/month | ~$0.10
Amazon Bedrock (Translation) | ~$1.65 per doc-language pair, usage-dependent | ~$2–10
Total AI & Automation | | ~$10–15/mo

Cost-conscious by design: Bedrock is only called when threat signals exceed quiet thresholds. During normal operations (most of the time), the Lambda checks signals, finds everything quiet, and exits without invoking the AI model. This keeps Bedrock costs near zero during peaceful periods.

13. Future Layers

All 5 layers of the AutoOps system are fully deployed and operational.

Layer | Name | Status | Purpose
1 | Autonomous Operations | LIVE | Event-driven self-healing, WAF automation, notifications
2 | Intelligent Threat Response | LIVE | AI-powered threat analysis via Bedrock (every 5 min + GuardDuty HIGH)
3 | Predictive Operations | LIVE | CloudWatch Anomaly Detection: 4 ML models learning traffic patterns
4 | Customer Engagement | LIVE | AI-assisted support: auto-categorize, auto-reply as AutoOps, human escalation option, multilingual
5 | Self-Documenting Infrastructure | LIVE | Bedrock-generated daily + weekly operational digests

Layer 3: Predictive Operations (CloudWatch Anomaly Detection)

Layer 4: Customer Engagement — Raima (tbi-raima-support)

Layer 5: Self-Documenting Infrastructure (tbi-ops-digest)

Weekly Newsletter Data Pipeline

The weekly newsletter uses daily operations reports as its primary source of truth — not just a point-in-time snapshot:

  1. Session reports generated: Kiro saves HTML reports to ~/daily-reports/ and uploads them to s3://trinity-beast-website-east2/daily-reports/
  2. Nightly sync job: reads HTML reports from S3, strips tags to plain text, and stores them in Valkey as report:text:YYYY-MM-DD (30-day TTL). Backfills every date in the manifest that does not already exist.
  3. Weekly digest Lambda: on Monday 8 AM EDT, fetches the last 7 days of report:text:* from Valkey (up to 12 KB per day)
  4. Bedrock generation: daily reports are fed as primary context with accuracy guardrails (never fabricate data, never speculate on missing metrics)
  5. Save + translate + send: the newsletter is saved to Aurora, auto-translated to 11 languages, and sent to all active subscribers via SES

Key design decision: The newsletter reads REAL operational data from daily reports — actual WAF block counts, actual session accomplishments, actual security events. Bedrock rewrites this for subscribers but cannot invent or speculate. If a metric is unavailable, it is omitted rather than explained away.
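The key construction and per-day size cap in the weekly digest step can be sketched in Go. The key format and the 12 KB limit come from the pipeline description; `reportKeys`, `clip`, and `mustDate` are hypothetical helpers, and whether the 7-day window includes the current day is an assumption left to the caller through the `end` argument.

```go
package main

import (
	"fmt"
	"time"
	"unicode/utf8"
)

// maxBytesPerDay is the documented cap on report text read per day.
const maxBytesPerDay = 12 * 1024

// reportKeys returns the Valkey keys for the `days` dates ending on `end`,
// oldest first.
func reportKeys(end time.Time, days int) []string {
	keys := make([]string, 0, days)
	for i := days - 1; i >= 0; i-- {
		day := end.AddDate(0, 0, -i)
		keys = append(keys, "report:text:"+day.Format("2006-01-02"))
	}
	return keys
}

// clip truncates text to the per-day cap without splitting a UTF-8 character.
func clip(text string) string {
	if len(text) <= maxBytesPerDay {
		return text
	}
	cut := maxBytesPerDay
	for cut > 0 && !utf8.RuneStart(text[cut]) {
		cut--
	}
	return text[:cut]
}

// mustDate parses a YYYY-MM-DD date and panics on malformed input.
func mustDate(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	for _, key := range reportKeys(mustDate("2026-05-17"), 7) {
		fmt.Println(key)
	}
}
```

A day with no stored report simply yields a missing key, which matches the stated rule that unavailable data is omitted rather than invented.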