The Trinity Beast Infrastructure

Current Status Overview

✅ Already Optimized

Network & Load Balancing

v3.9.3 ALB Connection Tuning: 60s idle timeout, 120s keep-alive, 10s deregistration delay, least-outstanding-requests (LOR) routing on both target groups, invalid header rejection
v3.9.3 NLB Connection Tuning: Cross-zone load balancing enabled, 10s deregistration delay on both UDP target groups, healthy threshold reduced to 2 (1 min recovery vs 2.5 min)
CORS: Enabled with minimal overhead

Price Feed Architecture

6x WebSocket Price Feeds: Coinbase, Gemini, Kraken, Gate.io, Crypto.com, OKX — persistent push-based connections, 150 prewarmed assets arrive before requests
Per-Container WebSocket Independence: Each container runs its own 6 WS connections, local-only sync.Map writes (no ElastiCache hammering)
6x REST Fallbacks: One per exchange (Coinbase, Gemini, Kraken, Gate.io, Crypto.com, OKX) with health tracking — only used if the corresponding WS feed is stale
Per-Exchange Failover: Each exchange has independent health tracking — a stale Kraken WS triggers Kraken REST, not a cascade across all exchanges
Response-First Architecture: Background logging and metrics — response sent before any write operations

Compute & Runtime

Fargate Tasks: 8 vCPU / 32 GB each — all 3 running APP_REPORT_SERVER across 3 AZs
Go Runtime: Using all CPUs with runtime.GOMAXPROCS(runtime.NumCPU())
Garbage Collection: GOGC=300 (configurable via env var, up from 200)
v3.3 Background Worker Pool: 999 slots (up from 500)
v3.3 System Mode Toggle: Demo/Performance/Debug profiles via /admin/system-mode
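
One common way to build a fixed-slot background worker pool in Go is a buffered channel used as a counting semaphore. The sketch below assumes that approach and assumes that work is rejected (not queued) when all slots are busy; the workerPool type and its methods are hypothetical names, and only the 999-slot figure comes from the list above.

```go
package main

import (
	"fmt"
	"sync"
)

// workerPool bounds the number of concurrent background jobs.
// Hypothetical type for illustration only.
type workerPool struct {
	slots chan struct{}
	wg    sync.WaitGroup
}

func newWorkerPool(size int) *workerPool {
	return &workerPool{slots: make(chan struct{}, size)}
}

// trySubmit starts job in the background if a slot is free.
// It never blocks the caller: when the pool is full it returns false.
func (p *workerPool) trySubmit(job func()) bool {
	select {
	case p.slots <- struct{}{}:
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			defer func() { <-p.slots }() // release the slot before Done fires
			job()
		}()
		return true
	default:
		return false
	}
}

// wait blocks until every accepted job has finished.
func (p *workerPool) wait() { p.wg.Wait() }

func main() {
	pool := newWorkerPool(999)
	var mu sync.Mutex
	completed := 0
	for i := 0; i < 100; i++ {
		pool.trySubmit(func() {
			mu.Lock()
			completed++ // stands in for logging or metrics work
			mu.Unlock()
		})
	}
	pool.wait()
	fmt.Println("background jobs completed:", completed)
}
```

Because trySubmit never blocks, a request handler can send its response first and hand logging or metrics to the pool afterwards without risking added latency.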

Cache & Data Layer

ElastiCache cache.r7g.2xlarge: 52.8 GB cache memory, 400K+ ops/sec capacity, single node, no replica
v3.9.3 ElastiCache Pipelining: All 6 sequential HGetAll loops (4 LRS + 2 UDP) replaced with single-round-trip pipelines via PipelineHGetAll()
ElastiCache API Key Cache: 3-layer lookup (sync.Map → ElastiCache → Aurora) with write-through
ElastiCache App Config: Application parameters read from ElastiCache first, Aurora fallback
Shared Rate Limiting: Atomic Lua script in ElastiCache — all price-serving containers share rate limit counters
Real-time Usage Counters: HINCRBY in ElastiCache on every request for instant LRS stats
ElastiCache Connection Pooling: 300 pool size, 60 min idle connections
Aurora Optimized I/O: Unlimited IOPS, no per-I/O charges, 40% cost savings, 2–18 ACU
Database Connection Pooling: Configurable via app params (150 open / 75 idle per container)
v3.3 Micro-Batch Aurora Write Smoothing: 300 rows / 100ms (configurable via app params)
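
The 3-layer API key lookup (sync.Map → ElastiCache → Aurora) can be illustrated with the sketch below. It is written under stated assumptions: the store interface and mapStore type are in-memory stand-ins for ElastiCache and Aurora, keyCache and lookup are hypothetical names, and only the read path is shown, where a value found in a lower layer is copied into the layers above it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// store is a minimal key-value interface standing in for ElastiCache or Aurora.
type store interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// mapStore is an in-memory stand-in that makes the sketch runnable.
type mapStore struct {
	mu    sync.Mutex
	data  map[string]string
	reads int // number of Get calls, used to show which layer was hit
}

func newMapStore() *mapStore { return &mapStore{data: map[string]string{}} }

func (m *mapStore) Get(key string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.reads++
	v, ok := m.data[key]
	return v, ok
}

func (m *mapStore) Set(key, value string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}

var errNotFound = errors.New("api key not found")

// keyCache resolves an API key through three layers, fastest first.
type keyCache struct {
	local  sync.Map // layer 1: in-process
	shared store    // layer 2: stand-in for ElastiCache
	source store    // layer 3: stand-in for Aurora
}

func (c *keyCache) lookup(key string) (string, error) {
	if v, ok := c.local.Load(key); ok {
		return v.(string), nil
	}
	if v, ok := c.shared.Get(key); ok {
		c.local.Store(key, v)
		return v, nil
	}
	v, ok := c.source.Get(key)
	if !ok {
		return "", errNotFound
	}
	c.shared.Set(key, v) // populate the shared cache for other containers
	c.local.Store(key, v)
	return v, nil
}

func main() {
	shared, source := newMapStore(), newMapStore()
	source.Set("key-123", "partner")
	c := &keyCache{shared: shared, source: source}

	tier, _ := c.lookup("key-123") // miss, miss, hit in source
	c.lookup("key-123")            // served from the local sync.Map
	fmt.Println(tier, "source reads:", source.reads)
}
```

After the first lookup the source of truth is not touched again for that key, which is the property that keeps Aurora out of the request path.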

UDP Protocol (v8 Engine)

v8 SO_REUSEPORT: 8 sockets per protocol — per-socket kernel receive queue eliminates buffer bottleneck
v8 recvmmsg Batch Reads: up to 32 datagrams per syscall (up to ~32× fewer read syscalls)
v8 Pre-Serialized Response Cache: sync.Map of pre-built byte slices (~2× faster for cache hits)
v8 32 MB Socket Buffers: Per socket (up from 8 MB in v3.3)
v8 1,024 Concurrent Handlers: 8 SO_REUSEPORT sockets × 128 workers per socket
UDP 3-Tier Cache: sync.Map → ElastiCache → REST (matches TCP handler)
v3.3 Compiled Go Stress Test Client: cmd/stress/ in mono repo
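
The pre-serialized response cache idea is to pay the serialization cost once per price update instead of once per request. The sketch below shows that pattern only; the SYMBOL:PRICE wire format and the function names are hypothetical and are not the engine's actual protocol.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// responseCache maps an asset symbol to a ready-to-send datagram payload.
var responseCache sync.Map // symbol (string) -> payload ([]byte)

// buildResponse serializes a price once. The format is an assumption.
func buildResponse(symbol string, price float64) []byte {
	buf := make([]byte, 0, 32)
	buf = append(buf, symbol...)
	buf = append(buf, ':')
	buf = strconv.AppendFloat(buf, price, 'f', 2, 64)
	return buf
}

// onPriceUpdate runs on the feed path: serialize once, then replace the entry.
// Stored slices are never mutated afterwards, so readers can share them.
func onPriceUpdate(symbol string, price float64) {
	responseCache.Store(symbol, buildResponse(symbol, price))
}

// cachedResponse runs on the request path: a map load and no formatting work.
func cachedResponse(symbol string) ([]byte, bool) {
	v, ok := responseCache.Load(symbol)
	if !ok {
		return nil, false
	}
	return v.([]byte), true
}

func main() {
	onPriceUpdate("BTC", 64250.5)
	if resp, ok := cachedResponse("BTC"); ok {
		fmt.Printf("%s\n", resp) // prints BTC:64250.50
	}
}
```

On a cache hit the handler can pass the stored slice straight to the socket write, which is where the saving over per-request formatting comes from.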

Current Performance Metrics

Metric	Value
TCP Peak (Direct)	369,600 req/sec
Combined Sustained	746,374 req/sec
TCP Avg Latency	0.3ms
UDP Avg Latency	0.2ms
Cache Hit Rate	99%+
WebSocket Feeds	6 Active
ElastiCache Pool	300 conn
Aurora ACU	2–18

Implemented in v3.3 ✅ Shipped

The following optimizations were implemented and validated during the v3.3 stress test session. Each change was tested under sustained load with the compiled Go stress client.

Optimization	Before (v3.0)	After (v3.3)	Impact
Container CPU	2 vCPU / 8 GB	8 vCPU / 32 GB	4x throughput, no CPU saturation
Aurora ACU ceiling	6	18	Supports 193K req/sec
GC tuning	GOGC=200	GOGC=300	Fewer GC pauses under load
Worker pool	500 slots	999 slots	More background work capacity
ElastiCache pool	50 connections	300 connections, 60 min idle	No pool exhaustion under load
UDP readers	1 per socket	3 per socket	Parallel packet intake
UDP buffers	OS default (~200KB)	8MB read + 8MB write	No packet loss at high throughput
UDP cache	No ElastiCache tier	Full 3-tier (sync.Map → ElastiCache → REST)	Matches TCP cache architecture
Batch writes	500 rows / 10s (bursty)	300 rows / 100ms micro-batch (smooth)	Aurora ACU spikes eliminated
Test client	Python (GIL-bound, ~200 req/sec UDP)	Compiled Go (487K+ req/sec UDP)	Accurate server benchmarking
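
The micro-batch write smoothing in the Batch writes row can be sketched as a size-or-time batcher: a batch is emitted as soon as 300 rows accumulate, and a 100ms timer flushes whatever is pending. The microBatcher type below is a hypothetical, single-goroutine illustration of the size half of that rule; the timer is represented by an explicit flush call so the example stays deterministic.

```go
package main

import "fmt"

// microBatcher accumulates rows and emits a batch once maxRows is reached.
// It is not safe for concurrent use; a real writer would own it from a
// single goroutine that also receives ticks from a 100ms time.Ticker.
type microBatcher struct {
	maxRows int
	pending []string
}

// add appends a row and returns a full batch when the size limit is hit,
// otherwise nil.
func (b *microBatcher) add(row string) []string {
	b.pending = append(b.pending, row)
	if len(b.pending) >= b.maxRows {
		return b.flush()
	}
	return nil
}

// flush returns all pending rows (nil if there are none) and resets the buffer.
// This is what the periodic timer would call.
func (b *microBatcher) flush() []string {
	if len(b.pending) == 0 {
		return nil
	}
	batch := b.pending
	b.pending = make([]string, 0, b.maxRows)
	return batch
}

func main() {
	b := &microBatcher{maxRows: 300}
	fullBatches := 0
	for i := 0; i < 650; i++ {
		if batch := b.add(fmt.Sprintf("row-%d", i)); batch != nil {
			fullBatches++ // a real writer would issue one multi-row INSERT here
		}
	}
	tail := b.flush() // rows the next timer tick would pick up
	fmt.Println("full batches:", fullBatches, "tail rows:", len(tail))
}
```

Small, frequent batches spread the same row volume over many short writes, which is why the 10-second burst pattern and its ACU spikes disappear.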

Remaining Optimization Opportunities 🚀 Potential Improvements

1. ALB Connection Settings ✅ DEPLOYED

ALB optimized for connection reuse, faster deregistration, and security hardening. Deployed April 26, 2026.

Setting	Before	After	Impact
Idle timeout	300s (5 min)	60s	Frees connection slots 5x faster
Client keep-alive	3600s (1 hr)	120s	Clients reconnect every 2 min instead of hoarding
Deregistration delay (both TGs)	30s	10s	Deploys drain 20s faster per service
LRS routing algorithm	round_robin	least_outstanding_requests	Smarter load distribution, matches LPO
Drop invalid headers	disabled	enabled	Security hardening — malformed headers rejected at ALB

2. ElastiCache Pipelining ✅ DEPLOYED

All sequential HGetAll loops replaced with single-round-trip pipelines across 6 handler locations. Deployed April 26, 2026.

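// PipelineHGetAll: one round trip instead of N sequential calls
pipe := client.Pipeline()
cmds := make([]*redis.MapStringStringCmd, len(ids))
for i, id := range ids {
    // Queued locally; nothing is sent until Exec.
    cmds[i] = pipe.HGetAll(ctx, fmt.Sprintf("usage_log:%s", id))
}
// Exec flushes all N HGETALL commands in a single round trip.
if _, err := pipe.Exec(ctx); err != nil {
    return nil, err // assumes the enclosing function returns (result, error)
}
// cmds[i].Val() now holds the hash for ids[i].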

Locations pipelined: LRS Usage Report, LRS Summary Report, LRS Report Usage Detail, LRS Report Usage Summary, UDP Summary, UDP Usage — all 6 sequential loops converted.

Impact: A report returning 50 rows now makes 1 ElastiCache round trip instead of 50. 30-40% latency reduction on LRS reports.

3. Prewarm Optimization SUPERSEDED

This optimization was designed for the REST polling era. It no longer applies — all 150 assets are now served by 6 persistent WebSocket feeds that push prices in real-time.

Exchange	Assets	Feed Type	Latency
Coinbase	BTC, ETH, SOL, DOGE, XRP, LINK, DOT, LTC, AVAX, UNI, PEPE, XLM	WebSocket (push)	0ms (in-memory)
Gemini	AAVE, ADA, MATIC, ATOM, NEAR, ARB, MKR, CRV, GRT, FIL, SHIB, BAT	WebSocket (push)	0ms (in-memory)
Kraken	NANO, SC, LSK, KAVA, BICO, RARI, OCEAN, CFG, CQT, ALGO, FET, FLOW	WebSocket (push)	0ms (in-memory)
Gate.io	BNB, TRX, APT, SEI, INJ, OP, SUI, VET, HBAR, SAND, MANA, FTM	WebSocket (push)	0ms (in-memory)
Crypto.com	TON, WLD, APE, BLUR, IMX, ENS, LDO, SNX, COMP, 1INCH, SUSHI, GALA	WebSocket (push)	0ms (in-memory)
OKX	KAS, TIA, JUP, STRK, PYTH, W, ZRO, PENDLE, ONDO, RENDER, WIF, FLOKI	WebSocket (push)	0ms (in-memory)

Why it's obsolete: The original proposal called for tiered REST polling intervals (top assets every 5 min, mid every 15 min, low every 30 min) and staggered timing across containers. With 6 WebSocket feeds pushing every trade in real-time, prices arrive before requests — there's nothing to poll and nothing to stagger. PrewarmCache() runs once at startup as a bootstrap, then WebSocket feeds take over permanently. Natural staggering already occurs because each container's 6 WebSocket connections establish at slightly different times during startup.

4. Aurora Scaling Headroom FUTURE

Monitor Aurora ACU usage and adjust max capacity if needed. Current range is 2–18 ACU.

Current Load	ACU Range	Action
Consistently under 5 ACU	2–18 ACU	✅ Current — right-sized
Spiking to 18 ACU	2–32 ACU	⚠️ Increase max to 32
Sustained at 18 ACU	2–48 ACU	🚨 Increase max to 48

Monitor: CloudWatch metric ServerlessDatabaseCapacity
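
The decision table above can be expressed as a small function over ServerlessDatabaseCapacity samples. This is a sketch under explicit assumptions: "sustained" is taken to mean at least half of the samples sit at the current ceiling, "spiking" means at least one sample reaches it, and the 32 and 48 targets are the table's values for a current ceiling of 18. The function name and the 50% cutoff are not from the source.

```go
package main

import "fmt"

// recommendMaxACU maps observed ACU samples to a max-capacity setting.
// The thresholds are illustrative assumptions, not tuned values.
func recommendMaxACU(samples []float64, currentMax float64) float64 {
	if len(samples) == 0 {
		return currentMax
	}
	atCeiling := 0
	for _, s := range samples {
		if s >= currentMax {
			atCeiling++
		}
	}
	switch {
	case float64(atCeiling)/float64(len(samples)) >= 0.5:
		return 48 // sustained at the ceiling
	case atCeiling > 0:
		return 32 // occasional spikes to the ceiling
	default:
		return currentMax // headroom remains, keep the current range
	}
}

func main() {
	fmt.Println(recommendMaxACU([]float64{3, 4, 2, 4.5}, 18))  // quiet
	fmt.Println(recommendMaxACU([]float64{3, 18, 4, 5}, 18))   // spiking
	fmt.Println(recommendMaxACU([]float64{18, 18, 18, 6}, 18)) // sustained
}
```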

5. Task Count Scaling FUTURE

Scale ECS tasks horizontally when traffic increases. Costs reflect 8 vCPU / 32 GB containers.

Traffic Level	Main Tasks	Mirror Tasks	LRS Tasks	Monthly Cost
Current (Low)	1	1	1	$430
Medium (50K QPS)	2	2	1	$670
High (100K QPS)	3	2	2	$970
Very High (200K QPS)	5	3	2	$1,390

Trigger: When CPU > 70% or latency > 100ms consistently

6. ElastiCache Scaling FUTURE

Current node is cache.r7g.2xlarge (52.8 GB). ElastiCache is a pure cache layer — Aurora is the source of truth.

Node Type	Memory	Throughput	Monthly Cost
cache.r7g.2xlarge (current)	52.8 GB	400K ops/sec	$637
cache.r7g.4xlarge	105 GB	800K ops/sec	~$1,274
cache.r7g.2xlarge + replica	52.8 GB × 2	400K ops/sec + read replica	~$1,274

Trigger: When memory > 80% or CPU > 70% consistently

🎯 Recommended Priority

Immediate

All done ✅ — DB pooling, ElastiCache pooling, batch writes, GC tuning, UDP optimizations, worker pool, and system mode toggle all shipped in v3.3.

Short Term (Next 1-2 Weeks)

Monitor v3.3 Metrics - CloudWatch dashboards for Aurora ACU, ElastiCache CPU/memory, ALB latency under real traffic
Tune SQS Pipeline Params - Adjust sqs_flush_ms and sqs_buffer_size via app params if queue depth patterns change
~~ALB Connection Settings — ✅ Deployed April 26, 2026~~
~~ElastiCache Pipelining — ✅ Deployed April 26, 2026~~

Long Term (Based on Metrics)

~~Prewarm Strategy — Superseded by 6 real-time WebSocket feeds (150 assets, 0ms latency)~~
Horizontal Scaling - Add tasks when traffic increases
ElastiCache Upgrade - Move from cache.r7g.2xlarge to cache.r7g.4xlarge (or add a read replica) when memory > 80% or CPU > 70% consistently

Monitoring & Metrics

Key CloudWatch Metrics to Watch

Aurora Serverless v2

ServerlessDatabaseCapacity - Current ACU usage (target: 2-10 ACU normal, up to 18 under stress)
DatabaseConnections - Active connections (target: < 450)
ReadLatency / WriteLatency - Query performance (target: < 5ms)

ElastiCache

CPUUtilization - CPU usage (target: < 70%)
DatabaseMemoryUsagePercentage - Memory usage (target: < 80%)
CacheHitRate - Cache effectiveness (target: > 85%)
NetworkBytesIn / NetworkBytesOut - Throughput

ECS Fargate

CPUUtilization - Task CPU usage (target: < 70%)
MemoryUtilization - Task memory usage (target: < 85%)

Application Load Balancer

TargetResponseTime - Backend latency (target: < 50ms)
RequestCount - Traffic volume
HealthyHostCount - Available targets (target: = desired count)
HTTPCode_Target_5XX_Count - Backend errors (target: 0)

Performance Bottleneck Analysis

Symptom	Likely Cause	Solution
High latency (> 100ms)	All WebSocket feeds down, REST fallback active	Check WS connections in logs, verify the WS endpoints for all 6 exchanges
Low cache hit rate (< 95%)	WebSocket feeds disconnected or stale	Check the per-exchange WS logs (e.g. GEMINI-WS, COINBASE-WS), verify network connectivity
High CPU on ECS tasks	Too many concurrent requests	Scale horizontally (add more tasks)
High memory on ECS tasks	Memory leak or large response caching	Review code for leaks, increase task memory
Aurora ACU spiking to max	Heavy database queries or too many connections	Optimize queries, tune the connection pool limits, increase max ACU
Aurora ACU spiking	SQS consumer Lambda batch size too large or too frequent	Adjust Lambda batch size or batching window in the SQS event source mapping
ElastiCache CPU high	Too many cache operations	Pipelining deployed ✅ — upgrade node type if still high
ElastiCache memory high	Too much cached data	Reduce cache TTL or upgrade node type
ALB 5xx errors	Backend tasks unhealthy or overloaded	Check task logs, scale horizontally

Service Offerings — Partners & Associates

The Trinity Beast serves three distinct audiences, each with its own delivery path optimized for their use case. All three share the same 6-exchange WebSocket price engine — the difference is how prices reach the consumer.

Audience	Delivery Method	Connection	Latency	Rate Limiting	Cost
Public Subscribers	REST API (TCP/UDP)	Request/Response	0.3ms TCP / 0.2ms UDP	Per-tier QPS + burst	Free – $3,000 lifetime
Partners	WebSocket (persistent)	AWS PrivateLink / VPC Peering	<1ms (push)	None — unlimited	Free (mission-aligned)
Associates	Webhook Push (UDP + HTTPS)	Public internet	~0.1ms UDP / ~50-200ms HTTPS	Tier-based interval	$30 – $540/mo

Partner WebSocket Feed ✅ LIVE

Partners are AWS companies that need live crypto prices for their own products. They connect via AWS PrivateLink (TCP) or VPC Peering (UDP) — private network, no public internet traversal. Each Partner receives a persistent WebSocket connection that pushes every price update in real-time from the local sync.Map cache.

Handler: internal/handlers/partner_ws.go
Connection: Upgraded HTTP → WebSocket via gorilla/websocket
Price source: Local wsPriceCache (sync.Map) — same feeds as LPO, zero network hop
Rate limiting: None. Partners are exempt from all QPS/burst/monthly limits.
API key cache: 60-minute TTL (vs 5-minute for public tiers) — reduces Aurora lookups
Authentication: API key with tier = 'partner' in the rate_limit_template table
Why free: We receive price data freely from exchanges via WebSocket — we give it freely to mission-aligned partners

Associate Webhook Push ✅ LIVE

Associates subscribe to receive prices pushed to their endpoints at tier-configured intervals. The BeastWebhook service (4th ECS container, SERVER_TYPE=WEBHOOK_SERVER) runs its own 6 WebSocket feeds and pushes from its local cache — no ALB, no inbound ports, push-only.

Handler: internal/handlers/webhook.go + webhook_delivery.go
ECS Service: trinity-beast-webhook-service (8 vCPU / 32 GB, no ALB target)
Price source: Local wsPriceCache — same 6 WebSocket feeds, independent connections
UDP delivery: Fire-and-forget, single packet per asset per cycle. ~0.1ms. Zero retries.
HTTPS delivery: Signed POST with HMAC-SHA256 (X-Webhook-Signature). Retries with exponential backoff (base 1000ms, max 3 attempts).
Delivery log: Every push logged to webhook_delivery_log table (subscription_id, asset, price, source, method, latency, status)

Webhook Tier Performance Characteristics

Tier	Interval	Max Assets	Pushes/Hour	Pushes/Month	Price
Starter	60s	9	540	~388,800	$30/mo
Standard	15s	30	7,200	~5,184,000	$90/mo
Professional	6s	75	45,000	~32,400,000	$210/mo
Enterprise	3s	150	180,000	~129,600,000	$540/mo

Pushes/hour = (3600 ÷ interval) × max_assets. Enterprise at full capacity: 180,000 price pushes per hour, 129.6M per month — all from a single container reading its local sync.Map.
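
The tier arithmetic can be checked with a few lines of Go. The monthly column matches a 30-day month (720 hours), which is an inference from the table values; the tier struct and function names are hypothetical.

```go
package main

import "fmt"

// tier holds the two inputs of the push-volume formula.
type tier struct {
	name        string
	intervalSec int
	maxAssets   int
}

// pushesPerHour applies (3600 / interval) * max_assets.
// Integer division is exact here because every interval divides 3600.
func pushesPerHour(t tier) int {
	return 3600 / t.intervalSec * t.maxAssets
}

// pushesPerMonth assumes a 30-day month (24 * 30 = 720 hours).
func pushesPerMonth(t tier) int {
	return pushesPerHour(t) * 24 * 30
}

func main() {
	tiers := []tier{
		{"Starter", 60, 9},
		{"Standard", 15, 30},
		{"Professional", 6, 75},
		{"Enterprise", 3, 150},
	}
	for _, t := range tiers {
		fmt.Printf("%-12s %7d/hour %11d/month\n", t.name, pushesPerHour(t), pushesPerMonth(t))
	}
}
```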

Architecture: Why All Three Share One Engine

Every ECS container (including BeastWebhook) maintains its own independent 6-exchange WebSocket connections. Prices flow into the local sync.Map with zero network hops. Whether a price is served via REST API, pushed over a Partner WebSocket, or delivered to an Associate webhook endpoint — it comes from the same in-memory cache, populated by the same real-time feeds.

┌─────────────────────────────────────────────────────────────────┐
│  6 Exchange WebSocket Feeds (per container)                      │
│  Coinbase · Gemini · Kraken · Gate.io · Crypto.com · OKX        │
└──────────────────────────┬──────────────────────────────────────┘
                           │ real-time trade messages
                           ▼
              ┌────────────────────────┐
              │   sync.Map (local)     │  ← 0ms read latency
              │   150 assets × 6 feeds │
              └────┬───────┬───────┬───┘
                   │       │       │
         ┌─────────┘       │       └──────────┐
         ▼                 ▼                   ▼
  ┌─────────────┐  ┌──────────────┐  ┌────────────────┐
  │ REST API    │  │ Partner WS   │  │ Webhook Push   │
  │ (TCP/UDP)   │  │ (persistent) │  │ (UDP + HTTPS)  │
  │ Subscribers │  │ PrivateLink  │  │ Associates     │
  └─────────────┘  └──────────────┘  └────────────────┘

This shared-engine architecture means adding Partners or Associates adds zero load to the price feed infrastructure. The WebSocket connections are already running. The sync.Map is already populated. The only additional work is the outbound push — which is trivial compared to the inbound price ingestion.

Conclusion

Current Assessment

The Trinity Beast Infrastructure v4.7 is battle-tested at scale. Run 17 validated:

746,374 combined RPS sustained for 30 minutes — 1.34 billion requests with zero degradation
369,600 TCP req/sec and 487,900 UDP req/sec (direct) — 100% success through all 13 concurrency levels
0.3ms TCP avg latency, 0.2ms UDP avg latency
943× improvement from v1.0 baseline across 17 test runs in 19 days
8 vCPU / 32 GB containers — scales from 3 (production) to 9 (proven at scale)
2–18 ACU Aurora range — right-sized with micro-batch write smoothing
6 persistent WebSocket price feeds (Coinbase, Gemini, Kraken, Gate.io, Crypto.com, OKX) — 150 prewarmed assets
99%+ cache hit rate — virtually every request served from memory
ElastiCache-backed API key validation, shared rate limiting, and real-time usage counters
v8 UDP engine: SO_REUSEPORT, recvmmsg batch reads, pre-serialized response cache

Recommendation: The system is production-ready and stress-tested well beyond expected traffic. A 3-year Compute Savings Plan is recommended to lock in cost savings on the 8 vCPU / 32 GB Fargate tasks. The remaining optimization opportunities (Aurora scaling headroom, horizontal task scaling, ElastiCache scaling) are for future growth and are not critical for current operations.

Run 17 eliminated every bottleneck found during stress testing. v4.7 added the v8 UDP engine (SO_REUSEPORT, recvmmsg), dedicated health servers, and 6-exchange WebSocket feeds — the remaining items are future-proofing for horizontal scale.