Custom Bedrock-powered document translation service — sentinel preprocessing, validator system, Step Function orchestration
The Trinity Beast Infrastructure maintains 40 technical documents translated into 11 languages, 440 translated files in total. The original approach used AWS Translate batch jobs. It worked for simple prose but failed catastrophically on technical documentation.
AWS Translate is a neural machine translation service optimized for general-purpose text. Technical documentation with embedded code, diagrams, and brand terminology exposes its fundamental limitations:
| Failure Mode | Example | Impact |
|---|---|---|
| Translates code blocks | function getName() → función obtenerNombre() | Code no longer executes |
| Translates variable names | api_key → clave_api | Documentation references break |
| Breaks Mermaid diagrams | Translates node labels inside mermaid blocks | Diagrams fail to render |
| Corrupts HTML structure | Merges adjacent elements, drops attributes | Styling and layout break |
| Transliterates brand names | AutoOps → آٹو آپس (Urdu phonetic) | Brand identity lost, search breaks |
| Localizes numeric units | 32 GB → 32 Go (French) | Technical specs become ambiguous |
| Drops version numbers | PostgreSQL 17.7 → PostgreSQL | Version-specific guidance lost |
| Ignores translate attribute | Translates content inside protected zones | Defeats the HTML5 standard mechanism |
With 40 documents × 11 languages, every documentation update triggers a translation cascade. Before the custom engine:
A custom Bedrock-powered translation engine that understands the boundary between human language and machine language. The engine uses defense-in-depth across the full pipeline:
Result: A single POST /admin/translate call translates any document from any supported source language into up to 11 target languages, deploys to S3, invalidates CloudFront, rebuilds the search index, and emails a summary. Source language is auto-detected when not specified — no pivot through English required.
The translation service is an event-driven pipeline that decouples submission from execution. The operator submits a job; the system handles everything else asynchronously.
Diagram 2.1: End-to-End Pipeline Architecture
flowchart TB
subgraph Operator
A[POST /admin/translate]
end
subgraph "LPO Server (Go)"
B[Validate & Enqueue]
C[Valkey State]
D[Aurora Record]
end
subgraph "AWS Pipeline"
E[SQS Queue]
F[EventBridge Pipe]
G[Step Function]
end
subgraph "Translation Intelligence (Python)"
direction LR
subgraph "Pre-Processing"
H0[Source Validation]
H1[Language Detection]
H2[Complexity Analysis]
H3[Document Preprocessor]
end
subgraph "Translation Core"
H4[Sentinel System — 4 Types]
H5[Bedrock — 3-Region Failover]
H6[Validator — Hard + Soft Tiers]
H7[Integrity Check + Auto-Repair]
end
end
subgraph "Deployment (Go)"
direction LR
I[S3 Write]
J[CloudFront Invalidation]
K[Search Index Rebuild]
L[SES Notification]
end
A --> B
B --> C
B --> D
B --> E
E --> F
F --> G
G --> H0
H0 --> H1
H1 --> H2
H2 --> H3
H3 --> H4
H4 --> H5
H5 --> H6
H6 --> H7
H7 --> I
I --> J
J --> K
K --> L
style A fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style B fill:#1e293b,stroke:#334155,color:#e2e8f0
style C fill:#064e3b,stroke:#10b981,color:#e2e8f0
style D fill:#064e3b,stroke:#10b981,color:#e2e8f0
style E fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
style F fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
style G fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
style H0 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H1 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H2 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H3 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H4 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H5 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H6 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H7 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style I fill:#064e3b,stroke:#10b981,color:#e2e8f0
style J fill:#064e3b,stroke:#10b981,color:#e2e8f0
style K fill:#064e3b,stroke:#10b981,color:#e2e8f0
style L fill:#064e3b,stroke:#10b981,color:#e2e8f0
| Component | Type | Runtime | Purpose |
|---|---|---|---|
| POST /admin/translate (+ 8 more) | Admin API | Go | Job submission, monitoring, control |
| trinity-beast-translation-queue | SQS | - | Decouple submission from execution |
| tbi-translate-pipe | EventBridge Pipe | - | SQS → Step Function trigger (no glue Lambda) |
| tbi-translation-orchestrator | Step Functions | - | Fan-out, retry, deploy, finalize orchestration |
| tbi-translate-worker | ECS Fargate Task | Python 3.11 | Bedrock translation + sentinel + validation (no timeout ceiling) |
| tbi-translate-init | Lambda | Go | Records execution ARN, transitions queued → running |
| tbi-translate-deploy | Lambda | Go | CloudFront invalidation per document |
| tbi-translate-finalize | Lambda | Go | Search rebuild + SES notification + state transition |
| translation_jobs | Aurora table | - | Permanent job records (28 columns) |
| translation_job_events | Aurora table | - | Granular per-doc/lang audit log |
Every other compute workload in The Trinity Beast Infrastructure is written in Go. The translation worker is the sole exception, and for good reason:
- lxml for HTML parsing: Go's HTML parsers are adequate for simple tasks but lack the XPath and tree-manipulation capabilities needed for sentinel preprocessing on complex nested documents.

Convention note: All Lambda functions use 1770 MB memory (multiple of 3). The worker runs as an ECS Fargate task (1 vCPU / 3 GB) with no timeout ceiling, so large documents translate to completion regardless of processing time. Deploy and finalize Lambdas use 60s and 180s timeouts respectively.
The sentinel system is the core innovation that makes reliable technical document translation possible. It operates on a simple principle: the model cannot corrupt what it never sees.
Before any chunk is sent to Bedrock, protected content is replaced with placeholder tokens. The model translates the prose around the placeholders. After translation, the placeholders are swapped back to the original content. Validation then confirms everything survived intact.
Type A (__TBP{N}__)
Replaces entire translate="no" elements with a single token. The model sees only the placeholder and places it in the natural position for the target language's word order.
| Before | After Sentinel Pass |
|---|---|
| <span translate="no">CloudFront</span> invalidation | __TBP0__ invalidation |
| <code translate="no">api_key</code> parameter | __TBP1__ parameter |
Handles arbitrary nesting depth — processes innermost elements first, then sweeps outward until stable.
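The innermost-first sweep can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's code: the name `protect_no_translate` and the regex are hypothetical, and the regex only handles protected elements whose content is plain text or already-inserted tokens.

```python
import re

# Assumption: a protected element is matched only when it contains no
# further tags, so each sweep consumes the innermost elements first.
_INNERMOST = re.compile(r'<(\w+)\b[^>]*\btranslate="no"[^>]*>[^<]*</\1>')


def protect_no_translate(html):
    """Replace translate="no" elements with __TBP{N}__ tokens until stable."""
    mapping = {}

    def _sub(match):
        token = f"__TBP{len(mapping)}__"
        mapping[token] = match.group(0)  # keep the original element verbatim
        return token

    while True:
        html, replaced = _INNERMOST.subn(_sub, html)
        if replaced == 0:  # nothing left to protect: the sweep is stable
            return html, mapping
```

With nested input, the inner element becomes `__TBP0__` on the first sweep and the outer element (now containing only that token) becomes `__TBP1__` on the second.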
Type B (__TBO{N}__ / __TBC{N}__)
For plain <span> wrappers containing translatable text (badges, titles, method labels). The wrapper tags become sentinels; the text between them is translated normally.
| Before | After Sentinel Pass |
|---|---|
| <span class="badge">UDP Port 2679</span> | __TBO0__UDP Port 2679__TBC0__ |
The model translates "UDP Port 2679" while the <span class="badge"> wrapper survives intact.
Type C (__TBN{N}__)
Protects bare numbers in prose from the model's tendency to drop, paraphrase, or localize them. Matches integers, decimals, percentages, and number+unit pairs.
| Before | After Sentinel Pass | Problem Prevented |
|---|---|---|
| uses 1770 MB of memory | uses __TBN0__ of memory | French translating "MB" → "Mo" |
| achieves 98.5% uptime | achieves __TBN1__ uptime | Japanese dropping the decimal |
| 62% cache hit rate | __TBN2__ cache hit rate | German paraphrasing to words |
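A numeric pass of this kind might look like the sketch below. The unit list and the name `protect_numbers` are assumptions made for illustration; the document does not give the engine's actual pattern.

```python
import re

# Hypothetical unit list; the real engine's set of units is not documented here.
_UNITS = r"(?:KB|MB|GB|TB|ms)"
# Integer or decimal, optionally followed by "%" or by a unit (with or without a space).
_NUMBER = re.compile(rf"\b\d+(?:\.\d+)?(?:%|\s?{_UNITS}\b)?")


def protect_numbers(text):
    """Swap bare numbers (and number+unit pairs) for __TBN{N}__ tokens."""
    mapping = {}

    def _sub(match):
        token = f"__TBN{len(mapping)}__"
        mapping[token] = match.group(0)
        return token

    return _NUMBER.sub(_sub, text), mapping
```

Because `\b` requires a word boundary before the digits, the index inside an existing token such as `__TBP0__` is not matched again.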
Type D (__TBT{N}__)
Protects brand terms, product names, and proper nouns that must never be translated or transliterated. Unlike Type A (which requires translate="no" in the source HTML), Type D operates from a centralized configuration list, so no source markup is needed.
| Before | After Sentinel Pass | Problem Prevented |
|---|---|---|
| powered by The Trinity Beast | powered by __TBT0__ | Hindi transliterating to ट्रिनिटी बीस्ट |
| deployed on CloudFront | deployed on __TBT1__ | Arabic transliterating to كلاود فرونت |
| Cory Dean Kalani | __TBT2__ | Urdu transliterating person names |
Protected terms are defined in translation-config.json (57 terms). The sentinel pass matches terms using word-boundary regex for short terms (≤5 chars) and substring matching for longer terms. Restoration is exact — the original term text is re-injected at the sentinel position.
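The matching rule (word boundaries for short terms, substring for longer ones) can be sketched as follows. The function name, the longest-first ordering, and the choice of one token per term are assumptions of this example; the schema of translation-config.json is not shown in the document, so the term list is passed in directly.

```python
import re


def protect_terms(text, terms):
    """Replace protected terms with __TBT{N}__ tokens and return the mapping."""
    mapping = {}
    # Assumption: longest terms first, so a long term is not split by a shorter one.
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.escape(term)
        if len(term) <= 5:
            # Short terms need word boundaries ("ECR" must not match inside "SECRET").
            pattern = rf"\b{pattern}\b"
        if re.search(pattern, text):
            token = f"__TBT{len(mapping)}__"
            mapping[token] = term  # restoration re-injects the exact term text
            text = re.sub(pattern, token, text)
    return text, mapping
```

For example, `protect_terms("SECRET stored in ECR via CloudFront", ["ECR", "CloudFront"])` leaves "SECRET" untouched while both terms become tokens.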
For complex-script target languages (Hindi, Urdu, Arabic), the model occasionally drops Type D sentinel tokens entirely from its output: the token simply doesn't appear in the translated text. The recovery pass runs after normal restoration and before validation.
This eliminates the class of failures where the model acknowledges the sentinel in its "thinking" but omits it from the output — a behavior observed primarily in Indic scripts with token-dense chunks.
Diagram 3.1: Sentinel Preprocessing Flow
flowchart TD
A[Source HTML Chunk] --> B[Pass 1: Extract translate=no elements]
B --> C[Pass 2: Wrap plain span text in paired sentinels]
C --> D[Pass 3: Replace bare numbers with numeric sentinels]
D --> D2[Pass 4: Replace brand terms with TERM sentinels]
D2 --> E[Send to Bedrock with sentinel-aware prompt]
E --> F[Receive translated chunk with sentinels intact]
F --> G[Deduplicate any model-doubled paired sentinels]
G --> H[Restore sentinels high-to-low index order]
H --> H2[Recovery pass: re-inject any dropped TERM sentinels]
H2 --> I[Run validators against source + restored output]
I -->|PASS| J[Accept chunk]
I -->|FAIL| K{Retries remaining?}
K -->|Yes| L[Retry with strict prompt + temperature jitter]
L --> E
K -->|No| M[Raise TranslationError]
style A fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style E fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style J fill:#064e3b,stroke:#10b981,color:#e2e8f0
style M fill:#450a0a,stroke:#ef4444,color:#e2e8f0
The four passes execute in strict order — later passes operate on the output of earlier ones. This means Type C (numeric) sentinels can protect numbers that appear inside Type B (paired) text, and Type D (brand term) sentinels protect terms that appear anywhere in the translatable content, providing defense-in-depth.
After translation, sentinels are restored in reverse index order (high → low) to prevent prefix collisions (__TBP1__ must not match inside __TBP10__).
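A minimal sketch of high-to-low restoration, assuming the token-to-original mapping produced during preprocessing is available as a dict (the name `restore_sentinels` is hypothetical):

```python
import re


def restore_sentinels(text, mapping):
    """Swap tokens back to their originals, highest index first."""

    def index(token):
        return int(re.search(r"\d+", token).group())

    for token in sorted(mapping, key=index, reverse=True):
        text = text.replace(token, mapping[token])
    return text
```

In a scheme where an outer protected element was stored with an inner token inside it, the same ordering also unwinds the nesting: the outer token (higher index) is restored first, which exposes the inner token for the next iteration.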
A deduplication pass runs before restoration to handle a known model behavior: occasionally the model emits a paired sentinel twice consecutively (a bilingual output instinct). The deduplicator collapses __TBO0__text__TBC0__ __TBO0__text__TBC0__ into a single occurrence.
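One way to collapse doubled pairs is a backreference regex, sketched here. The pattern and the decision to keep the first occurrence are assumptions of this example, not the engine's documented behavior.

```python
import re

# A paired sentinel immediately followed (whitespace allowed) by one or more
# repeats of a pair with the same index.
_DOUBLED = re.compile(
    r"(__TBO(\d+)__.*?__TBC\2__)(?:\s*__TBO\2__.*?__TBC\2__)+",
    re.DOTALL,
)


def dedupe_paired(text):
    """Collapse consecutive repeats of the same paired sentinel into one."""
    return _DOUBLED.sub(r"\1", text)
```

Pairs with the same index that are separated by other text are left alone, since only consecutive repeats match.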
Every translated chunk is validated against the source before acceptance. Validators enforce structural integrity and content preservation — if a translation passes all validators, it is guaranteed to be functionally correct (code works, links resolve, diagrams render).
| Validator | Type | What It Checks | Failure Example |
|---|---|---|---|
| check_protected_terms | Hard | Every protected term in source appears in output | "CloudFront" missing from Japanese output |
| check_version_numbers | Hard | All version numbers (X.Y.Z) survive translation | "17.7" dropped from PostgreSQL reference |
| check_preserve_patterns | Hard | URLs, emails, IPs, ARNs, resource IDs, cron expressions, memory sizes | ARN truncated or IP address reformatted |
| check_tag_counts | Hard | HTML tag counts match for structural tags | Extra <span> added or <code> dropped |
| check_translate_no_zones | Hard | Content inside translate="no" zones unchanged | Protected code block content altered |
Protected term matching: Short uppercase acronyms (≤4 chars like SQS, ECR, S3) use word-boundary matching to avoid false positives where the acronym appears as a substring (e.g., "ECR" inside "SECRET"). Longer terms use plain substring matching.
Implementation (v2.5): The check_tag_counts and check_translate_no_zones validators use character scanning with exact boundary matching — no regex. We control these tags. We know that a tag starts with < and ends with >. The scanner finds complete opening tags by looking for <tagname followed by a boundary character (>, space, tab, newline, or /), then reads to the closing >. This eliminates false positives from partial regex matches and is immune to edge cases where tag names appear as text content (e.g., documenting translate="no" as literal text inside a code tag).
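The boundary-character scan described above could be written roughly like this. It is a sketch, not the v2.5 code: `count_open_tags` is a hypothetical name and only opening tags are counted.

```python
# Characters that may legally follow a tag name in an opening tag.
_BOUNDARY = {">", " ", "\t", "\n", "/"}


def count_open_tags(html, tag):
    """Count complete opening tags for `tag` by character scanning (no regex)."""
    needle = "<" + tag
    count = 0
    pos = html.find(needle)
    while pos != -1:
        end = pos + len(needle)
        if end < len(html) and html[end] in _BOUNDARY:
            close = html.find(">", end)
            if close == -1:  # unterminated tag: stop scanning
                break
            count += 1
            pos = html.find(needle, close + 1)
        else:
            # "<codex" or a truncated "<code" at end of input: not our tag.
            pos = html.find(needle, end)
    return count
```

The boundary check is what separates `<code>` from a longer name such as `<codex>`, which a naive substring count would include.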
When validation fails, the engine retries with two progressive adjustments: a stricter prompt and temperature jitter.
Maximum retries: 3 (configurable). If all attempts fail, a TranslationError is raised with the chunk index, validator detail, and a preview of the problematic chunk.
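The retry behavior can be sketched with injected `translate` and `validate` callables. Everything concrete here is an assumption for illustration: the temperature values, the reading of "3 retries" as one initial attempt plus up to three retries, and the signatures of both callables.

```python
import random


class TranslationError(Exception):
    """Raised when a chunk fails validation on every attempt."""


def translate_with_retries(chunk, index, translate, validate,
                           max_retries=3, rng=random.random):
    detail = ""
    for attempt in range(max_retries + 1):
        strict = attempt > 0  # retries switch to the strict prompt
        # Hypothetical values: 0.2 base, up to +0.1 of jitter on retries.
        temperature = 0.2 if attempt == 0 else 0.2 + 0.1 * rng()
        output = translate(chunk, strict=strict, temperature=temperature)
        passed, detail = validate(chunk, output)
        if passed:
            return output
    raise TranslationError(
        f"chunk {index}: {detail}; preview: {chunk[:80]!r}"
    )
```

Passing `rng` as a parameter keeps the jitter deterministic in tests.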
Validators are classified into two tiers based on what they protect:
| Tier | Tags | Behavior | Rationale |
|---|---|---|---|
| Hard (content-critical) | <code>, <pre>, <a> | Retry → reject on failure | Missing code blocks, broken links, or lost pre-formatted content means the translation is functionally broken |
| Soft (decorative/structural) | <span>, <strong>, <em>, <br> | Log warning, pass through | Missing styling wrappers don't break functionality — the post-translation integrity check repairs them |
This tiered approach eliminates the failure mode where a correctly-translated document is rejected because the model dropped a single decorative <span> wrapper during RTL reordering. The content is correct — only the styling wrapper is missing — and the integrity check restores it automatically.
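A sketch of the tier split, assuming tag counts for source and output are already available as dicts (the function name and message format are hypothetical):

```python
HARD_TAGS = ("code", "pre", "a")            # content-critical: retry, then reject
SOFT_TAGS = ("span", "strong", "em", "br")  # decorative: warn and pass through


def classify_tag_mismatches(source_counts, output_counts):
    """Return (hard_failures, soft_failures) for tag-count differences."""
    hard, soft = [], []
    for tags, bucket in ((HARD_TAGS, hard), (SOFT_TAGS, soft)):
        for tag in tags:
            src = source_counts.get(tag, 0)
            out = output_counts.get(tag, 0)
            if src != out:
                bucket.append(f"<{tag}>: source {src}, output {out}")
    return hard, soft
```

A chunk would be accepted when the first list is empty, even if the second is not.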
The ValidationReport aggregates all results and exposes:
- .passed: True if zero hard failures
- .hard_failures: list of blocking issues (content-critical tags)
- .soft_failures: list of warnings (decorative tags, repaired post-translation)
- .summary(): human-readable status string

After translation completes and chunks are reassembled, a full-document integrity check runs before the S3 write. This is the defense-in-depth layer: it repairs structural drift that the per-chunk validator intentionally allows through (soft failures).
| Issue | Detection | Repair Action |
|---|---|---|
| </br> injection | String scan for invalid closing br tags | Strip all occurrences (never valid HTML) |
| <br> inside Mermaid blocks | Regex scan within <pre class="mermaid"> | Remove (breaks Mermaid syntax) |
| Mermaid content corruption | Byte-for-byte comparison with source | Flag as warning (cannot auto-repair content changes) |
| Missing translate="no" span wrappers | Compare source protected elements to output | Re-wrap bare content with original element tags |
| Missing <strong>/<em> wrappers | Same pattern as span recovery | Re-wrap bare content |
The integrity check only repairs translate="no" elements (where content is byte-for-byte identical between source and output). For translated content that lost its wrapper, the check logs the discrepancy but cannot reliably re-wrap (the content has been translated — matching it to the source wrapper requires semantic understanding).
Design principle: If the translated content is present and correct but the HTML structure is degraded, repair it. Only flag as unrecoverable if content is actually missing or corrupted. The customer sees a clean translation — the repairs happen invisibly.
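The two purely mechanical repairs from the table (stripping `</br>` and removing `<br>` inside Mermaid blocks) can be sketched as below. The function name and regexes are assumptions; wrapper recovery is not covered.

```python
import re

_MERMAID = re.compile(r'(<pre class="mermaid">)(.*?)(</pre>)', re.DOTALL)
_BR = re.compile(r"<br\s*/?>")


def repair_structure(html):
    """Strip invalid </br> tags and remove <br> from Mermaid blocks."""
    html = html.replace("</br>", "")  # a closing br tag is never valid HTML

    def _clean(match):
        opening, body, closing = match.groups()
        return opening + _BR.sub("", body) + closing

    return _MERMAID.sub(_clean, html)
```

`<br>` tags outside Mermaid blocks are left untouched, since they are legitimate there.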
Before any translation work begins, the source document passes through a validation gate. This catches defects that would cause translation failures or produce broken output — rejecting early saves Bedrock tokens and prevents corrupted translations from reaching S3.
| Category | What It Catches | Auto-Repairable? |
|---|---|---|
| STRUCTURAL | Unclosed tags, malformed HTML, nesting violations | Yes (up to 5 unclosed tags) |
| MERMAID | Empty diagram blocks, missing type declaration, mismatched brackets | No — reject with location |
| ENCODING | BOM markers, null bytes, mixed encodings | Yes (strip BOM/nulls) |
| SIZE | Document exceeds 500 KB, excessive nesting depth (>30 levels) | No — reject with size info |
| CONFLICT | translate="no" on root element (nothing to translate) | No — reject immediately |
- Each <pre class="mermaid"> block has a valid diagram type, balanced brackets, and non-empty content
- No <body> or <html> element has translate="no"

The validator follows a strict philosophy: try to fix it silently, reject early if you can't. Repairable issues (unclosed tags, BOM markers) are fixed in-place, so the customer never knows. Unrecoverable issues produce an actionable defect report with the exact location, what's wrong, and how to fix it.
ValidationResult:
  valid: false
  rejection_reason: "2 unrecoverable defects found"
  defects:
    - severity: error
      category: MERMAID
      location: "Section 5, line 342"
      description: "Empty Mermaid block — no diagram content"
      suggestion: "Add diagram content or remove the empty <pre class='mermaid'> block"
    - severity: error
      category: SIZE
      location: "Document root"
      description: "Document is 612 KB (limit: 500 KB)"
      suggestion: "Split into multiple documents or remove large embedded assets"
Cost savings: A rejected document costs zero Bedrock tokens. Without source validation, a broken document would fail during translation (after burning tokens on partial chunks), produce a corrupted output, and require manual investigation. Source validation catches these cases in <10ms with zero API calls.
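As an illustration of why this gate is cheap, the ENCODING repairs and the SIZE check need only byte operations. The sketch below assumes 1 KB = 1024 bytes and a UTF-8 BOM; the function name and the defect dict layout are hypothetical, modeled on the report above.

```python
MAX_BYTES = 500 * 1024        # assumption: the 500 KB limit uses 1024-byte KB
UTF8_BOM = b"\xef\xbb\xbf"


def precheck(raw):
    """Silently repair BOM/null bytes, then report a SIZE defect if too large."""
    repaired = raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw
    repaired = repaired.replace(b"\x00", b"")
    defects = []
    if len(repaired) > MAX_BYTES:
        defects.append({
            "severity": "error",
            "category": "SIZE",
            "location": "Document root",
            "description": f"Document is {len(repaired) // 1024} KB (limit: 500 KB)",
        })
    return repaired, defects
```

No network or model call is involved, so a rejection at this stage spends no Bedrock tokens.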
Mermaid diagrams are code — they must survive translation byte-for-byte. The integrity check (section 4.4) now includes dedicated diagram verification with automatic recovery.
The integrity check counts Mermaid blocks in the source (<pre class="mermaid">) and compares against the translated output. If any diagrams are missing from the output, the auto-stitch mechanism activates.
When a diagram is missing from the translated output:
- <div class="diagram-wrap"> block from source (includes label + pre)

The stitched diagram is the English version, which is functionally correct because Mermaid syntax is language-independent. The surrounding prose is already translated, so the reader gets translated explanations with a working diagram.
The _count_tags function now reports diagram count alongside other structural tags:
Tag Inventory (source → output):
• Trinity-Beast-Performance-Report.html
IN: code:75 pre:8 strong:12 em:3 a:6 br:20 diagrams:4
OUT: code:75 pre:8 strong:12 em:3 a:6 br:20 diagrams:4
If a diagram is lost during translation and auto-stitched back, the final count still matches — the stitch happens before the tag inventory is calculated. A mismatch in the diagrams count after stitching indicates a structural issue that needs manual review.
Result: The Performance Report (75 KB, 4 Mermaid diagrams, 18 sections) translates to French with all 4 diagrams intact — 3 survived translation naturally, 1 was auto-stitched from source. The reader sees no difference.
The tbi-translation-orchestrator Step Function coordinates the entire translation pipeline. As of v3.0, it uses a language-persistent container pattern — one container per language, each processing all documents sequentially.
Diagram 5.1: Step Function State Machine (v3.0)
flowchart TD
A[UnwrapInput] --> AB[InitJob - tbi-translate-init]
AB --> B[PerLang Map - Parallel, Unlimited]
B --> C[tbi-translate-worker container]
C --> C2[Process ALL docs sequentially]
C2 -->|All docs done| D[Lang Container Exits]
D -->|Success| E[Lang Succeeded]
D -->|Failure| F[RecordLangFailure]
E --> G{All langs done?}
F --> G
G --> H[tbi-translate-deploy - Batch Mode]
H --> J[tbi-translate-finalize]
J --> K[Job Complete]
style A fill:#1e293b,stroke:#334155,color:#e2e8f0
style AB fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style B fill:#2e1065,stroke:#a78bfa,color:#e2e8f0
style C fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style C2 fill:#92400e,stroke:#fbbf24,color:#e2e8f0
style H fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style J fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
style K fill:#064e3b,stroke:#10b981,color:#e2e8f0
The v3.0 architecture inverts the execution model. Instead of launching N×M containers (one per doc-language pair), it launches M containers (one per language). Each container receives the full list of documents as a JSON array and processes them sequentially before exiting.
| Job | Old (v2.x) Containers | New (v3.0) Containers | Reduction |
|---|---|---|---|
| 3 docs × 11 langs | 33 | 11 | 67% |
| 40 docs × 11 langs (full library) | 440 | 11 | 97.5% |
| 1 doc × 11 langs | 11 | 11 | 0% (unchanged) |
| 6 docs × 3 langs | 18 | 3 | 83% |
- MaxConcurrency=0 (unlimited) on the PerLang Map. All language containers launch simultaneously. ECS Fargate handles scheduling.

The Step Function uses States.JsonToString (an intrinsic function) to serialize the docs array into a string environment variable for the ECS container. The worker's task_runner.py deserializes it on startup and iterates through each document.
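On the worker side, reading the serialized array back is a single `json.loads` call. The sketch below is not the contents of task_runner.py; the environment variable name `DOCS_JSON` and the function name are assumptions.

```python
import json
import os


def load_docs(environ=os.environ, var="DOCS_JSON"):
    """Deserialize the docs array that the Step Function passed as a string."""
    docs = json.loads(environ[var])
    if not isinstance(docs, list) or not all(isinstance(d, str) for d in docs):
        raise ValueError(f"{var} must be a JSON array of filenames")
    return docs


# Usage (hypothetical): process each document in order inside one container.
# for doc in load_docs():
#     translate_document(doc)
```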
EventBridge Pipes always wrap SQS records in an array, even with batch size 1. The UnwrapInput Pass state uses InputPath: "$[0]" to extract the single job envelope from the array wrapper.
Replaced the original Pass state with the tbi-translate-init Lambda. This state records the Step Function execution ARN ($$.Execution.Id) back to Valkey via POST /admin/translate/update/{job_id} and transitions the job state from queued → running.
- Catch block: if InitJob fails, the pipeline still continues (InitJob failure is non-fatal)

| Failure Mode | Handling | Job State |
|---|---|---|
| Single language fails after 3 retries | Catch → RecordLangFailure pass state, continue other langs | partial |
| All languages for a doc fail | Deploy Lambda receives empty succeeded list, skips invalidation | partial |
| Worker timeout (no response) | ECS task runs to completion — no timeout ceiling. Step Function waits via ecs:runTask.sync | running |
| Step Function execution exception | Finalize still runs via catch-all; job marked failed | failed |
| Operator cancels mid-flight | StopExecution API call; job marked cancelled | cancelled |
| Step Function fails before Finalize | Self-healing sweeper detects orphaned job via execution ARN, marks as failed | failed |
Per-lang independence: Failure of one (doc, lang) pair never aborts work on the other 10 languages. This is enforced by the Step Function's Catch on the inner Map iterator — errors are captured as data, not propagated as exceptions.
The tbi-translate-pipe connects SQS to the Step Function without a glue Lambda:
- Source: trinity-beast-translation-queue
- Target: tbi-translation-orchestrator
- Input transformer: InputTemplate extracts body fields using <$.body.field> syntax (implicit JSON parsing of SQS body)
- IAM role: tbi-translate-pipe-role with sqs:ReceiveMessage, sqs:DeleteMessage, states:StartExecution

This is the AWS-native pattern for SQS-to-Step-Function integration: no code, no cold start, built-in error handling.
The sweeper runs automatically on every GET /admin/translate/health call (piggybacked) and is also available as a dedicated POST /admin/translate/sweep endpoint.
It scans all jobs in tx:active (the Valkey SET of active job IDs). For each job older than 15 minutes in queued or running state:
- FAILED, TIMED_OUT, or ABORTED → marks job as failed, removes from tx:active, updates Aurora with reason
- failed
- RUNNING → leaves it alone

All sweep actions are logged to translation_job_events for audit trail.
Result: This eliminates the stuck queue problem permanently — no manual cleanup needed. Jobs that silently fail are automatically detected and marked, keeping the active set accurate and the queue healthy.
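The per-job decision can be sketched as a pure function. The function name, the use of `submitted_at` to measure age, and the handling of statuses not named above are assumptions of this example; the `marked_failed` action string mirrors the sweep response format.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=15)
DEAD_STATUSES = {"FAILED", "TIMED_OUT", "ABORTED"}


def sweep_action(state, submitted_at, sfn_status, now):
    """Decide what the sweeper does with one job from tx:active."""
    if state not in ("queued", "running"):
        return "skip"
    if now - submitted_at <= STALE_AFTER:
        return "skip"  # too young to be considered orphaned
    if sfn_status in DEAD_STATUSES:
        return "marked_failed"
    return "left_alone"  # RUNNING, or any status this sketch does not model
```

Keeping the decision free of I/O makes it easy to test; the caller would perform the Valkey and Aurora updates.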
The job state now reflects the exact phase of execution:
| Phase | Meaning |
|---|---|
| queued | Submitted to SQS, waiting for EventBridge Pipe to trigger Step Function |
| running | InitJob Lambda fired, Step Function execution ARN recorded, worker translating |
| deploying | All translations complete, deploy Lambda creating CloudFront invalidations |
| finalizing | Deploy complete, finalize Lambda rebuilding search index and writing final state |
| succeeded / partial / failed | Terminal states: all sub-tasks complete, email notification sent |
This gives real-time visibility into exactly where a job is in the pipeline.
All endpoints require the X-Admin-Key header. They are served by the LPO server (Go) alongside the existing admin routes.
POST /admin/translate
Submits a new translation job. Validates inputs, checks cost limits, creates job state in Valkey (synchronous) and Aurora (async goroutine), enqueues to SQS.
// Request
POST /admin/translate
X-Admin-Key: tbcc-admin-...
X-Idempotency-Key: my-unique-key (optional)
Content-Type: application/json
{
  "docs": ["Trinity-Beast-API-Reference.html", "Trinity-Beast-Architecture-Guide.html"],
  "langs": "all",
  "options": {
    "force": false,
    "delta": false,
    "skip_search_rebuild": false,
    "skip_validation": false
  }
}
// Response 200
{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T16:42:00Z",
  "data": {
    "job_id": "1747407720-a3f8b2c1d4e5",
    "state": "queued",
    "submitted_at": "2026-05-16T16:42:00Z"
  },
  "error": ""
}
Validation rules:
- docs: required, 1-6 entries, each must be a valid filename in S3
- langs: "all" (expands to all 11) or an array of 1-11 valid language codes
- options.force: bypass known-failure guard and difficulty rejection
- options.delta: skip pairs where the translated file is already newer than the source (saves up to 90% on re-translation)
- Daily dollar cap: $600/day (autoops:bedrock:spend:daily)
- Daily token cap: 50M combined tokens/day (autoops:bedrock:tokens:input:daily + autoops:bedrock:tokens:output:daily)

GET /admin/translate/status/{job_id}
Returns the full job state. Aurora is the primary source: state, timestamps, docs, langs, cost, and Step Function ARN are read from translation_jobs. Real-time per-doc/lang progress is overlaid from Valkey (written per-pair by the worker, too frequent for Aurora writes). If Aurora doesn't have the job yet (async insert still pending), the endpoint falls back to Valkey.
GET /admin/translate/queue
Lists all pending and active jobs (state in queued or running).
GET /admin/translate/history
Returns the last 50 completed jobs from translation_jobs in Aurora, ordered by submission date descending. Includes state, docs, succeeded/failed pair counts, cost, and reason. Falls back to the Valkey tx:history list if Aurora is unavailable.
GET /admin/translate/health
System health overview:
{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/health] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/health",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T17:30:00Z",
  "data": {
    "queue_depth": 0,
    "active_jobs": 1,
    "last_completed_at": "2026-05-16T17:30:00Z",
    "last_state": "succeeded",
    "daily_spend_usd": "12.40",
    "daily_spend_limit_usd": "600.00",
    "daily_input_tokens": 284150,
    "daily_output_tokens": 312480,
    "daily_token_limit": 50000000,
    "swept_jobs": 0
  },
  "error": ""
}
POST /admin/translate/cancel/{job_id}
Stops the Step Function execution via StopExecution API. Marks job as cancelled. Returns 409 if already in a terminal state.
POST /admin/translate/retry-failed/{job_id}
Creates a new job from the failed (doc, lang) pairs of a completed-with-partial job. Returns 409 if the original is still running.
POST /admin/translate/sweep
Manually triggers the self-healing sweeper. Idempotent, so it is safe to call repeatedly.
// Response 200
{
  "status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/sweep] [200]",
  "status_code": 200,
  "endpoint": "/admin/translate/sweep",
  "cluster_node": "BeastMain",
  "region": "us-east-2",
  "language": "en",
  "timestamp": "2026-05-16T18:00:00Z",
  "data": {
    "swept": 2,
    "checked": 5,
    "results": [
      {
        "job_id": "1747407720-a3f8b2c1d4e5",
        "prior_state": "running",
        "submitted_at": "2026-05-16T16:42:00Z",
        "sfn_status": "FAILED",
        "action": "marked_failed"
      }
    ]
  },
  "error": ""
}
These endpoints are called by the worker task and finalize Lambdas to update Aurora without needing direct database access (worker and Lambdas are outside the VPC).
POST /admin/translate/update/{job_id}
Updates job state, progress, cost, and timing fields. Called by the worker task after each (doc, lang) translation and by the finalize Lambda on completion.
POST /admin/translate/event/{job_id}
Records a granular event in the translation_job_events table. Used for the audit trail: each doc/lang start, success, failure, and retry is logged as a separate event.
Fire-and-forget pattern: Both callback endpoints always return 200 regardless of Aurora write outcome. The translation pipeline must never fail because observability data couldn't be written. Errors are logged but never propagated.
Aurora is the authoritative record for all translation job state. Valkey serves one specific role: real-time per-pair progress updates during active execution (written too frequently for Aurora). For everything else — job state, history, cost, audit trail — Aurora is read first.
Design principle: Valkey is the price cache, search indexes, and real-time counters. It is not a job ledger. Aurora is the ledger. When you need to know what was translated, when, at what cost, and with what result — query Aurora.
One row per job submission. 28 columns covering the full lifecycle. This table is the ground truth for gap analysis, cost reporting, and audit:
| Column Group | Fields | Purpose |
|---|---|---|
| Identity | id, job_id, idempotency_key | Unique identification and deduplication |
| State | state, submitted_at, started_at, completed_at | Lifecycle tracking — authoritative terminal state |
| Input | docs (JSONB), langs (JSONB), options (JSONB) | What was requested |
| Progress | total_pairs, succeeded_pairs, failed_pairs, progress (JSONB) | Per-doc/lang status map |
| Cost | bedrock_cost_usd, bedrock_invocations | Spend tracking per job |
| Execution | step_function_arn, errors (JSONB), elapsed_seconds | Traceability and debugging |
| Deployment | cloudfront_invalidation_ids, search_index_rebuilt, notification_sent | Post-translation actions |
| Lineage | retry_of, reason | Retry chain and submission reason |
| Metadata | submitted_by, created_at, updated_at | Audit trail |
Gap analysis query: To find which documents have never been translated, query SELECT DISTINCT jsonb_array_elements_text(docs) FROM translation_jobs ORDER BY 1 and compare against the S3 document list. Aurora is the only reliable source for this — Valkey keys expire and don't persist across cache flushes.
Granular audit log — one row per significant event in a job's lifecycle. Used by the retry-failed handler as the authoritative source of which (doc, lang) pairs failed:
| Column | Type | Example Values |
|---|---|---|
| job_id | VARCHAR | 1747407720-a3f8b2c1d4e5 |
| event_type | VARCHAR | lang_started, lang_succeeded, lang_failed, deploy_started, finalize_complete |
| doc | VARCHAR | Trinity-Beast-API-Reference.html |
| lang | VARCHAR | ja, ar, es |
| detail | JSONB | Cost, chunk count, error message, validator report |
| created_at | TIMESTAMP | Event timestamp |
The translation system uses a deliberate split between Aurora and Valkey based on access pattern:
| Data | Primary Store | Reason |
|---|---|---|
| Job state (queued/running/succeeded/failed) | Aurora | Authoritative terminal state — never expires, queryable, auditable |
| Job history (last 50 completed) | Aurora | Permanent record — survives cache flushes, supports gap analysis |
| Per-pair progress (es: succeeded, ja: running…) | Valkey | Written per-pair during execution — too frequent for Aurora writes, only needed during active polling |
| Daily spend counter | Valkey | Needs atomic INCRBYFLOAT and 24h TTL auto-reset — Aurora is wrong tool for this |
| Active job set | Valkey | Fast set membership check on every submit — Aurora query would add latency to the hot path |
- On submit: job state is written to Valkey synchronously and to Aurora in a go func() goroutine (non-blocking; the API response returns without waiting)
- On completion: the finalize Lambda calls POST /admin/translate/update/{job_id} with the terminal state; Aurora is updated, Valkey is updated, and the job is removed from the active set.
- On retry-failed: the handler queries translation_job_events for lang_failed events; Aurora is the only reliable source for which pairs failed.

Do not rely on Valkey for job state. Valkey job hashes have no TTL and can be flushed, evicted under memory pressure, or simply go stale if the finalize Lambda's update call was lost. Aurora is the record of what happened. Valkey is the window into what is happening right now.
The translation engine calls Bedrock (Claude Sonnet 4.6) for every chunk of every document in every language. Without guardrails, a single typo in a batch submission could trigger hundreds of expensive API calls.
| Layer | Where | Limit | Behavior on Breach |
|---|---|---|---|
| Per-request limits | Admin API (submit handler) | Max 6 docs, max 12 langs, max 3 active jobs | 400 Bad Request (docs/langs) or queue in SQS (active jobs) |
| Daily dollar cap | Admin API (submit handler) | $600/day (autoops:bedrock:spend:daily) | 429 Too Many Requests until counter expires |
| Daily token cap | Admin API (submit handler) | 50M combined tokens/day (autoops:bedrock:tokens:input:daily + autoops:bedrock:tokens:output:daily) | 429 Too Many Requests until counters expire |
| Per-invocation tracking | Worker task | Increments after every Bedrock call | Source of truth for daily counters |
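The layered checks in the table above can be sketched as one decision function. This is a minimal illustration, not the Go submit handler itself: the function name, the `Decision` type, the order of the checks, and the 202 status for accepted jobs are all assumptions, and `langs` is assumed to be already expanded from `"all"` into a list.

```python
from dataclasses import dataclass

MAX_DOCS = 6                  # per-request document limit (from the table)
MAX_LANGS = 12                # per-request language limit (from the table)
MAX_ACTIVE_JOBS = 3           # at this count, new jobs are queued, not rejected
DAILY_SPEND_CAP = 600.0       # dollars per day
DAILY_TOKEN_CAP = 50_000_000  # combined input + output tokens per day


@dataclass
class Decision:
    status: int       # HTTP-style status; 202 for accepted jobs is an assumption
    action: str       # "start", "queue", or "reject"
    reason: str = ""


def check_submission(docs, langs, active_jobs, spend_today, tokens_today):
    """Return the first guardrail breach, or a decision to start the job."""
    if not docs or len(docs) > MAX_DOCS:
        return Decision(400, "reject", f"docs must number 1..{MAX_DOCS}")
    if not langs or len(langs) > MAX_LANGS:
        return Decision(400, "reject", f"langs must number 1..{MAX_LANGS}")
    if spend_today >= DAILY_SPEND_CAP:
        return Decision(429, "reject", "daily dollar cap reached")
    if tokens_today >= DAILY_TOKEN_CAP:
        return Decision(429, "reject", "daily token cap reached")
    if active_jobs >= MAX_ACTIVE_JOBS:
        return Decision(202, "queue", "active job limit reached")
    return Decision(202, "start")
```

In the real service the two daily values would be read from the Valkey counters at submit time; here they are plain arguments so the logic can be exercised without a cache.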
Two parallel counters track daily usage — a dollar cap and a token cap. Both live in Valkey with 24-hour TTL auto-reset and are checked on every job submission.
Daily dollar cap (autoops:bedrock:spend:daily)
INCRBYFLOAT after every Bedrock invocation in the worker task
EXPIRE autoops:bedrock:spend:daily 86400 after each increment
Why $600? A full batch translation of the entire 40-document library × 11 languages costs approximately $726 in raw Bedrock spend at ~$1.65 per doc-language pair (Sonnet 4.6) — but in practice the library is never re-translated all at once. Typical batches are 3 or 6 documents (per the Trinity Beast multiples-of-3 convention) and run well under $200. The $600 cap is a daily safety guardrail with comfortable headroom for several batches plus normal AutoOps overhead (threat analysis, digests, support) in the same 24-hour window.
Daily token cap (autoops:bedrock:tokens:input:daily + autoops:bedrock:tokens:output:daily)
INCRBY after every Bedrock invocation — separate keys for input and output tokens
Kill switch: Setting autoops:bedrock:kill = "1" in Valkey causes both the submit endpoint and the worker task to refuse all operations. Use this for emergency cost containment.
Pricing formula:
Token rates (stored in Valkey, always current):
| Agent | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Haiku 3.5 | $1.00 | $5.00 |
| Sonnet 4.6 | $3.00 | $15.00 |
| Opus 4 | $5.00 | $25.00 |
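As a rough illustration of how the per-million rates above turn into raw Bedrock spend, the sketch below applies standard per-token pricing arithmetic. It covers Bedrock cost only; no customer markup is modeled. The function name and the token counts in the usage note are hypothetical.

```python
# Rates copied from the table above: (input, output) dollars per 1M tokens.
RATES_PER_MILLION = {
    "Haiku 3.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4": (5.00, 25.00),
}


def bedrock_cost(model, input_tokens, output_tokens):
    """Raw Bedrock spend in dollars for one or more invocations."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_rate \
        + (output_tokens / 1_000_000) * output_rate
```

For example, a hypothetical doc-language pair that consumed 200,000 input tokens and 70,000 output tokens on Sonnet 4.6 would cost `round(bedrock_cost("Sonnet 4.6", 200_000, 70_000), 2)`, which is 1.65 dollars.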
Typical costs (Bedrock spend, before markup):
| Scenario | Bedrock Cost | Customer Price |
|---|---|---|
| 1 document × 1 language (Haiku 3.5) | ~$0.12 | ~$0.17 |
| 1 document × 1 language (Sonnet 4.6) | ~$1.65 | ~$2.34 |
| 1 document × 11 languages (Sonnet 4.6) | ~$18 | ~$25.50 |
| 6 documents × 11 languages (1 batch job, Trinity Beast convention) | ~$108 | ~$153 |
| 30 documents × 11 languages (full library, Sonnet) | ~$540 | ~$770 |
Translation engine metrics are exposed through two public interfaces:
GET /public/infrastructure — includes a translation section with daily spend, daily limit, active jobs, queue depth, cost-per-pair estimate, and daily token counts (daily_input_tokens, daily_output_tokens, daily_token_limit). Consumed by the daily digest Lambda, nightly sync, and any monitoring dashboard.
/public/infrastructure every 30 seconds.
Email notification timing: The email notification is the absolute LAST step in the pipeline. It fires only after: translation, deployment, search index rebuild, state update, and history push are ALL complete. The email is a comprehensive report including: job summary, translation results, CloudFront invalidation IDs, search index status, and any Bedrock error details. If Bedrock reports validation failures, the specific error messages and validator details are included in the email.
The existing CLI tool (scripts/kcc_helpers/translate_doc.py) continues to work unchanged. A --remote flag routes through the new service instead of running Bedrock locally:
| Flag | Behavior | Use Case |
|---|---|---|
| --local (current default) | Runs translator engine in-process, calls Bedrock directly from laptop | Development, debugging, single-doc quick fixes |
| --remote | POSTs to /admin/translate, polls /admin/translate/status/{id} every 5s, streams progress to stdout | Production translations, batch operations |
The --remote flag produces identical terminal output to local mode — same progress bars, same chunk counters, same completion summary. The operator's workflow doesn't change; only the execution path does.
Default flip plan: Start with --local as default to avoid surprising anyone. After 30 days of clean production runs through the service, flip the default to --remote and add --local as the explicit fallback.
All translation behavior is driven by a single config file: scripts/translation-config.json. This is the shared source of truth consumed by both the Python engine and the Go admin API.
Brand names, product names, AWS services, exchange names, and acronyms that must never be translated or transliterated:
Cross Power Ministries of Pakistan, The Trinity Beast Infrastructure,
The Trinity Beast, Trinity Beast Command Center, Kiro Command Center,
Cory Dean Kalani, Shafiq Bhatti, BeastWebhook, BeastMirror, BeastMain,
BeastLRS, Claude Sonnet 4.6, Bedrock, ElastiCache, EventBridge,
CloudFront, GuardDuty, CloudWatch, CloudTrail, Step Functions,
Crypto.com, Coinbase, Gate.io, Gemini, Kraken, Aurora, Valkey,
Stripe, Kiro, Fargate, PostgreSQL, Lambda, Route 53, AutoOps,
TBCC, CPMP, TBI, KCC, OKX, ECR, ECS, ALB, NLB, WAF, SNS, SQS,
SES, VPC, IAM, S3 ...
In addition to the global protected terms list, you can submit document-specific terms via the protected_terms array in the translation request. This is useful for:
POST /admin/translate
{
"docs": ["Trinity-Beast-API-Reference.html"],
"langs": "all",
"protected_terms": ["MyCustomService", "SpecialEndpoint", "ProjectAlpha"]
}
Per-request terms are merged with the global list for that job only. They do not persist across jobs.
Regex patterns for technical tokens that must survive translation unchanged:
| Pattern Name | Matches | Example |
|---|---|---|
| url | HTTP/HTTPS URLs | https://api.cpmp-site.org/admin/translate |
| email | Email addresses | CoryDeanKalani@CPMP-Site.org |
| memory_size | Number + memory unit | 1770 MB, 32 GB |
| percentage | Number + % | 98.5%, 62% |
| cron_expr | Cron expressions | cron(0 11 * * ? *) |
| ip_address | IPv4 with optional CIDR | 10.0.1.0/24 |
| aws_arn | AWS ARN format | arn:aws:sns:us-east-2:211998422884:tbi-ops-notifications |
| aws_resource_id | AWS resource identifiers | vpc-03deaddb7083cd59c, sg-050b617f93b2388f6 |
| Parameter | Value | Purpose |
|---|---|---|
| max_chunk_chars | 6000 | Default maximum characters per chunk (Latin scripts: es, pt, fr, de, it) |
| max_chunk_chars_by_lang | See below | Per-language overrides for complex scripts |
| max_retries | 3 | Retry attempts per chunk on validation failure |
| request_timeout_seconds | 300 | Per-Bedrock-call timeout (5 minutes — large RTL chunks need headroom) |
| max_output_tokens | 8192 | Maximum tokens in Bedrock response |
Per-language chunk size overrides:
| Languages | Chunk Size | Rationale |
|---|---|---|
| hi, ur, ar | 3000 chars | Devanagari and Arabic scripts expand significantly during translation. Smaller chunks prevent Bedrock timeouts. |
| ja, zh, ru | 4500 chars | CJK and Cyrillic have moderate expansion. Mid-range chunks balance throughput and reliability. |
| es, pt, fr, de, it | 6000 chars (default) | Latin scripts translate quickly with minimal expansion. |
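The override table amounts to a dictionary lookup with a default. The sketch below uses hypothetical Python names; the actual layout of `max_chunk_chars_by_lang` inside `translation-config.json` is not shown in this document and may differ.

```python
DEFAULT_MAX_CHUNK_CHARS = 6000  # Latin-script default

# Per-language overrides, mirroring the table above.
MAX_CHUNK_CHARS_BY_LANG = {
    "hi": 3000, "ur": 3000, "ar": 3000,
    "ja": 4500, "zh": 4500, "ru": 4500,
}


def max_chunk_chars(lang):
    """Chunk-size ceiling for a target language, falling back to the default."""
    return MAX_CHUNK_CHARS_BY_LANG.get(lang, DEFAULT_MAX_CHUNK_CHARS)
```

Keeping only the exceptions in the map means a newly added Latin-script language picks up the 6000-character default without a config change.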
Single document, all languages:
curl -s -X POST \
-H "X-Admin-Key: $ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all"}' \
https://api.cpmp-site.org/admin/translate | jq .
Multiple documents, specific languages:
curl -s -X POST \
-H "X-Admin-Key: $ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{"docs":["Trinity-Beast-API-Reference.html","Trinity-Beast-Architecture-Guide.html"],"langs":["es","pt","fr"]}' \
https://api.cpmp-site.org/admin/translate | jq .
With idempotency key (safe to retry):
curl -s -X POST \
-H "X-Admin-Key: $ADMIN_KEY" \
-H "X-Idempotency-Key: api-ref-2026-05-16" \
-H "Content-Type: application/json" \
-d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all"}' \
https://api.cpmp-site.org/admin/translate | jq .
# Check job status
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
https://api.cpmp-site.org/admin/translate/status/{job_id} | jq .
# View queue
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
https://api.cpmp-site.org/admin/translate/queue | jq .
# System health
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
https://api.cpmp-site.org/admin/translate/health | jq .
# Recent history
curl -s -H "X-Admin-Key: $ADMIN_KEY" \
https://api.cpmp-site.org/admin/translate/history | jq .
| Symptom | Cause | Resolution |
|---|---|---|
| Job stuck in queued | EventBridge Pipe not consuming | Check Pipe status in console; verify IAM role |
| 429 on submit | Daily spend cap hit ($600) | Wait for 24h TTL expiry, or reset manually: SET autoops:bedrock:spend:daily 0 |
| Partial completion | Some languages failed validation | POST /admin/translate/retry-failed/{id} |
| Worker timeout | Document too large (many chunks) | Check Step Function execution history for the failing chunk index |
| Cancel returns 404 | Job only in Aurora, not Valkey | Cancel handler falls back to Aurora — ensure latest code is deployed |
| No email notification | Finalize Lambda error | Check CloudWatch logs for tbi-translate-finalize |
| Search not updated | Search rebuild timed out | Run bash scripts/kcc.sh build-search manually |
Cancel a running job:
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" \
https://api.cpmp-site.org/admin/translate/cancel/{job_id}
This stops the Step Function execution immediately. Documents already translated and deployed remain live. The search index is rebuilt for whatever landed successfully.
The translation engine implements automatic regional failover to maintain availability during Bedrock service disruptions. This was added after a us-east-2 outage during development exposed the single-region weakness — better to discover this before customers were affected.
When a Bedrock call fails with a service-level error, the engine automatically retries in the next region:
| Priority | Region | Location | Role |
|---|---|---|---|
| 1 | us-east-2 | Ohio | Primary — all normal traffic |
| 2 | us-east-1 | N. Virginia | First fallback |
| 3 | us-west-2 | Oregon | Second fallback |
The failover is transparent to the caller — the translation completes successfully as long as at least one region is available. A log message records when a fallback region was used.
Failover is triggered for service-level errors and timeouts that indicate the region is unavailable or overloaded:
| Error Type | Meaning | Action |
|---|---|---|
| ServiceUnavailableException | Bedrock service is down (503) | Retry same region once, then failover |
| ThrottlingException | Rate limit or capacity exceeded | Retry same region once, then failover |
| ModelStreamErrorException | Model streaming failure | Retry same region once, then failover |
| ReadTimeoutError | Response took longer than 300s | Retry same region once, then failover |
| ConnectTimeoutError | Could not establish connection within 10s | Retry same region once, then failover |
Other errors (validation failures, authentication errors, malformed requests) are not retried — they would fail identically everywhere.
Each region gets 2 attempts before the engine moves to the next region. A 5-second backoff between attempts allows transient pressure to clear:
us-east-2 (attempt 1) → timeout → wait 5s →
us-east-2 (attempt 2) → timeout →
us-east-1 (attempt 1) → timeout → wait 5s →
us-east-1 (attempt 2) → timeout →
us-west-2 (attempt 1) → timeout → wait 5s →
us-west-2 (attempt 2) → timeout → FAIL (raise exception)
Total: 6 attempts across 3 regions. In practice, transient spikes clear within 5-10 seconds, so the retry within the same region usually succeeds without needing failover.
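The attempt sequence above can be expressed as two nested loops. This is a minimal sketch under stated assumptions: `invoke` is a hypothetical callable that performs one Bedrock call in the given region, and `RetryableError` stands in for the service-level errors and timeouts listed in the table. Any other exception propagates immediately, matching the rule that validation and authentication errors are not retried.

```python
import time

REGIONS = ["us-east-2", "us-east-1", "us-west-2"]  # priority order
ATTEMPTS_PER_REGION = 2
BACKOFF_SECONDS = 5


class RetryableError(Exception):
    """Stand-in for throttling, service-unavailable, and timeout errors."""


def call_with_failover(invoke, regions=REGIONS, sleep=time.sleep):
    """Try each region twice, backing off between same-region attempts."""
    last_error = None
    for region in regions:
        for attempt in range(1, ATTEMPTS_PER_REGION + 1):
            try:
                return invoke(region)
            except RetryableError as exc:
                last_error = exc
                # Back off only before the second attempt in the same region;
                # moving to the next region happens without an extra wait.
                if attempt < ATTEMPTS_PER_REGION:
                    sleep(BACKOFF_SECONDS)
    raise last_error or RuntimeError("no regions configured")
```

Passing `sleep` as a parameter is a testing convenience: a test can substitute a recorder and verify the backoff pattern without waiting.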
Regional failover has negligible cost impact:
Resilience benefit: A complete regional outage no longer blocks translations. The May 2026 us-east-2 outage would have caused a 4-hour translation blackout without this feature. With failover, translations continued uninterrupted via us-east-1.
Proper document preparation ensures clean translations with minimal post-processing. This section covers the conventions that help the translation engine produce accurate results.
The <code translate="no"> tag tells the translation engine to preserve content exactly as written. Use it correctly to avoid formatting artifacts in translated documents.
Use <code translate="no"> for technical identifiers that would break if translated:
tbi-ops-notify, BeastMain
/admin/translate, /public/infrastructure
AWS_REGION, ADMIN_KEY
_apply_sentinels(), translate()
/var/log/app.log, scripts/kcc.sh
job_id, created_at
max_retries, request_timeout_seconds
--remote, --force
Do not wrap pure data values in code tags — they should appear as plain text:
1770 MB (not <code>1770 MB</code>)
60 seconds (not <code>60 seconds</code>)
98.5% (not <code>98.5%</code>)
11 languages (not <code>11 languages</code>)
$600 (not <code>$600</code>)
17.7 (not <code>17.7</code>)
Why this matters: The translation engine's sentinel system protects code-tagged content from translation. If you wrap "32 GB" in code tags, it survives translation — but so does the monospace formatting, which looks wrong in prose. The engine has a post-processor that strips spurious code wrappers from pure numeric values, but it's better to author correctly from the start.
Ask yourself: "If I changed this value, would the system break?" If yes, use code tags. If no (it's just a number or measurement), leave it as plain text.
| Content | Would changing it break something? | Use code tags? |
|---|---|---|
| tbi-ops-notify | Yes — Lambda name | ✅ Yes |
| 1770 MB | No — just a memory size | ❌ No |
| /admin/translate | Yes — API endpoint | ✅ Yes |
| $600 | No — just a dollar amount | ❌ No |
| max_retries | Yes — config key | ✅ Yes |
| 3 retries | No — just a count | ❌ No |
For documents with domain-specific terminology not in the global protected terms list, submit additional terms with the translation request:
POST /admin/translate
{
"docs": ["Customer-Integration-Guide.html"],
"langs": "all",
"protected_terms": [
"CustomerCorp",
"ProjectPhoenix",
"DataSync API",
"IntegrationHub"
]
}
These terms are added to the global list for this job only. The engine will:
<span translate="no"> during preprocessing
When the translation engine encounters ambiguous content, it may flag it for human review. This happens in the validation phase when:
Flagged content appears in the job status response under the warnings array:
{
"status": "✅ [LPO] [us-east-2] [BeastMain] [/admin/translate/status/1747407720-a3f8b2c1d4e5] [200]",
"status_code": 200,
"endpoint": "/admin/translate/status/1747407720-a3f8b2c1d4e5",
"cluster_node": "BeastMain",
"region": "us-east-2",
"language": "en",
"timestamp": "2026-05-16T17:45:00Z",
"data": {
"job_id": "1747407720-a3f8b2c1d4e5",
"state": "succeeded",
"warnings": [
"chunk 14 (ja): soft failure — protected term 'DataSync' may have been altered",
"chunk 22 (ar): soft failure — version number format changed from X.Y.Z to X.Y"
]
},
"error": ""
}
Soft failures don't block the translation — the output is still deployed. Review the warnings and manually verify the flagged sections if needed.
Feedback loop: If you consistently see the same term flagged, add it to the global protected terms list in scripts/translation-config.json. This prevents future warnings and improves translation quality across all documents.
Before translation begins, the engine analyzes each document for complexity factors that may cause validation failures. This pre-scan identifies code-heavy sections and recommends whether to proceed, exercise caution, or split the document.
The pre-scan calculates a complexity score for each section based on:
| Factor | Weight | Why It Matters |
|---|---|---|
| Code tags | 1.0 per tag | Each code tag must survive translation intact — more tags = more validation points |
| Code tags in tables | 1.5 per tag | Tables with code examples are harder — model tends to merge or drop tags when reordering |
| Tables | 2.0 per table | Tables with technical content require careful structure preservation |
| Pre blocks | 0.5 per block | Usually have translate="no" — lower risk but still tracked |
| Protected spans | 0.3 per span | Handled by sentinel system — low risk |
Based on the analysis, the pre-scan returns one of three recommendations:
| Recommendation | Criteria | Action |
|---|---|---|
| PROCEED | Score < 20, no high-density sections | Translate normally — low failure risk |
| CAUTION | Score < 50, ≤ 2 high-density sections | Proceed but monitor — may need retries |
| SPLIT | Score ≥ 50 OR > 3 high-density sections | Consider splitting document before translation |
DOCUMENT TRANSLATION COMPLEXITY ANALYSIS
========================================
Total characters: 81,107
Total sections: 13
Total code tags: 287
Overall complexity score: 415.4
Recommendation: SPLIT
WARNINGS:
⚠️ Document has 287 code tags — high validation failure risk
⚠️ Section 'step-function' has 51 code tags — consider simplifying
⚠️ Section 'observability' has 48 code tags — consider simplifying
HIGH-DENSITY SECTIONS (9):
• architecture: 11 code tags, score 22.7
• sentinel-system: 22 code tags, score 34.5
• step-function: 51 code tags, score 71.1
...
SUGGESTED SPLIT: 4 parts
→ Split after 'validators' (After 3 high-density sections)
→ Split after 'observability' (After 3 high-density sections)
→ Split after 'doc-prep' (After 3 high-density sections)
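The scoring weights and recommendation thresholds can be sketched as follows. Several details are assumptions made for illustration: code tags inside tables are counted separately from other code tags (weighted 1.5 instead of 1.0, not in addition), the number of high-density sections is supplied by the caller because its definition is not given here, and a document with exactly 3 high-density sections and a score under 50 is treated as CAUTION because the criteria table does not cover that case.

```python
# Weights per element, taken from the factor table.
WEIGHTS = {
    "code_tags": 1.0,            # code tags outside tables (assumption)
    "code_tags_in_tables": 1.5,
    "tables": 2.0,
    "pre_blocks": 0.5,
    "protected_spans": 0.3,
}


def complexity_score(counts):
    """Weighted sum of element counts; missing keys count as zero."""
    return sum(WEIGHTS[name] * counts.get(name, 0) for name in WEIGHTS)


def recommend(score, high_density_sections):
    """Map a score and high-density section count to a recommendation."""
    if score >= 50 or high_density_sections > 3:
        return "SPLIT"
    if score < 20 and high_density_sections == 0:
        return "PROCEED"
    return "CAUTION"  # includes the unspecified "exactly 3 sections" case
```

For instance, a section with 10 code tags, 2 tables, and 4 pre blocks scores 10 + 4 + 2 = 16 and would be recommended as PROCEED when no high-density sections are present.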
When the pre-scan recommends splitting, it suggests natural break points at section boundaries. Options for handling complex documents:
Create separate HTML files for each part (e.g., Doc-Part1.html, Doc-Part2.html). Each part translates independently with lower failure risk. Link them together with navigation.
Reduce code tag density in problematic sections:
Submit fewer languages per job (e.g., 3 instead of 11). This reduces concurrent load and allows the model more capacity per translation. Retry failed languages individually.
Complex scripts (Urdu, Arabic, Hindi) struggle with high tag density even when Latin-script languages handle the same chunk fine. The prescan now applies per-language code tag limits — tighter thresholds for scripts where the model is more likely to drop markup:
| Language | Script | Max Code Tags per Part |
|---|---|---|
| Default (Latin, CJK, Cyrillic) | Latin / Kanji / Cyrillic | 30 |
| Urdu (ur) | Nastaliq | 18 |
| Arabic (ar) | Arabic | 18 |
| Hindi (hi) | Devanagari | 20 |
Configuration key: max_code_tags_per_part_by_lang in translation-config.json. When the prescan runs for a specific language, it uses that language's threshold to determine split points. A document that translates as one part for Spanish may automatically split into 2-3 parts for Urdu.
Result: The Translation Service document (22 code tags in the Architecture section) previously failed for Urdu on every attempt. With the per-language threshold of 18, the prescan splits Architecture and Observability into separate parts. All 11 languages now translate successfully.
This document is an edge case: The Translation Engine documentation itself has 287 code tags and a complexity score of 415 — it's documentation about a translation engine, so it's packed with code examples. Most documents score under 50.
Even when code tag density is low, a single part that exceeds the model's effective output window will be silently truncated — sections at the end of the part simply disappear from the output. The safety valve enforces a hard character limit per part regardless of prescan recommendations.
The Performance Report (75 KB) has 18 sections with moderate code density. The prescan recommended splitting into 3 parts based on code tag thresholds. But Part 1 was 36 KB of prose-heavy content — well under the code tag limit but far beyond the model's output token budget. The model translated the first ~24 KB faithfully, then its output simply stopped. Sections 7-8 (partner-sustained, udp-engine) vanished without any error signal.
# Safety valve: max chars per part (prevents model output truncation)
MAX_CHARS_PER_PART = 24000 # ~6000 tokens, well within max_output_tokens
The splitter now enforces a 24 KB ceiling on every part. If a part exceeds this limit after the prescan-based split, it is further subdivided at the nearest section boundary. This is conservative — Latin scripts could handle ~30 KB, but 24 KB is safe for all languages including RTL and CJK where token efficiency is lower.
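One way to enforce such a ceiling is greedy packing at section boundaries, sketched below. The function name and the greedy strategy are assumptions; the real splitter may choose boundaries differently. A single section that is larger than the limit on its own is left as its own part here, since this sketch never cuts inside a section.

```python
MAX_CHARS_PER_PART = 24000  # same ceiling as the safety valve constant above


def enforce_char_ceiling(sections, max_chars=MAX_CHARS_PER_PART):
    """Group consecutive sections into parts of at most max_chars characters."""
    parts, current, size = [], [], 0
    for section in sections:
        # Close the current part if adding this section would exceed the limit.
        if current and size + len(section) > max_chars:
            parts.append(current)
            current, size = [], 0
        current.append(section)
        size += len(section)
    if current:
        parts.append(current)
    return parts
```

With four sections of 10,000, 10,000, 10,000, and 6,000 characters, the first two fill one part (20,000 characters) and the remaining two form a second part (16,000 characters).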
| Document | Before (v2.6) | After (v2.8) |
|---|---|---|
| Performance Report (75 KB) | 3 parts (Part 1: 36 KB — truncated) | 4 parts (largest: 22 KB — clean) |
| API Reference (180 KB) | 8 parts (all under 24 KB already) | 8 parts (no change — already safe) |
| Translation Engine (116 KB) | 11 parts (code-density driven) | 11 parts (no change — code splits dominate) |
The safety valve only activates when the prescan's code-tag-based splitting produces oversized parts. For most documents, the code density split already keeps parts well under 24 KB.
Result: Performance Report went from dropping 3 entire sections (silent truncation) to a perfect 18/18 sections, 4/4 diagrams, 20/20 <br/> tags across all 11 languages.
The document-level preprocessor is a critical layer that runs before chunking. It extracts complex HTML elements from the entire document, replacing them with simple Unicode placeholders. After translation, the postprocessor restores the original elements. This eliminates the "model drops tags" failure mode entirely.
The per-chunk sentinel system (Section 3) works well for most documents, but complex documents with many <code>, <strong>, and <em> tags exposed a fundamental limitation:
__TBP0__, __TBP1__, etc.)
Example failure: A chunk with 27 <code translate="no"> tags consistently failed validation with tag count mismatch (27→23) — the model dropped 4 placeholders despite explicit instructions to preserve them.
Extract ALL problematic elements from the entire document before chunking. The model never sees these elements — only simple Unicode placeholders that it cannot confuse with HTML structure.
Key insight: The model cannot corrupt what it never sees. By extracting elements at the document level, each chunk has zero complex tags to worry about. The model translates clean prose with obvious markers.
| Pipeline Stage | Before (v2.2) | After (v2.3) |
|---|---|---|
| Document received | 290 code tags | 290 code tags |
| After preprocessing | — | 0 code tags (290 placeholders) |
| Per-chunk sentinels | 20+ placeholders per chunk | 0-2 placeholders per chunk |
| Model cognitive load | High (complex structure) | Low (clean prose) |
| Validation failures | Frequent on complex docs | Rare |
The preprocessor integrates into the translation pipeline as the first step:
Document → PREPROCESS → Chunk → Translate → Reassemble → POSTPROCESS → Output
↓ ↓
Extract ALL code/pre/strong/em tags Restore placeholders
Replace with ⟦CODE_001⟧, ⟦STRONG_002⟧ with original HTML
Build manifest mapping from manifest
engine.py
def translate(text, target_lang, mode="html", ...):
    # Step 1: PREPROCESS — Extract elements (document-level)
    simplified_html, manifest = preprocess_for_translation(text)
    # Step 2: CHUNK — Split simplified document (zero complex tags now)
    head, chunks, tail = chunker.split_document(simplified_html, lang=target_lang)
    # Step 3: TRANSLATE — Each chunk through Bedrock
    translated_chunks = []
    for chunk in chunks:
        translated = _translate_chunk(chunk, ...)  # Per-chunk sentinels still run
        translated_chunks.append(translated)
    # Step 4: REASSEMBLE
    reassembled = chunker.reassemble(head, translated_chunks, tail)
    # Step 5: POSTPROCESS — Restore placeholders with original elements
    output = postprocess_translation(reassembled, manifest)
    return output
The preprocessor extracts elements in order of specificity (most specific first) to handle nesting correctly:
| Pass | Elements Extracted | Placeholder Format |
|---|---|---|
| 1 | <pre translate="no"> blocks | ⟦PRE_001⟧ |
| 2 | <code translate="no"> tags | ⟦CODE_001⟧ |
| 3 | Other translate="no" elements | ⟦SPAN_001⟧ |
| 4 | <strong>, <em>, <b>, <i> tags | ⟦STRONG_001⟧, ⟦EM_001⟧ |
| 5 | Numeric patterns (memory sizes, percentages, versions) | ⟦MEM_001⟧, ⟦PCT_001⟧, ⟦VER_001⟧ |
Placeholders use Unicode brackets (⟦ and ⟧) that will never appear in real HTML content:
⟦TYPE_NNN⟧ (e.g., ⟦CODE_042⟧)
The preprocessor handles arbitrary nesting depth by processing innermost elements first:
Source:
<span translate="no"><code translate="no">tbi-ops-notify</code> Lambda</span>
Pass 1: Extract inner code tag
<span translate="no">⟦CODE_001⟧ Lambda</span>
Pass 2: Extract outer span
⟦SPAN_002⟧
Model sees: ⟦SPAN_002⟧ (one token, no nesting)
When the preprocessor extracts elements from a container (e.g., a table cell), earlier passes leave placeholder text in the parent. Later passes must not be confused by these sibling placeholders — a <code translate="no"> tag in the same table cell as an already-extracted element is still a valid extraction target.
Bug fixed (v2.4): The original _is_inside_placeholder check walked up the DOM tree looking for the ⟦ character in any parent's text. This caused false positives — if a sibling element had been extracted (leaving ⟦CODE_042⟧ in the parent's text), the check incorrectly skipped remaining <code translate="no"> tags in the same container. Those unextracted tags then overwhelmed the model during complex-script translation (Hindi, Urdu). Fix: the check now always returns false — if an element still exists in the DOM tree, it wasn't extracted and is a valid target.
After translation, the postprocessor restores placeholders in reverse index order (high → low) to prevent prefix collisions:
Translated: ⟦SPAN_002⟧
Restore ⟦SPAN_002⟧:
<span translate="no">⟦CODE_001⟧ Lambda</span>
Restore ⟦CODE_001⟧:
<span translate="no"><code translate="no">tbi-ops-notify</code> Lambda</span>
Perfect reconstruction — model never had to understand nesting.
The manifest maps each placeholder to its original HTML, enabling exact restoration:
{
"⟦CODE_001⟧": {
"type": "CODE",
"html": "<code translate=\"no\">tbi-ops-notify</code>",
"index": 1
},
"⟦SPAN_002⟧": {
"type": "SPAN",
"html": "<span translate=\"no\">⟦CODE_001⟧ Lambda</span>",
"index": 2
}
}
Result: The Translation Engine document (290 code tags, complexity 423) now translates with 0 retries across all 11 parts. Previously it failed consistently on Part 8 (config section with 27 code tags).
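The extract-and-restore round trip from the nesting example can be reproduced with a small sketch. It is deliberately simplified: it uses regular expressions where the real preprocessor works on a parsed DOM tree, it handles only `<code translate="no">` and `<span translate="no">` elements, and it would not cope with a span nested inside another span. The function names are hypothetical.

```python
import itertools
import re

CODE_RE = re.compile(r'<code translate="no">.*?</code>', re.DOTALL)
SPAN_RE = re.compile(r'<span translate="no">.*?</span>', re.DOTALL)


def preprocess(html):
    """Replace protected elements with placeholders and build a manifest."""
    manifest = {}
    counter = itertools.count(1)  # one index sequence shared by all types

    def extract(pattern, kind, text):
        def repl(match):
            index = next(counter)
            key = f"⟦{kind}_{index:03d}⟧"
            manifest[key] = {"type": kind, "html": match.group(0), "index": index}
            return key
        return pattern.sub(repl, text)

    html = extract(CODE_RE, "CODE", html)  # inner elements first
    html = extract(SPAN_RE, "SPAN", html)  # then the elements that contain them
    return html, manifest


def postprocess(text, manifest):
    """Restore placeholders in reverse index order (high to low)."""
    entries = sorted(manifest.items(), key=lambda kv: kv[1]["index"], reverse=True)
    for key, entry in entries:
        text = text.replace(key, entry["html"])
    return text
```

Restoring from the highest index down matters because an outer element's stored HTML can itself contain lower-numbered placeholders, which only become visible in the text once the outer placeholder has been expanded.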
Pass 5 extracts numeric patterns from the text after HTML element extraction. This protects bare numbers in prose that weren't already inside code or span tags. The model cannot convert, localize, or drop what it never sees.
When translating to complex scripts (Arabic, Hindi, Urdu), the model occasionally:
These transformations break technical accuracy. The numeric extraction pass prevents all of them.
| Pattern Type | Regex | Examples | Placeholder |
|---|---|---|---|
| Memory sizes | \d+(?:\.\d+)?\s?(?:GB\|MB\|KB\|TB) | 32 GB, 1770 MB, 256 KB | ⟦MEM_001⟧ |
| Percentages | \d+(?:\.\d+)?% | 98.5%, 62%, 100% | ⟦PCT_001⟧ |
| Version numbers | \d+\.\d+(?:\.\d+)? | 4.6, 17.7, 2.3.1 | ⟦VER_001⟧ |
Numeric extraction runs after HTML element extraction (Passes 1-4). This means:
Numbers inside <code> tags are already protected by Pass 2
Numbers inside translate="no" spans are already protected by Pass 3
Source:
"The Lambda uses 1770 MB of memory and achieves 98.5% uptime."
After Pass 5:
"The Lambda uses ⟦MEM_042⟧ of memory and achieves ⟦PCT_043⟧ uptime."
Model translates prose, placeholders survive intact.
After restoration:
"लैम्ब्डा 1770 MB मेमोरी का उपयोग करता है और 98.5% अपटाइम प्राप्त करता है।"
Technical values preserved exactly — no localization, no conversion.
Result: Translation failures caused by numeric value loss (preserve_memory_size: missing: GB, MB) are now resolved across all 11 languages. Numeric values survive intact regardless of target script.
The numeric extraction pass includes safeguards to prevent extracting numbers that are part of existing placeholder names (e.g., the "001" in ⟦CODE_001⟧):
Without these guards, the numeric regex would corrupt placeholder names by extracting their index numbers, producing nested placeholders like ⟦CODE___TBN10__⟧ that the model cannot handle.
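A compact way to combine the three numeric patterns with a placeholder guard is a single alternation in which existing placeholders are matched first and passed through untouched. This is a sketch of that idea, not the engine's actual guard logic (which is not reproduced here); the function name and the `start_index` parameter are hypothetical. Alternation order matters: memory sizes and percentages are tried before version numbers so that "98.5%" is not captured as a version.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<SKIP>⟦[^⟧]*⟧)"                          # existing placeholder: leave as-is
    r"|(?P<MEM>\d+(?:\.\d+)?\s?(?:GB|MB|KB|TB))"  # memory sizes
    r"|(?P<PCT>\d+(?:\.\d+)?%)"                   # percentages
    r"|(?P<VER>\d+\.\d+(?:\.\d+)?)"               # version numbers
)


def extract_numerics(text, start_index=1):
    """Replace bare numeric values with placeholders; return text and manifest."""
    manifest = {}
    index = start_index

    def repl(match):
        nonlocal index
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            return match.group(0)  # guard: never touch digits inside ⟦...⟧
        key = f"⟦{kind}_{index:03d}⟧"
        manifest[key] = match.group(0)
        index += 1
        return key

    return TOKEN_RE.sub(repl, text), manifest


def restore_numerics(text, manifest):
    """Put the original numeric values back after translation."""
    for key, value in manifest.items():
        text = text.replace(key, value)
    return text
```

Because the scan is a single left-to-right pass, digits inside a placeholder such as ⟦CODE_001⟧ are consumed by the `SKIP` alternative before any numeric pattern can see them.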
The translation engine sends email notifications via the AutoOps notification pipeline (tbi-ops-notify Lambda → SES). Notifications are consolidated across batch jobs and include detailed per-document breakdowns.
Each notification email includes:
Subject: [INFO] Translation Complete: 2 docs × 11 langs — 22/22 pairs SUCCEEDED
Batch Summary:
• Jobs: 2
• Documents: 2
• Languages: 11
• Total Pairs: 22
• Succeeded: 22
• Failed: 0
• Final State: SUCCEEDED
• Total Time: 7m 12s
Documents Translated:
• Trinity-Beast-AutoOps-Translation-Engine.html
✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it
• Trinity-Beast-Infrastructure-Overview.html
✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it
Deployment:
• CloudFront Invalidations: 2
• All translated files deployed to S3
Search Index:
• Rebuilt successfully (all 11 languages)
Subject: [WARNING] Translation Complete: 1 doc × 11 langs — 10/11 pairs PARTIAL
Documents Translated:
• Complex-Technical-Guide.html
✓ Succeeded: es, pt, fr, de, ru, hi, ja, zh, ar, it
✗ Failed: ur
Error Details:
• Complex-Technical-Guide.html → ur: chunk 14 failed validation after 3 retries
check_tag_counts: expected 27 code tags, found 23
When multiple translation jobs are submitted together (e.g., translating 5 documents), the notification system consolidates them into a single email:
This prevents notification spam when translating multiple documents — you get one comprehensive email covering the entire batch, not 5 separate emails.
When the same document appears in multiple jobs within a batch (e.g., initial run fails Urdu, retry succeeds), the notification resolves duplicate entries into a single final-state view:
Without the resolver, a retry job would show the same document twice — once with the failure and once with the fix — making the notification confusing and the counts misleading.
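A resolver of this kind can be sketched as "latest result per (doc, lang) pair wins". The record shape below (keys `doc`, `lang`, `state`, `finished_at`) is a hypothetical illustration, and treating the most recent result as the final state is an assumption about how the resolver decides.

```python
def resolve_final_states(results):
    """Collapse repeated (doc, lang) results into one final state per pair.

    `results` is an iterable of dicts with keys doc, lang, state, finished_at,
    where finished_at is any sortable timestamp.
    """
    final = {}
    for result in sorted(results, key=lambda r: r["finished_at"]):
        # Later results overwrite earlier ones for the same pair.
        final[(result["doc"], result["lang"])] = result["state"]
    return final


def summarize(final):
    """Count succeeded and failed pairs after resolution."""
    succeeded = sum(1 for state in final.values() if state == "succeeded")
    return {"total": len(final), "succeeded": succeeded,
            "failed": len(final) - succeeded}
```

With an initial Urdu failure followed by a successful retry, the pair appears once in the resolved view, as succeeded, and the totals count it a single time.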
Every notification includes a Tag Inventory section showing source vs output tag counts per document. This lets you detect at a glance if the model is adding or dropping tags. As of v2.8, the inventory also reports Mermaid diagram counts:
Tag Inventory (source → output):
• Trinity-Beast-Translation-Service.html
IN: code:22 pre:5 strong:8 em:2 a:4 br:3 diagrams:1
OUT: code:22 pre:5 strong:8 em:2 a:4 br:3 diagrams:1
If the model has a bad day and adds a <span> that wasn't in the source, or drops code tags, you'll see the mismatch immediately:
IN: code:23 pre:5 strong:8 diagrams:2
OUT: code:20 pre:4 strong:8 diagrams:1 ← 3 code dropped, 1 diagram lost
Tag counts are logged per-language in Aurora (translation_job_events) with tags_in and tags_out fields. The notification shows the first successful language's counts (source tags are identical across all languages since it's the same source document).
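A tag inventory like the one shown can be computed with the standard library's `html.parser`. The sketch below counts only the six HTML tags from the example; Mermaid diagram counting is left out because the markup used for diagrams is not described here. The function names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

TRACKED = ("code", "pre", "strong", "em", "a", "br")


class _TagCounter(HTMLParser):
    """Counts opening (and self-closing) occurrences of tracked tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br/> via the default
        # handle_startendtag implementation.
        if tag in TRACKED:
            self.counts[tag] += 1


def tag_inventory(html):
    """Return a {tag: count} mapping for the tracked tags."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return {tag: parser.counts[tag] for tag in TRACKED}


def inventory_diff(source_html, output_html):
    """Return {tag: (source_count, output_count)} for tags that differ."""
    src, out = tag_inventory(source_html), tag_inventory(output_html)
    return {tag: (src[tag], out[tag]) for tag in TRACKED if src[tag] != out[tag]}
```

An empty result from `inventory_diff` corresponds to the matching IN and OUT lines in the first example; a non-empty result pinpoints which tags were added or dropped.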
Recipient: All translation notifications go to CoryDeanKalani@CPMP-Site.org via the unified AutoOps notification pipeline. The sender is CPMP Mission <No-Reply@CPMP-Site.org>.
Documents change frequently — a new endpoint, a revised architecture, an updated pricing table. Without delta translation, every edit requires re-translating the entire document across all 11 languages. Delta translation solves this by identifying exactly which sections changed and translating only those, reusing cached translations for everything else.
The delta translation system leverages two key properties of the document library:
<!-- TBI-CHUNK --> markers: Human-placed section boundaries in the English source that divide documents into logical, independently-translatable sections.
By comparing the current English document against the version that was last translated, the system identifies which sections changed (by content hash) and only sends those to Bedrock. Unchanged sections are pulled directly from the existing translated document. Typical savings: 70–90% on incremental updates.
The website bucket (trinity-beast-website-east2) has versioning enabled. Every aws s3 cp or s3api put-object creates a new version with a unique VersionId. The delta system uses this history to retrieve the English source as it stood when a document was last translated, addressed by its VersionId.
No separate manifest storage is required because S3 already holds the full history. A lightweight metadata file (docs/delta/{doc}.{lang}.json) tracks which VersionId was last translated for each document-language pair.
For delta translation to work, <!-- TBI-CHUNK --> markers must survive the translation round-trip. Previously, Bedrock silently dropped HTML comments during translation. The sentinel system now includes a Pass 0 that protects all HTML comments:
# Pass 0: Before Bedrock sees the chunk
<!-- TBI-CHUNK --> → __TBP0__ (sentinel token)
<!-- Section 5 --> → __TBP1__ (sentinel token)
# After translation: sentinels restored
__TBP0__ → <!-- TBI-CHUNK -->
__TBP1__ → <!-- Section 5 -->
This is implemented as the first pass in _apply_sentinels() in engine.py, before the existing translate="no" element extraction (Pass 1), paired span sentinels (Pass 2), and numeric protection (Pass 3). Comments are treated as Type A (FULL) sentinels — extracted completely and restored verbatim.
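The round-trip can be illustrated with a short Python sketch. It assumes only the `__TBPn__` token format shown above. The helper names `protect_comments` and `restore_comments` are hypothetical and do not reflect the actual body of `_apply_sentinels()`.

```python
import re

# Non-greedy match so adjacent comments are captured separately.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
SENTINEL_RE = re.compile(r"__TBP(\d+)__")


def protect_comments(html: str) -> tuple[str, list[str]]:
    """Replace each HTML comment with a numbered sentinel token."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"__TBP{len(saved) - 1}__"

    return COMMENT_RE.sub(stash, html), saved


def restore_comments(text: str, saved: list[str]) -> str:
    """Put the original comments back verbatim after translation."""

    def unstash(match: re.Match) -> str:
        index = int(match.group(1))
        # Leave unknown tokens untouched rather than raising.
        return saved[index] if index < len(saved) else match.group(0)

    return SENTINEL_RE.sub(unstash, text)


if __name__ == "__main__":
    chunk = "Intro<!-- TBI-CHUNK -->Body<!-- Section 5 -->End"
    protected, saved = protect_comments(chunk)
    print(protected)
    print(restore_comments(protected, saved) == chunk)
```

Because the comment text never reaches the model, it cannot be dropped or translated. Only the opaque token has to survive.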
The algorithm is position-independent — sections are matched by content hash, not by index. This means markers can be added, removed, or repositioned between versions without breaking the delta logic.
Diagram 17.1: Delta Translation Flow
flowchart TD
A[Fetch Current English from S3] --> B[Split by TBI-CHUNK markers]
B --> C[Hash each section SHA-256]
D[Fetch Previous English version] --> E[Split by TBI-CHUNK markers]
E --> F[Hash each section]
C --> G{Compare hashes}
F --> G
G -->|Match found| H[Pull from existing translation]
G -->|No match| I[Send to Bedrock]
H --> J[Reassemble with TBI-CHUNK markers]
I --> J
J --> K[Deploy to S3 + Save metadata]
style A fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
style D fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
style H fill:#064e3b,stroke:#10b981,color:#e0e0e0
style I fill:#7c2d12,stroke:#f97316,color:#e0e0e0
style K fill:#1e3a5f,stroke:#60a5fa,color:#e0e0e0
Marker repositioning example:
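A minimal Python sketch of content-hash matching when sections move between versions. It assumes sections are split on the literal marker string and hashed with SHA-256, as in Diagram 17.1. The names `split_sections` and `plan_delta` are hypothetical, and the fallback of re-translating everything when section counts disagree is an assumption, not documented behavior.

```python
import hashlib

MARKER = "<!-- TBI-CHUNK -->"


def split_sections(doc: str) -> list[str]:
    """Split a document into sections on the chunk marker."""
    return doc.split(MARKER)


def section_hash(section: str) -> str:
    """SHA-256 of the section content."""
    return hashlib.sha256(section.encode("utf-8")).hexdigest()


def plan_delta(prev_english: str, prev_translated: str, curr_english: str):
    """Return a list of ('reuse', translated) or ('translate', english) steps."""
    prev_en = split_sections(prev_english)
    prev_tr = split_sections(prev_translated)

    # Map each previous English section hash to its existing translation.
    # If the counts disagree (markers lost), the cache stays empty and
    # every section is sent for translation (assumed fallback).
    cache = {}
    if len(prev_en) == len(prev_tr):
        cache = {section_hash(en): tr for en, tr in zip(prev_en, prev_tr)}

    plan = []
    for section in split_sections(curr_english):
        digest = section_hash(section)
        if digest in cache:
            plan.append(("reuse", cache[digest]))
        else:
            plan.append(("translate", section))
    return plan


if __name__ == "__main__":
    prev_en = MARKER.join(["intro", "alpha", "beta"])
    prev_es = MARKER.join(["intro-es", "alfa", "beta-es"])
    # Sections reordered and one new section added.
    curr_en = MARKER.join(["intro", "beta", "gamma", "alpha"])
    for step in plan_delta(prev_en, prev_es, curr_en):
        print(step)
```

Because lookup is by hash rather than index, the reordered sections are still reused and only the new section is sent to Bedrock.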
Four KCC commands support delta translation and chunk management:
# List available S3 versions
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --list-versions
# Compare current vs previous version (auto-detects)
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html
# Compare against a specific version
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --version-id ksYxUBZIUB8Roi2KQYje6ig9R7JesL9z
# Show delta for a specific language
bash scripts/kcc.sh delta-diff Trinity-Beast-API-Reference.html --lang ja
# Dry run — show what would change without calling Bedrock
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html es --dry-run
# Translate only changed sections for one language
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html es
# Translate changed sections for all languages
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html all
# Force full translation (creates fresh baseline)
bash scripts/kcc.sh delta-translate Trinity-Beast-API-Reference.html all --force
The delta option (options.delta) is also available on POST /admin/translate. The worker skips any language pair where the translated file on S3 is already newer than the source document, so no local CLI is needed.
# Submit a delta job via the remote API — skips up-to-date pairs automatically
curl -s -X POST -H "X-Admin-Key: $ADMIN_KEY" -H "Content-Type: application/json" \
-d '{"docs":["Trinity-Beast-API-Reference.html"],"langs":"all","options":{"delta":true}}' \
https://api.cpmp-site.org/admin/translate | jq .
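The skip rule for the remote API can be expressed as a small Python function. This is a sketch under assumptions: the function name `stale_languages` is hypothetical, and the timestamps are taken to be timezone-aware modification times (for example the LastModified value S3 reports for each object). How the worker actually obtains them is not described here.

```python
from datetime import datetime, timezone
from typing import Optional


def stale_languages(
    source_modified: datetime,
    translated_modified: dict[str, Optional[datetime]],
) -> list[str]:
    """Return languages whose translation is missing or not newer than the source."""
    return [
        lang
        for lang, modified in translated_modified.items()
        if modified is None or modified <= source_modified
    ]


if __name__ == "__main__":
    source = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
    translations = {
        "es": datetime(2025, 1, 11, 9, 0, tzinfo=timezone.utc),  # up to date
        "ja": datetime(2025, 1, 9, 9, 0, tzinfo=timezone.utc),   # older than source
        "ur": None,                                              # never translated
    }
    print(stale_languages(source, translations))
```

Only the languages returned here would need a Bedrock call. The rest are skipped.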
# Validate TBI-CHUNK markers survived translation for all delta-enabled docs
bash scripts/kcc.sh delta-validate all all
# Validate a specific doc across all languages
bash scripts/kcc.sh delta-validate Trinity-Beast-API-Reference.html all
# Validate a specific doc + language pair
bash scripts/kcc.sh delta-validate Trinity-Beast-API-Reference.html es
Reports pass/fail per doc×lang pair. Exit code 0 if all pass, 1 if any markers were lost. Run after any translation job to confirm Sentinel Pass 0 is working correctly.
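The pass/fail and exit-code behavior can be sketched as follows. This assumes a pair passes when the translated file contains the same number of markers as the English source. The real check performed by delta-validate is not specified here, and the function names are hypothetical.

```python
MARKER = "<!-- TBI-CHUNK -->"


def validate_pair(source_html: str, translated_html: str) -> bool:
    """Pass when the translation kept every chunk marker (assumed: equal counts)."""
    return source_html.count(MARKER) == translated_html.count(MARKER)


def validate_all(source_html: str, translations: dict[str, str]) -> tuple[dict[str, bool], int]:
    """Return per-language results and a process-style exit code (0 ok, 1 failure)."""
    results = {
        lang: validate_pair(source_html, html)
        for lang, html in translations.items()
    }
    exit_code = 0 if all(results.values()) else 1
    return results, exit_code


if __name__ == "__main__":
    source = f"Intro{MARKER}Body{MARKER}End"
    translations = {
        "es": f"Introducción{MARKER}Cuerpo{MARKER}Fin",
        "fr": "Introduction Corps Fin",  # markers lost
    }
    results, code = validate_all(source, translations)
    for lang, ok in results.items():
        print(lang, "PASS" if ok else "FAIL")
    print("exit code:", code)
```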
# Analyze a doc from S3 and suggest TBI-CHUNK marker placement
bash scripts/kcc.sh chunk-size Trinity-Beast-API-Reference.html
# Analyze a local file
bash scripts/kcc.sh chunk-size /path/to/local/doc.html
Scans the document for <section>, <h2>, <h3>, and .category-section boundaries. Reports current chunk sizes (if markers exist), identifies policy violations, and suggests where to insert markers to stay within the 15KB/18KB/12KB policy. Dense sections (high translate="no" density) automatically target the tighter 12KB limit.
Existing translated documents do not contain <!-- TBI-CHUNK --> markers (they were stripped before the sentinel fix). The bootstrap sequence is:
Run delta-translate with --force to translate the entire document. The sentinel fix preserves markers in the output, and delta metadata is saved to S3.
After the bootstrap run, typical savings on incremental updates:
| Change Type | Typical Savings | Example |
|---|---|---|
| Single section edit | 85–95% | Fix a typo, update one endpoint |
| New section added | 70–85% | Add a new feature section |
| Marker repositioned | 60–75% | Split a large section in two |
| Major rewrite | 20–40% | Restructure half the document |
Cost model: At approximately $1.50 per section-language pair, a 9-section document across 11 languages costs ~$148.50 for a full translation. With delta (2 sections changed), the same update costs ~$33 — a 78% reduction.
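The arithmetic behind these figures, as a short Python check. The $1.50 rate is the approximate figure quoted above, and the function name `translation_cost` is illustrative.

```python
def translation_cost(sections: int, languages: int, rate: float = 1.50) -> float:
    """Cost in dollars at a flat rate per section-language pair."""
    return sections * languages * rate


full = translation_cost(sections=9, languages=11)    # every section, every language
delta = translation_cost(sections=2, languages=11)   # only the two changed sections
reduction = 1 - delta / full

if __name__ == "__main__":
    print(f"full: ${full:.2f}  delta: ${delta:.2f}  reduction: {reduction:.0%}")
```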
| Item | Value |
|---|---|
| Model | us.anthropic.claude-sonnet-4-6 (cross-region inference profile) |
| Failover Regions | us-east-2 → us-east-1 → us-west-2 |
| Target Languages | 11: es, pt, fr, de, ru, hi, ja, zh, ar, ur, it |
| Worker Runtime | Python 3.11 (ECS Fargate task, container image) |
| Deploy/Finalize Runtime | Go (provided.al2023) |
| Worker Resources | 1 vCPU / 3 GB (Fargate — no timeout ceiling) |
| Memory (Lambdas) | 1770 MB |
| Worker Timeout | None (runs to completion) |
| Finalize Timeout | 180s |
| Deploy Timeout | 60s |
| Max Docs per Request | 6 |
| Max Active Jobs | 3 |
| Daily Dollar Cap | $600 (24h TTL auto-reset) |
| Daily Token Cap | 50M combined tokens (24h TTL auto-reset) |
| Chunk Size (Latin scripts) | 6000 chars |
| Chunk Size (CJK + Russian) | 4500 chars (ja, zh, ru) |
| Chunk Size (Indic + Arabic) | 3000 chars (hi, ur, ar) |
| Retries per Chunk | 3 |
| Max Part Size | 24 KB (safety valve — prevents model output truncation) |
| MaxConcurrency (per-language) | 0 (unlimited — all language containers launch simultaneously) |
| ECR Repository | tbi-translate-worker |
| SQS Queue | trinity-beast-translation-queue |
| Step Function | tbi-translation-orchestrator |
| IAM Role (Worker + Lambdas) | tbi-translate-role |
| IAM Role (Pipe) | tbi-translate-pipe-role |
| IAM Role (Step Function) | tbi-translate-orchestrator-role |
| Valkey Keys | tx:job:{id}, tx:active, tx:history, tx:idempotency:{key}, autoops:bedrock:spend:daily, autoops:bedrock:tokens:input:daily, autoops:bedrock:tokens:output:daily |
| Aurora Tables | translation_jobs, translation_job_events |
| Delta Metadata | docs/delta/{doc}.{lang}.json (S3) |
| Delta CLI | bash scripts/kcc.sh delta-diff, bash scripts/kcc.sh delta-translate, bash scripts/kcc.sh delta-validate, bash scripts/kcc.sh chunk-size |
| CloudWatch Namespace | TBI/Translation |
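The per-script chunk sizes in the table can be read as a simple lookup. The Python sketch below is an illustration only: grouping es, pt, fr, de, and it under "Latin scripts" is inferred from the target-language list, and the paragraph-packing function is a hypothetical stand-in, since the engine's actual splitting logic is not described in this section.

```python
# Character limits per target language, taken from the table above.
CHUNK_SIZES = {
    **dict.fromkeys(("es", "pt", "fr", "de", "it"), 6000),  # Latin scripts (inferred grouping)
    **dict.fromkeys(("ja", "zh", "ru"), 4500),              # CJK + Russian
    **dict.fromkeys(("hi", "ur", "ar"), 3000),              # Indic + Arabic
}


def chunk_size_for(lang: str) -> int:
    """Look up the chunk size for a supported target language."""
    try:
        return CHUNK_SIZES[lang]
    except KeyError:
        raise ValueError(f"unsupported target language: {lang}") from None


def pack_paragraphs(text: str, lang: str) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks within the limit.

    A single paragraph longer than the limit is kept whole as its own chunk
    (an assumption made for this sketch).
    """
    limit = chunk_size_for(lang)
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "\n\n".join(["a" * 2000, "b" * 2000, "c" * 2000])
    for lang in ("es", "ja", "hi"):
        print(lang, [len(c) for c in pack_paragraphs(sample, lang)])
```

The same input yields more, smaller chunks for the languages with tighter limits.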