Complete reference for all 39 runtime parameters governing the Translation Engine — Bedrock pricing, token estimation, cost controls, customer pricing, and batch inference configuration.
Translation parameters control every cost, pricing, and operational aspect of the Translation Engine (TBTS). They are stored in the translation_parameters Aurora table — the authoritative source — and cached in Valkey as the tx:params hash for fast reads by ECS containers and Lambda workers.
- Translation parameters live in translation_parameters, not application_parameters. This prevents a bad translation config change from affecting the LPO server, and vice versa.
- ECS containers load tx:params from Valkey on startup. Lambda workers load it on cold start.
- POST /admin/translate/params/reload flushes Valkey and reloads from Aurora immediately. All containers pick up changes on their next 5-minute poll.

Important: The legacy Valkey keys translation:rate:{model}:input and translation:rate:{model}:output are superseded by tx:params. The app:config keys translation_markup_pct and translation_infra_per_pair are also superseded. All translation cost logic now reads exclusively from tx:params. The legacy keys are retained for backward compatibility only.
-- Aurora: authoritative source
SELECT key, value, description, category, updated_at, updated_by
FROM translation_parameters
ORDER BY category, key;
-- Valkey: fast cache (tx:params hash, 5-min TTL)
HGETALL tx:params
-- Update a parameter (applies immediately to Aurora + Valkey + in-memory)
POST /admin/translate/params
{"key": "pricing.markup_pct", "value": "35", "updated_by": "cory"}
-- Force all containers to reload from Aurora
POST /admin/translate/params/reload
-- View parameters in the TBI Administration dashboard
Translation tab → Translation Parameters section
Every parameter accessor follows a three-level fallback to ensure the system never returns zero for a cost value:
1. Valkey tx:params hash: fast path, ~0.1ms, refreshed every 5 minutes
2. Aurora translation_parameters table: authoritative source, ~5ms, used when Valkey is unavailable

The Translation Engine tracks every token at the most granular level Bedrock reports. The ground-truth cost calculation for any translation job is:
BedrockCostForTokens — the single source of truth for all cost calculations:
cost = (uncached_input_tokens × input_rate_per_token)
+ (cached_tokens × cache_read_rate_per_token) ← 90% cheaper than input
+ (cache_write_tokens × cache_write_rate_per_token) ← one-time per cache population
+ (output_tokens × output_rate_per_token)
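Here uncached_input_tokens is the total input token count minus cached_tokens, and each per-token rate is the corresponding per-1M parameter divided by 1,000,000. This formula is used everywhere actual cost is recorded: the finalize Lambda, SQS cost messages, the Aurora bedrock_cost_usd column, and the daily spend counter in Valkey.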
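As a minimal sketch of the formula above, the following Python function computes the cost from per-1M rates. It is an illustration only, not the engine's BedrockCostForTokens implementation; the names TokenRates and bedrock_cost_usd and the token counts in the example are hypothetical. The rates are the Sonnet 4.6 on-demand defaults listed in this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Rates in USD per 1M tokens, as stored in the parameter table."""
    input_per_1m: float
    output_per_1m: float
    cache_read_per_1m: float
    cache_write_per_1m: float


# Sonnet 4.6 on-demand defaults from this reference.
SONNET46_REALTIME = TokenRates(3.00, 15.00, 0.30, 3.75)


def bedrock_cost_usd(total_input, cached, cache_write, output, rates):
    """Apply the four-term cost formula; total_input includes cached tokens."""
    if not 0 <= cached <= total_input:
        raise ValueError("cached tokens must be between 0 and total input")
    uncached = total_input - cached
    per_million = (
        uncached * rates.input_per_1m
        + cached * rates.cache_read_per_1m
        + cache_write * rates.cache_write_per_1m
        + output * rates.output_per_1m
    )
    return per_million / 1_000_000


# Hypothetical job: 1M input tokens (300K cached), no cache write, 760K output.
cost = bedrock_cost_usd(1_000_000, 300_000, 0, 760_000, SONNET46_REALTIME)
print(round(cost, 2))  # 13.59
```

In this example the output tokens dominate (11.40 of the 13.59 USD), which is why the output/input ratio matters as much as the cache hit rate when estimating.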
What we charge customers is derived from the Bedrock cost plus infrastructure overhead and a service fee markup:
subtotal = bedrock_cost + infra_per_pair_usd
with_markup = subtotal × (1 + markup_pct / 100)

-- Standard tier (batch, 12-24h SLA):
customer_price = with_markup × (1 - standard_tier_discount_pct / 100)

-- Express tier (real-time, 1-6h SLA):
customer_price = with_markup × (1 + express_tier_premium_pct / 100)

-- Floor:
customer_price = max(customer_price, minimum_quote_usd)
The three optimization layers compound multiplicatively for internal document translation:
| Optimization | Mechanism | Savings | Parameter |
|---|---|---|---|
| Delta processing | Skip pairs where translation is newer than source | 70–90% reduction in work | Controlled by options.delta at job submission |
| Batch inference | Bedrock batch API vs on-demand | 50% off per token | bedrock.*.batch.*_per_1m |
| Prompt caching | System prompt + protected terms cached | 90% off cached tokens | bedrock.*.cache_read_per_1m |
The actual per-token rates Amazon Bedrock charges us. These are our costs — not what we charge customers. Separate keys exist for each model, execution mode (realtime vs batch), and caching operation.
Batch inference is 50% of on-demand. Prompt cache reads are 90% off on-demand input. Cache writes are 1.25× on-demand (paid once per cache population).
| Key | Default | Unit | Description |
|---|---|---|---|
| bedrock.sonnet46.realtime.input_per_1m | 3.00 | USD / 1M tokens | Sonnet 4.6 on-demand input. The "Express tier" rate: real-time jobs, 1–6h SLA. |
| bedrock.sonnet46.realtime.output_per_1m | 15.00 | USD / 1M tokens | Sonnet 4.6 on-demand output. |
| bedrock.sonnet46.batch.input_per_1m | 1.50 | USD / 1M tokens | Sonnet 4.6 batch inference input. 50% off on-demand. Used for Standard tier (12–24h SLA). |
| bedrock.sonnet46.batch.output_per_1m | 7.50 | USD / 1M tokens | Sonnet 4.6 batch inference output. 50% off on-demand. |
| bedrock.sonnet46.cache_write_per_1m | 3.75 | USD / 1M tokens | Sonnet 4.6 prompt cache write. 1.25× on-demand, paid once when the cache is first populated. |
| bedrock.sonnet46.cache_read_per_1m | 0.30 | USD / 1M tokens | Sonnet 4.6 prompt cache read. 90% off on-demand input. Applied to system prompt + protected terms tokens on every cached request. |
| bedrock.haiku35.realtime.input_per_1m | 0.80 | USD / 1M tokens | Haiku 3.5 on-demand input. Used for Latin-script languages in auto-routing mode. |
| bedrock.haiku35.realtime.output_per_1m | 4.00 | USD / 1M tokens | Haiku 3.5 on-demand output. |
| bedrock.haiku35.batch.input_per_1m | 0.40 | USD / 1M tokens | Haiku 3.5 batch inference input. |
| bedrock.haiku35.batch.output_per_1m | 2.00 | USD / 1M tokens | Haiku 3.5 batch inference output. |
| bedrock.opus4.realtime.input_per_1m | 15.00 | USD / 1M tokens | Opus 4 on-demand input. Highest quality: critical documents, complex grammar. |
| bedrock.opus4.realtime.output_per_1m | 75.00 | USD / 1M tokens | Opus 4 on-demand output. |
| bedrock.opus4.batch.input_per_1m | 7.50 | USD / 1M tokens | Opus 4 batch inference input. |
| bedrock.opus4.batch.output_per_1m | 37.50 | USD / 1M tokens | Opus 4 batch inference output. |
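When rates are edited by hand, a quick consistency check against the stated ratios (batch at 50% of on-demand, cache read at 10% of on-demand input, cache write at 1.25× on-demand input) can catch typos. The sketch below is a hypothetical helper, not part of the engine; the dictionary layout is an assumption, and the values are the defaults from the table above.

```python
import math

# (input, output) USD per 1M tokens, defaults from the table above.
RATES = {
    "sonnet46": {"realtime": (3.00, 15.00), "batch": (1.50, 7.50)},
    "haiku35": {"realtime": (0.80, 4.00), "batch": (0.40, 2.00)},
    "opus4": {"realtime": (15.00, 75.00), "batch": (7.50, 37.50)},
}
SONNET46_CACHE_READ = 0.30
SONNET46_CACHE_WRITE = 3.75


def ratio_violations(rates, cache_read, cache_write):
    """Return a list of human-readable ratio mismatches (empty if consistent)."""
    problems = []
    for model, modes in rates.items():
        for i, kind in enumerate(("input", "output")):
            expected = modes["realtime"][i] * 0.5
            if not math.isclose(modes["batch"][i], expected):
                problems.append(f"{model} batch {kind}: expected {expected}")
    sonnet_input = rates["sonnet46"]["realtime"][0]
    if not math.isclose(cache_read, sonnet_input * 0.10):
        problems.append("sonnet46 cache read is not 10% of on-demand input")
    if not math.isclose(cache_write, sonnet_input * 1.25):
        problems.append("sonnet46 cache write is not 1.25x on-demand input")
    return problems


print(ratio_violations(RATES, SONNET46_CACHE_READ, SONNET46_CACHE_WRITE))  # []
```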
Ratios used to estimate cost before a job runs — for customer quotes and pre-flight spend checks. These are derived from actual Aurora translation_job_events data and should be updated as real batch job data accumulates.
| Key | Default | Unit | Description |
|---|---|---|---|
| estimation.tokens_per_byte | 0.64 | tokens / byte | Input tokens per translatable byte. Derived from job history and remarkably consistent at 0.63–0.66 across all document types. Update after each batch run from the translation:actuals:{model}:{mode} Valkey keys written by the nightly sync job. |
| estimation.output_input_ratio | 0.76 | ratio | Output tokens divided by input tokens. Varies by document type: code-heavy docs ~0.46 (code passes through untranslated), prose-heavy docs ~1.01. The 0.76 average is a reasonable middle ground for mixed technical documentation. |
| estimation.quote_padding_pct | 6 | percent | Padding applied to token estimates for customer quotes. A 6% buffer keeps quotes slightly conservative: actual cost is usually at or below the estimate. Any overrun on prose-heavy documents is absorbed by the markup buffer. |
| estimation.cache_hit_rate | 0.30 | fraction (0–1) | Expected fraction of input tokens served from the prompt cache. The system prompt + protected terms list accounts for approximately 30% of total input tokens. Update from actual cache_hit_rate values in translation:actuals:* Valkey keys after batch jobs run. |
| estimation.cached_system_tokens | 1200 | tokens | Approximate token count of the cacheable system prompt block (system instructions + protected terms list). This is the fixed overhead per request, identical for every doc/lang pair. Must be a multiple of 3 per Trinity Beast convention. |
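The estimation parameters above can be combined into a pre-flight token estimate roughly as follows. This is a sketch under assumptions: the function and class names are hypothetical, and applying the padding to the input estimate before deriving output and cached tokens is one plausible reading, not confirmed engine behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEstimate:
    input_tokens: int
    output_tokens: int
    cached_tokens: int


def estimate_tokens(translatable_bytes, tokens_per_byte=0.64,
                    output_input_ratio=0.76, quote_padding_pct=6,
                    cache_hit_rate=0.30):
    """Estimate padded token counts for one doc/lang pair from its byte size."""
    # Assumption: padding is applied once, to the input estimate.
    padded_input = translatable_bytes * tokens_per_byte * (1 + quote_padding_pct / 100)
    return TokenEstimate(
        input_tokens=round(padded_input),
        output_tokens=round(padded_input * output_input_ratio),
        cached_tokens=round(padded_input * cache_hit_rate),
    )


# A hypothetical 100,000-byte document.
est = estimate_tokens(100_000)
print(est.input_tokens, est.output_tokens, est.cached_tokens)
```

The resulting counts can then be priced with the Bedrock per-1M rates for the chosen model and execution mode.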
How to update estimation parameters: After each batch run, the nightly sync job writes actual token ratios to Valkey keys like translation:actuals:claude-sonnet-4.6:batch. Review these values in the TBI Administration dashboard (Translation tab → Token Usage) and update the estimation parameters if the actuals drift significantly from the defaults. A 5% drift is normal; a 15%+ drift warrants an update.
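The drift guidance above can be expressed as a small check. This is a hypothetical helper, not an engine function; the source only defines the 5% and 15% thresholds, so the "review" label for the range between them is an assumption.

```python
def drift_pct(default, actual):
    """Relative drift of an observed value from the configured default, in percent."""
    if default <= 0:
        raise ValueError("default must be positive")
    return abs(actual - default) / default * 100


def classify_drift(default, actual, normal_pct=5.0, update_pct=15.0):
    """Map drift to an action: 'normal', 'review' (assumed label), or 'update'."""
    drift = drift_pct(default, actual)
    if drift >= update_pct:
        return "update"
    if drift <= normal_pct:
        return "normal"
    return "review"


# estimation.output_input_ratio default is 0.76; the actuals are made up.
print(classify_drift(0.76, 0.78))  # normal
print(classify_drift(0.76, 0.84))  # review
print(classify_drift(0.76, 0.90))  # update
```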
Operational limits that prevent runaway Bedrock costs. The daily spend cap is a hard stop — no new jobs are accepted after it is reached. The token limit is a secondary guard that catches unusually large documents before they hit the dollar cap.
| Key | Default | Unit | Description |
|---|---|---|---|
| controls.daily_spend_limit_usd | 600 | USD | Daily Bedrock spend cap. No new translation jobs are accepted after this limit is reached. Resets at midnight UTC. Must be a multiple of 3 (200×3). Tracked in Valkey key autoops:bedrock:spend:daily. |
| controls.daily_token_limit | 51,000,000 | tokens | Daily combined token limit (input + output). Secondary, model-agnostic guard. Catches unusually large documents before they hit the dollar cap. 51M = 17M×3. Tracked in Valkey keys autoops:bedrock:tokens:input:daily and autoops:bedrock:tokens:output:daily. |
| controls.max_docs_per_request | 6 | documents | Maximum documents per single POST /admin/translate request. Multiple of 3 per Trinity Beast convention. Customer quote submissions (TBTS) are unconstrained at the request layer; this limit applies to internal admin submissions only. |
| controls.max_active_jobs | 3 | jobs | Maximum concurrent active translation jobs. Multiple of 3. A warning is logged when this threshold is reached, but jobs are still accepted (the limit is advisory, not a hard stop). |
| controls.max_queue_depth | 12 | jobs | Maximum jobs in the translation queue. Multiple of 3. New submissions are rejected with a 429 when this depth is reached. |
| controls.batch_poll_interval_seconds | 300 | seconds | How often the Step Function polls Bedrock for batch job status. 300 = 5 minutes. Multiple of 3 (100×3). Shorter intervals increase API call costs; longer intervals delay job completion notification. |
| controls.batch_timeout_hours | 6 | hours | Hours before a Bedrock batch job is considered timed out and marked as failed. Multiple of 3 (2×3). AWS typically completes batch jobs within 1–3 hours; 6 hours provides a safe buffer. |
| controls.cost_per_chunk_floor | 0.03 | USD | Minimum cost-per-chunk floor used by the nightly sync job when calculating rolling averages. Prevents unrealistically low averages from skewing customer quotes during low-activity periods. |
| controls.cost_per_chunk_ceiling | 0.09 | USD | Maximum cost-per-chunk ceiling used by the nightly sync job. Caps runaway averages caused by anomalous jobs (e.g., a single very large document). |
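The interplay of the hard and advisory limits can be sketched as a single admission check. All names here (Limits, Usage, Decision, admit) are hypothetical, and the order of checks is an assumption. The source states that the spend cap and queue depth reject new jobs and that max_active_jobs only logs a warning; rejecting on the token limit is assumed by analogy with the spend cap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    daily_spend_limit_usd: float = 600.0
    daily_token_limit: int = 51_000_000
    max_active_jobs: int = 3
    max_queue_depth: int = 12


@dataclass(frozen=True)
class Usage:
    spend_today_usd: float
    tokens_today: int  # input + output combined
    active_jobs: int
    queued_jobs: int


@dataclass(frozen=True)
class Decision:
    accepted: bool
    reason: str = ""
    warning: Optional[str] = None


def admit(usage, limits=Limits()):
    """Decide whether a new job may be accepted under the cost controls."""
    if usage.spend_today_usd >= limits.daily_spend_limit_usd:
        return Decision(False, "daily spend cap reached")
    if usage.tokens_today >= limits.daily_token_limit:
        return Decision(False, "daily token limit reached")  # assumed hard stop
    if usage.queued_jobs >= limits.max_queue_depth:
        return Decision(False, "queue full (HTTP 429)")
    warning = None
    if usage.active_jobs >= limits.max_active_jobs:
        warning = "max_active_jobs reached (advisory)"
    return Decision(True, warning=warning)


print(admit(Usage(120.0, 9_000_000, 3, 4)))
```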
These parameters control what customers pay for TBTS translations. They are applied at quote time by CustomerPriceForPair() in the translation engine. Changing these takes effect immediately — no deploy needed.
| Key | Default | Unit | Description |
|---|---|---|---|
| pricing.infra_per_pair_usd | 0.003 | USD / pair | Infrastructure cost per language pair. Covers ECS task time, S3 storage, CloudFront invalidation, Lambda invocations, and Step Function state transitions. Added to Bedrock cost before markup is applied. |
| pricing.markup_pct | 30 | percent | Service fee markup applied to (Bedrock cost + infra cost). Covers support, margin, and operational overhead. Applied at quote time and in all customer-facing emails. This is the primary lever for adjusting customer pricing: change it here and it takes effect everywhere immediately. |
| pricing.infra_markup_pct | 9 | percent | Infrastructure markup applied by the nightly sync job to the rolling 7-day cost-per-chunk averages. Covers ECS, S3, CloudFront, SQS, and Step Functions. Baked into the translation:cost_per_chunk:{model} Valkey keys. Separate from pricing.markup_pct: this is the cost side, not the revenue side. |
| pricing.standard_tier_discount_pct | 30 | percent | Discount for Standard tier quotes (batch inference, 12–24h SLA). We use batch inference for Standard tier so our Bedrock cost is also 50% lower, and this discount passes some of that savings to the customer while maintaining margin. |
| pricing.express_tier_premium_pct | 0 | percent | Premium for Express tier quotes (real-time inference, 1–6h SLA). Currently 0, so Express is priced at the same markup as Standard before the Standard discount. Set to a positive value to charge a premium for faster delivery. |
| pricing.minimum_quote_usd | 0.99 | USD | Minimum customer quote price. Prevents sub-dollar quotes that cost more to process (Stripe fees, email, support) than they earn. Applied as a floor after all markup calculations. |
Full customer price formula:
subtotal = bedrock_cost + pricing.infra_per_pair_usd
with_markup = subtotal × (1 + pricing.markup_pct / 100)

Standard tier: price = with_markup × (1 - pricing.standard_tier_discount_pct / 100)
Express tier: price = with_markup × (1 + pricing.express_tier_premium_pct / 100)

Final: price = max(price, pricing.minimum_quote_usd)
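The formula translates directly into code. The sketch below is a Python illustration using the default parameter values from this reference; it is not the engine's CustomerPriceForPair() implementation, and the function name and tier strings are assumptions.

```python
def customer_price(bedrock_cost, tier, *,
                   infra_per_pair_usd=0.003,
                   markup_pct=30.0,
                   standard_tier_discount_pct=30.0,
                   express_tier_premium_pct=0.0,
                   minimum_quote_usd=0.99):
    """Price one language pair from its Bedrock cost (USD)."""
    subtotal = bedrock_cost + infra_per_pair_usd
    with_markup = subtotal * (1 + markup_pct / 100)
    if tier == "standard":
        price = with_markup * (1 - standard_tier_discount_pct / 100)
    elif tier == "express":
        price = with_markup * (1 + express_tier_premium_pct / 100)
    else:
        raise ValueError(f"unknown tier: {tier!r}")
    # The floor is applied last, after markup and tier adjustment.
    return max(price, minimum_quote_usd)


print(round(customer_price(13.59, "express"), 4))   # 17.6709
print(round(customer_price(13.59, "standard"), 4))  # 12.3696
print(customer_price(0.10, "standard"))             # 0.99
```

The last call shows the floor in action: a 0.10 USD Bedrock cost would price at about 0.09 USD on the Standard tier, so the 0.99 USD minimum applies instead.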
Operational settings for the Bedrock batch inference pipeline. These control where batch job files are stored, which IAM role Bedrock assumes, and how the Step Function manages batch jobs.
| Key | Default | Unit | Description |
|---|---|---|---|
| batch.s3_bucket | trinity-beast-translation-jobs | bucket name | S3 bucket for batch job JSONL input and output files. Structure: {job_id}/input.jsonl and {job_id}/output.jsonl. Located in us-east-2 (same region as Bedrock endpoint). |
| batch.iam_role_arn | arn:aws:iam::211998422884:role/tbi-bedrock-batch-role | ARN | IAM role that Bedrock assumes to read input JSONL from S3 and write output JSONL back. Permissions: s3:GetObject on input, s3:PutObject on output, bedrock:InvokeModel on Sonnet 4.6. |
| batch.default_model | claude-sonnet-4.6 | model name | Default model for batch translation jobs when no model is specified in the job submission. Sonnet 4.6 is the recommended default: best balance of quality and cost for technical documentation. |
| batch.max_output_tokens | 200,000 | tokens | Maximum output tokens per Bedrock request in the batch JSONL. 200,000 = 200K tokens. Sufficient for the largest documents in the library. Must be a multiple of 3 per Trinity Beast convention. |
| batch.job_state_ttl_seconds | 518,400 | seconds | Valkey TTL for batch job state keys (tx:job:{id}). 518,400 = 6 days (172,800×3). Batch jobs can take up to 6 hours to complete; the 6-day TTL ensures state is available for monitoring and debugging after completion. |
The easiest way to manage translation parameters is through the TBI Administration dashboard. Navigate to the Translation tab and scroll to the Translation Parameters section (admin only).
A reload flushes the tx:params Valkey cache and reloads from Aurora. All ECS containers pick up changes on their next 5-minute poll cycle.

-- List all parameters
GET /admin/translate/params
-- Filter by category
GET /admin/translate/params?category=customer_pricing
-- Update a parameter (applies immediately to Aurora + Valkey + in-memory)
POST /admin/translate/params
{"key": "pricing.markup_pct", "value": "35", "updated_by": "cory"}
-- Force all containers to reload from Aurora
POST /admin/translate/params/reload
-- View all parameters
SELECT key, value, category, updated_at, updated_by
FROM translation_parameters
ORDER BY category, key;

-- Update a parameter
UPDATE translation_parameters
SET value = '35', updated_at = NOW(), updated_by = 'cory'
WHERE key = 'pricing.markup_pct';

-- After a direct SQL update, reload Valkey via the API:
POST /admin/translate/params/reload
Important: Direct SQL updates to Aurora do NOT automatically update Valkey or in-memory values on running containers. Always follow a direct SQL update with a POST /admin/translate/params/reload call to ensure all containers pick up the change within their next 5-minute poll cycle.
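For scripted changes, the update and reload calls can be built with the Python standard library. This is a sketch only: the base URL is a placeholder, authentication is omitted because the source does not describe it, and the JSON Content-Type header is an assumption. The requests are constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://admin.example.invalid"  # hypothetical host, replace as needed


def build_update_request(key, value, updated_by, base_url=BASE_URL):
    """Build POST /admin/translate/params with the documented JSON body."""
    body = json.dumps(
        {"key": key, "value": str(value), "updated_by": updated_by}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/admin/translate/params",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def build_reload_request(base_url=BASE_URL):
    """Build POST /admin/translate/params/reload (no body documented)."""
    return urllib.request.Request(
        f"{base_url}/admin/translate/params/reload", data=b"", method="POST"
    )


update = build_update_request("pricing.markup_pct", 35, "cory")
reload_req = build_reload_request()
print(update.get_method(), update.full_url)
print(reload_req.get_method(), reload_req.full_url)
# To send: urllib.request.urlopen(update), then urllib.request.urlopen(reload_req)
```

After a direct SQL update, only the reload request is needed.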
The nightly sync job (trinity-beast-sync-job, runs at 1 AM EDT) performs three translation-related tasks:
Reads all rows from the translation_parameters Aurora table and writes them to the tx:params Valkey hash with a 5-minute TTL. This ensures all containers have fresh parameter values even if the Valkey cache expired overnight.
Queries translation_job_events for the last 7 days, grouped by model and execution mode. Calculates the average cost-per-chunk for each combination and writes to Valkey:
| Valkey Key | Description |
|---|---|
| translation:cost_per_chunk:{model} | Real-time path (backward compatible) |
| translation:cost_per_chunk:{model}:batch | Batch inference path |
| translation:cost_per_chunk:{model}:batch_cached | Batch + prompt caching path |
| translation:cost_per_chunk | Blended average (all models, backward compat) |
The 9% infrastructure markup (pricing.infra_markup_pct) is applied to these averages before writing to Valkey. The floor and ceiling from controls.cost_per_chunk_floor and controls.cost_per_chunk_ceiling are also applied.
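A rough sketch of that calculation follows. The function name is hypothetical, and two details are assumptions because the source does not specify them: the markup is applied before the floor and ceiling, and an empty 7-day window falls back to the floor.

```python
def cost_per_chunk(total_cost_usd, total_chunks, *,
                   infra_markup_pct=9.0, floor=0.03, ceiling=0.09):
    """Average cost per chunk with infra markup, clamped to [floor, ceiling]."""
    if total_chunks <= 0:
        return floor  # assumption: no data in the window means use the floor
    marked_up = (total_cost_usd / total_chunks) * (1 + infra_markup_pct / 100)
    return min(max(marked_up, floor), ceiling)


# Hypothetical 7-day totals for one model and execution mode.
print(round(cost_per_chunk(4.20, 100), 5))  # 0.04578
print(cost_per_chunk(0.50, 100))            # 0.03 (raised to the floor)
print(cost_per_chunk(20.00, 100))           # 0.09 (capped at the ceiling)
```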
Writes actual token ratios from real jobs to Valkey for operator review:
-- Key pattern: translation:actuals:{model}:{execution_mode}
-- Example: translation:actuals:claude-sonnet-4.6:batch
Fields:
avg_input_tokens — average input tokens per pair
avg_output_tokens — average output tokens per pair
avg_cached_tokens — average cached tokens per pair
output_input_ratio — derived: avg_output / avg_input
cache_hit_rate — derived: avg_cached / avg_input
pair_count — number of pairs in the 7-day window
updated_at — when this was last calculated
Review these values after each batch run and update estimation.output_input_ratio and estimation.cache_hit_rate in translation_parameters if the actuals drift significantly from the defaults.
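The two derived fields are simple ratios of the stored averages. The sketch below models one actuals record in Python; the class name and the sample numbers are made up for illustration, and returning 0.0 when there is no input is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actuals:
    """One translation:actuals:{model}:{execution_mode} record (subset of fields)."""
    avg_input_tokens: float
    avg_output_tokens: float
    avg_cached_tokens: float
    pair_count: int

    @property
    def output_input_ratio(self):
        # Derived: avg_output / avg_input
        if not self.avg_input_tokens:
            return 0.0
        return self.avg_output_tokens / self.avg_input_tokens

    @property
    def cache_hit_rate(self):
        # Derived: avg_cached / avg_input
        if not self.avg_input_tokens:
            return 0.0
        return self.avg_cached_tokens / self.avg_input_tokens


# Made-up averages over a 7-day window.
sample = Actuals(avg_input_tokens=10_000, avg_output_tokens=7_600,
                 avg_cached_tokens=3_000, pair_count=42)
print(sample.output_input_ratio, sample.cache_hit_rate)  # 0.76 0.3
```

These values are what an operator would compare against estimation.output_input_ratio and estimation.cache_hit_rate.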
All 39 parameters in the translation_parameters table, sorted by category and key.
| Key | Category | Default | Description |
|---|---|---|---|
| batch.default_model | batch_config | claude-sonnet-4.6 | Default model for batch jobs |
| batch.iam_role_arn | batch_config | arn:aws:iam::211998422884:role/tbi-bedrock-batch-role | IAM role for Bedrock batch jobs |
| batch.job_state_ttl_seconds | batch_config | 518400 | Valkey TTL for batch job state (6 days) |
| batch.max_output_tokens | batch_config | 200000 | Max output tokens per Bedrock request |
| batch.s3_bucket | batch_config | trinity-beast-translation-jobs | S3 bucket for batch JSONL files |
| bedrock.haiku35.batch.input_per_1m | bedrock_pricing | 0.40 | Haiku 3.5 batch input (USD/1M tokens) |
| bedrock.haiku35.batch.output_per_1m | bedrock_pricing | 2.00 | Haiku 3.5 batch output (USD/1M tokens) |
| bedrock.haiku35.realtime.input_per_1m | bedrock_pricing | 0.80 | Haiku 3.5 on-demand input (USD/1M tokens) |
| bedrock.haiku35.realtime.output_per_1m | bedrock_pricing | 4.00 | Haiku 3.5 on-demand output (USD/1M tokens) |
| bedrock.opus4.batch.input_per_1m | bedrock_pricing | 7.50 | Opus 4 batch input (USD/1M tokens) |
| bedrock.opus4.batch.output_per_1m | bedrock_pricing | 37.50 | Opus 4 batch output (USD/1M tokens) |
| bedrock.opus4.realtime.input_per_1m | bedrock_pricing | 15.00 | Opus 4 on-demand input (USD/1M tokens) |
| bedrock.opus4.realtime.output_per_1m | bedrock_pricing | 75.00 | Opus 4 on-demand output (USD/1M tokens) |
| bedrock.sonnet46.batch.input_per_1m | bedrock_pricing | 1.50 | Sonnet 4.6 batch input (USD/1M tokens) |
| bedrock.sonnet46.batch.output_per_1m | bedrock_pricing | 7.50 | Sonnet 4.6 batch output (USD/1M tokens) |
| bedrock.sonnet46.cache_read_per_1m | bedrock_pricing | 0.30 | Sonnet 4.6 cache read (USD/1M tokens, 90% off) |
| bedrock.sonnet46.cache_write_per_1m | bedrock_pricing | 3.75 | Sonnet 4.6 cache write (USD/1M tokens, 1.25×) |
| bedrock.sonnet46.realtime.input_per_1m | bedrock_pricing | 3.00 | Sonnet 4.6 on-demand input (USD/1M tokens) |
| bedrock.sonnet46.realtime.output_per_1m | bedrock_pricing | 15.00 | Sonnet 4.6 on-demand output (USD/1M tokens) |
| controls.batch_poll_interval_seconds | cost_controls | 300 | Batch job status poll interval (seconds) |
| controls.batch_timeout_hours | cost_controls | 6 | Hours before batch job is timed out |
| controls.cost_per_chunk_ceiling | cost_controls | 0.09 | Max cost-per-chunk for rolling averages |
| controls.cost_per_chunk_floor | cost_controls | 0.03 | Min cost-per-chunk for rolling averages |
| controls.daily_spend_limit_usd | cost_controls | 600 | Daily Bedrock spend cap (USD) |
| controls.daily_token_limit | cost_controls | 51000000 | Daily combined token limit |
| controls.max_active_jobs | cost_controls | 3 | Max concurrent active jobs |
| controls.max_docs_per_request | cost_controls | 6 | Max docs per translation request |
| controls.max_queue_depth | cost_controls | 12 | Max jobs in translation queue |
| estimation.cache_hit_rate | token_estimation | 0.30 | Expected prompt cache hit fraction |
| estimation.cached_system_tokens | token_estimation | 1200 | Tokens in cacheable system prompt block |
| estimation.output_input_ratio | token_estimation | 0.76 | Output / input token ratio |
| estimation.quote_padding_pct | token_estimation | 6 | Padding % applied to token estimates |
| estimation.tokens_per_byte | token_estimation | 0.64 | Input tokens per translatable byte |
| pricing.express_tier_premium_pct | customer_pricing | 0 | Express tier premium % (real-time, 1–6h) |
| pricing.infra_markup_pct | customer_pricing | 9 | Infra markup % applied by sync job |
| pricing.infra_per_pair_usd | customer_pricing | 0.003 | Infrastructure cost per language pair (USD) |
| pricing.markup_pct | customer_pricing | 30 | Service fee markup % for customer quotes |
| pricing.minimum_quote_usd | customer_pricing | 0.99 | Minimum customer quote price (USD) |
| pricing.standard_tier_discount_pct | customer_pricing | 30 | Standard tier discount % (batch, 12–24h) |