The true cost of failed calls: they still bill
The most persistent misconception in voice AI cost management is that failed calls do not cost money because they did not deliver value. The reality is the opposite: failed calls cost nearly the same as successful calls, and they cost you twice — once in direct provider spend and once in the downstream cost of a bad caller experience.
Twilio's billing model charges for billable minutes from the moment a call is answered until disconnection. A call that connects, triggers Twilio's silence detection at second 6, and drops is billed at the 6-second mark — which rounds up to the minimum billing increment (6 seconds on most plans, 30 seconds on some legacy plans). If your outbound call cadence dials 1,000 numbers per day and 5% experience silence-detection drops, you are paying for 50 fully-billed call failures every day. At $0.014/min outbound, a 6-second drop costs $0.0014 — small individually, roughly $2/month in pure Twilio waste at that volume, but that is before counting the ElevenLabs characters burned on every one of those drops.
ElevenLabs' billing model charges for characters at the time the synthesis request is submitted, not at the time the audio plays. If your AI agent submits a 300-character synthesis request and the call drops at second 4 before the audio begins streaming, ElevenLabs has already charged for 300 characters. You paid for synthesis that the caller never heard.
At scale, this compounds quickly. A deployment running 5,000 calls per day at a 5% silent failure rate is generating 250 failed calls per day. If each failed call consumes $0.05 in Twilio + ElevenLabs spend, that is $12.50/day or $375/month in direct waste. More importantly, the 5% failure rate is often preventable — silence timeout misconfiguration, ElevenLabs latency spikes without automated detection, and LLM-generated responses that are too long for the configured silence window all produce fixable silent failures.
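The waste model above can be sketched in a few lines (the rates are the illustrative figures from this section, not universal constants):

```python
def silent_failure_waste(calls_per_day: float,
                         failure_rate: float,
                         cost_per_failed_call: float,
                         days_per_month: int = 30) -> dict:
    """Estimate monthly spend lost to silent call failures."""
    failed_per_day = calls_per_day * failure_rate
    daily_waste = failed_per_day * cost_per_failed_call
    return {
        "failed_calls_per_day": failed_per_day,
        "daily_waste": round(daily_waste, 2),
        "monthly_waste": round(daily_waste * days_per_month, 2),
    }

# This section's example: 5,000 calls/day, 5% silent failures, $0.05/failure.
print(silent_failure_waste(5000, 0.05, 0.05))
# {'failed_calls_per_day': 250.0, 'daily_waste': 12.5, 'monthly_waste': 375.0}
```

Plug in your own volume and failure rate; the point of the model is that the monthly figure scales linearly with all three inputs.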
ElevenLabs plan selection: running the math correctly
ElevenLabs pricing in 2026 is tiered by monthly character allocation. The plans most relevant for production voice AI deployments:
Creator ($22/month): 100,000 characters included. Overage rate: approximately $0.22/1,000 characters. Suitable for low-volume testing (roughly 500 calls/month at 200 characters per call average).
Scale ($99/month): 500,000 characters included. At the roughly 1,000 characters per call typical of a four-turn conversation (see the verbosity section below), this covers about 500 calls/month before overage; more concise agents stretch it further.
Business ($330/month): 2 million characters included. Covers roughly 2,000 calls/month at the same per-call average; higher-volume deployments should expect overage charges or negotiate a custom plan.
The calculation that teams get wrong is ignoring the overage rate. On Creator, both the included 100,000 characters and the overage work out to roughly $0.22 per 1,000 characters. If you consume 250,000 characters in a month, the bill is: $22 (plan) + 150,000 × $0.00022 (overage) = $22 + $33 = $55. On Scale, the same 250,000 characters cost the flat $99 (the plan includes 500,000) — more expensive than Creator at this volume. The crossover where Scale becomes cheaper is at approximately 450,000 characters per month: the point at which Creator's $22 base plus $77 of overage (350,000 characters × $0.00022) reaches $99.
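A quick way to check the crossover for your own volume — a sketch using the plan figures quoted above (verify current rates against your ElevenLabs account; Scale overage beyond 500,000 characters is not modeled here):

```python
def creator_cost(chars: int) -> float:
    """Creator: $22 base, 100k characters included, $0.22/1k overage."""
    overage = max(0, chars - 100_000)
    return 22.0 + overage * 0.00022

def scale_cost(chars: int) -> float:
    """Scale: $99 flat up to the included 500k characters."""
    return 99.0

for chars in (250_000, 450_000, 500_000):
    print(chars, round(creator_cost(chars), 2), scale_cost(chars))
# 250000 55.0 99.0   <- Creator cheaper
# 450000 99.0 99.0   <- crossover
# 500000 110.0 99.0  <- Scale cheaper
```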
To calculate your actual character consumption: pull the ElevenLabs history from the dashboard or API and sum the character_count field across all generations in the billing period. Alternatively, divide your monthly ElevenLabs bill by the per-character rate for your plan. Run this calculation quarterly — character consumption grows with call volume and with changes to AI agent verbosity, and many teams remain on a suboptimal plan tier for months because the review does not happen automatically.
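The summation step is trivial once the history is pulled — a minimal sketch, assuming each history item exposes a `character_count` field as described above (verify the field name against the current API response shape):

```python
def total_characters(history_items: list[dict]) -> int:
    """Sum character usage across ElevenLabs history items for a
    billing period. Items missing the field count as zero."""
    return sum(item.get("character_count", 0) for item in history_items)

# Stand-in data in place of a real API response:
items = [{"character_count": 220}, {"character_count": 185}, {}]
print(total_characters(items))  # 405
```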
One frequently missed optimization: ElevenLabs charges per character of text input, not per second of audio output. A 200-character response costs the same whether it is spoken quickly in 8 seconds or slowly in 12 — the character count, not the audio duration, determines the charge. Increasing the TTS speaking rate (via the voice speed setting, where your model and plan support it) therefore shortens audio and improves call pacing without changing the ElevenLabs bill.
Twilio cost optimization: plan negotiation and per-minute reduction
Twilio's list pricing for voice calls in 2026 is $0.0085/min for inbound calls and $0.014/min for outbound calls to US phone numbers. These are starting prices — Twilio has volume tiers that begin to apply at relatively modest usage levels, and teams generating $500+/month in Twilio spend are typically eligible for custom pricing via a Twilio account manager conversation.
The steps to reduce Twilio per-minute costs:
Audit your current pricing tier. In the Twilio Console, navigate to Billing > Pricing to see your current per-minute rates. If you are on list pricing and generating more than 50,000 minutes per month, you are almost certainly leaving money on the table. Request a pricing review from your Twilio account manager.
Route calls through a SIP trunk for high-volume outbound. Twilio Programmable Voice is priced per minute with carrier overhead included. If you are running high-volume outbound campaigns (10,000+ calls/day), routing through a SIP trunk from a carrier with lower per-minute rates — Bandwidth, Telnyx, Vonage — can reduce the telephony layer cost by 40–60%. Twilio remains as the orchestration layer; the carrier handles the PSTN termination at lower cost. This requires SIP trunking configuration work but the ROI is substantial at volume.
Reduce billable duration on failed and short calls. Twilio's billing increment is configurable. Standard rounding is to 6-second increments; some plans and carriers use 30-second or 60-second increments. Verify your plan's billing increment. If calls frequently drop in the first 6–15 seconds (silence-detection failures), fixing the root cause reduces both the call failure rate and the wasted billable seconds.
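Rounding to the billing increment can be modeled directly, which makes the cost of early drops concrete (a sketch; confirm your plan's actual increment before relying on the numbers):

```python
import math

def billed_seconds(actual_seconds: float, increment: int = 6) -> int:
    """Round call duration up to the plan's billing increment,
    with the increment itself as the minimum charge."""
    return max(increment, math.ceil(actual_seconds / increment) * increment)

# A 6-second silence-detection drop on 6s vs 30s increments:
print(billed_seconds(6, increment=6))    # 6
print(billed_seconds(6, increment=30))   # 30
print(billed_seconds(14, increment=6))   # 18
```

On a 30-second increment, every early drop bills five times the seconds it actually consumed — which is why the increment matters most for deployments with high short-call rates.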
Audit Twilio add-on charges. Twilio charges separately for features like call recording, transcription, Voice Insights, and BYOC (bring your own carrier). Pull your Twilio invoice line items and audit each add-on charge against actual usage. Disabling Voice Insights on call flows that do not use the data can reduce costs meaningfully for high-volume deployments.
Silence detection as a cost lever: the often-ignored connection
Silence detection configuration — specifically Twilio's silence_timeout parameter — has a direct, quantifiable effect on your voice AI cost structure. The connection is not obvious until you model it.
When silence_timeout is too short relative to your TTS latency, calls drop during the AI's response generation window. Each drop consumes Twilio billable minutes (the call was answered and is actively connected) and ElevenLabs characters (the synthesis request was submitted). The combined per-drop cost is approximately $0.04–$0.08 for a typical call. At 100 drops per day from misconfigured silence detection, that is $4–$8/day or $120–$240/month in preventable waste.
But the bigger cost is the lost conversion. A silence-detection-induced drop in the middle of a qualified conversation — where the AI was generating a value-add response and the silence timeout fired before the audio arrived — is a lost conversion, not just a failed call. At any meaningful conversion value, the lost conversion cost dwarfs the direct provider cost.
The optimization: run the silence timeout tuning process described in the ElevenLabs latency guide (measure p95 TTS latency for your configuration, add 2 seconds, set that as your silence_timeout). For most eleven_flash_v2_5 deployments, the correct value is 5–7 seconds. For turbo, 6–8 seconds. Measure the effect on your short-duration completed call rate (calls under 10 seconds) — this metric should drop measurably within 24 hours of the change. If it does not, you have a different failure mode that silence timeout tuning does not address.
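The tuning rule — p95 TTS latency plus a 2-second buffer — can be sketched as follows (the latency samples are illustrative, not measurements):

```python
import statistics

def recommended_silence_timeout(tts_latencies_s: list[float],
                                buffer_s: float = 2.0) -> float:
    """p95 of observed TTS latency plus a safety buffer, per the
    tuning process described above."""
    p95 = statistics.quantiles(tts_latencies_s, n=20)[18]  # 95th percentile
    return round(p95 + buffer_s, 1)

# Illustrative per-response TTS latency samples in seconds:
samples = [0.8, 1.1, 0.9, 1.4, 2.2, 1.0, 1.3, 3.1, 1.2, 0.9,
           1.5, 1.1, 2.8, 1.0, 1.6, 1.2, 4.0, 1.3, 0.9, 1.1]
print(recommended_silence_timeout(samples))
```

Feed it real latency measurements from your own deployment; the p95 (not the mean) is what matters, because the drops happen in the tail.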
A secondary silence-related cost optimization: use Twilio's `<Play>` verb (or an equivalent media-injection mechanism) to inject 500ms of audio — a short tone or low-level ambient noise — during TTS generation, preventing the silence window from counting down during the AI's processing time. This effectively decouples silence detection from TTS latency, allowing much tighter timeout values for human silence without risking drops during AI response generation.
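A minimal sketch of building that kind of TwiML with the standard library — the audio URL is a placeholder, and production code would more likely use the official twilio helper library:

```python
import xml.etree.ElementTree as ET

def keepalive_twiml(audio_url: str) -> str:
    """Build a TwiML <Response> that plays a short audio clip
    (e.g. 500ms of ambient noise) to keep the silence window open
    while TTS generation is in flight."""
    response = ET.Element("Response")
    play = ET.SubElement(response, "Play")
    play.text = audio_url
    return ET.tostring(response, encoding="unicode")

# Hypothetical URL for a 500ms ambient-noise clip:
print(keepalive_twiml("https://example.com/ambient-500ms.mp3"))
# <Response><Play>https://example.com/ambient-500ms.mp3</Play></Response>
```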
LLM verbosity as an ElevenLabs cost driver
ElevenLabs charges per character of text input. Every word the AI says in a call costs characters. AI agent verbosity is therefore a direct cost variable, and it is controlled by your LLM prompt.
Typical character consumption patterns for conversational AI agents:
- A concise, directive AI response (booking confirmation, next step instruction): 80–150 characters
- A standard conversational AI turn: 150–300 characters
- A verbose AI explanation or handling a complex objection: 400–800 characters
At 300 calls per day with an average of 4 AI turns per call and an average turn length of 250 characters:
Daily character consumption: 300 × 4 × 250 = 300,000 characters
Monthly: approximately 9 million characters
A 30% reduction in average turn length (from 250 to 175 characters through prompt engineering) produces:
Monthly consumption: approximately 6.3 million characters — a saving of 2.7 million characters per month.
At the Business plan's effective rate of approximately $0.165/1,000 characters, that is a roughly $446/month saving from prompt work alone — without changing call volume, model selection, or infrastructure.
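The savings arithmetic above, as a reusable sketch (the $0.000165 per-character rate is the effective Business-plan rate quoted in this section; substitute your own plan's rate):

```python
def monthly_char_cost(calls_per_day: int, turns_per_call: int,
                      chars_per_turn: int,
                      rate_per_char: float = 0.000165,
                      days: int = 30) -> tuple[int, float]:
    """Return (monthly characters, monthly TTS cost in dollars)."""
    chars = calls_per_day * turns_per_call * chars_per_turn * days
    return chars, round(chars * rate_per_char, 2)

before = monthly_char_cost(300, 4, 250)   # (9000000, 1485.0)
after = monthly_char_cost(300, 4, 175)    # (6300000, 1039.5)
print(before, after, round(before[1] - after[1], 2))  # delta: 445.5
```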
The prompt engineering principles that reduce verbosity without reducing call quality:
1. Instruct the LLM to respond in conversational sentence fragments, not complete formal prose.
2. Prohibit filler phrases ('Great question!', 'Of course, I'd be happy to...') — these add 15–30 characters per turn with zero informational value.
3. Instruct the LLM to confirm with a single word before elaborating: 'Confirmed. [elaboration]' instead of 'Yes, I can confirm that...'.
4. Set a maximum response length in characters (or approximate word count) in the system prompt.
5. Review actual call transcripts monthly to identify verbosity patterns — LLMs often develop characteristic verbose patterns that a single prompt instruction can eliminate.
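Principle 4 can also be enforced as a runtime backstop behind the system-prompt instruction — a hypothetical guard (function name and budget are illustrative) that trims any response exceeding the character budget at a sentence boundary:

```python
def enforce_char_budget(response: str, budget: int = 300) -> str:
    """Trim an LLM response to a character budget, cutting at the
    last complete sentence inside the budget when possible."""
    if len(response) <= budget:
        return response
    cut = response[:budget]
    # Find the last sentence-ending punctuation inside the budget.
    idx = max(cut.rfind(mark) for mark in (". ", "! ", "? "))
    return (cut[:idx + 1] if idx > 0 else cut).rstrip()

long_reply = ("Confirmed. Your appointment is set for Tuesday at 3pm. "
              "Let me also walk you through our full cancellation policy...")
print(enforce_char_budget(long_reply, 60))
```

A hard trim is a last resort — the prompt instruction should do the work — but it caps the worst-case character spend per turn.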
Tracking cost-per-successful-call: the metric that matters
Most voice AI teams track cost-per-call. This is the wrong metric. Cost-per-call measures efficiency of provider spend without capturing whether the spend produced results. Cost-per-successful-call (CPSC) captures the actual economics of your voice AI deployment.
The formula: CPSC = total provider spend / number of successful call outcomes
Where 'successful outcome' is defined by your deployment's purpose — a booked appointment, a qualified lead, a completed survey response, a resolved support ticket. The outcome definition must be tracked in your CRM or analytics system and must be attributable to a specific call.
The weekly CPSC calculation process:
1. Pull total Twilio spend for the week from the Twilio billing API (or invoice).
2. Pull total ElevenLabs spend for the week from the ElevenLabs API.
3. Sum for total voice AI infrastructure spend.
4. Query your CRM for calls marked with the success outcome in the same week.
5. Divide.
Sample calculation:
Week total: $320 Twilio + $180 ElevenLabs = $500
Successful outcomes: 400 bookings
CPSC = $500 / 400 = $1.25 per booked appointment
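The weekly calculation reduces to one function — a sketch mirroring the sample numbers above:

```python
def cpsc(twilio_spend: float, elevenlabs_spend: float,
         successful_outcomes: int) -> float:
    """Cost per successful call: total provider spend / outcomes."""
    total = twilio_spend + elevenlabs_spend
    return round(total / successful_outcomes, 2)

print(cpsc(320, 180, 400))  # 1.25, matching the sample week
```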
This metric immediately reveals configuration regressions. If CPSC rises from $1.25 to $1.80 week-over-week with no change in call volume, either spend increased (provider costs, longer calls) or success rate dropped (more calls, fewer outcomes). Both are actionable. Neither is visible in cost-per-call or success-rate-alone metrics.
Trend CPSC alongside its components: call success rate, average call cost, and call volume. A rising CPSC with stable success rate and stable volume indicates a cost increase (check for ElevenLabs plan overage, Twilio rate changes). A rising CPSC with falling success rate indicates a quality degradation (check for recent AI prompt changes, new call routing configurations, or increased silent failure rate).
How Sherlock surfaces cost anomalies before the invoice arrives
The standard workflow for voice AI cost management is reactive: the monthly invoice arrives, someone notices it is higher than expected, and the investigation begins. By that point, a costly misconfiguration has been running for up to 30 days.
The anomaly that costs most in this reactive model is runaway ElevenLabs character consumption from a newly deployed AI agent with a verbose prompt. A single agent deployed with an unoptimized prompt that generates 600-character responses instead of 200-character responses will triple ElevenLabs costs for every call it handles — and because ElevenLabs' in-dashboard alerts require manual threshold configuration, there is no automated signal until the bill arrives.
Sherlock monitors call economics on a rolling basis. For each connected provider, Sherlock tracks cost per call, character consumption per call, and calls-to-outcome ratio. When any metric deviates from the 7-day rolling average by more than a configurable threshold — typically 20% increase in cost per call or 15% drop in success rate — Sherlock posts a Slack alert with the specific change, the time window it began, and the affected call segment (specific phone number, specific agent, specific time of day).
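A simplified illustration of this kind of rolling-baseline check — not Sherlock's actual implementation — flagging days whose cost per call exceeds the trailing 7-day average by more than 20%:

```python
def anomaly_alerts(daily_cost_per_call: list[float],
                   window: int = 7,
                   threshold: float = 0.20) -> list[int]:
    """Return indices of days whose cost-per-call exceeds the
    trailing `window`-day average by more than `threshold`."""
    alerts = []
    for day in range(window, len(daily_cost_per_call)):
        baseline = sum(daily_cost_per_call[day - window:day]) / window
        if daily_cost_per_call[day] > baseline * (1 + threshold):
            alerts.append(day)
    return alerts

# Stable costs, then a verbose-prompt deployment triples per-call cost:
series = [0.050] * 10 + [0.150] * 3
print(anomaly_alerts(series))  # [10, 11, 12]
```

The trailing average deliberately excludes the current day, so a single bad day stands out immediately rather than dragging its own baseline upward.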
The median time from a cost anomaly beginning to a Sherlock alert is under 4 hours. The median time from a cost anomaly beginning to a human noticing it without tooling is 11–18 days (typically at invoice review). That delta is the window in which a misconfiguration burning $500/day becomes a $5,500 surprise.
For cost-per-successful-call tracking, Sherlock also ingests CRM outcome data and computes CPSC on a daily basis — no manual spreadsheet calculation required. See [usesherlock.ai](https://usesherlock.ai/?utm_source=blog&utm_medium=content&utm_campaign=cost-optimization-guide) to connect your providers.