The base cost layers: what you are actually paying per call
Every AI voice call passes through five billing meters simultaneously, and most teams only watch one of them.
Layer 1: Telephony (Twilio). Twilio charges $0.0085/min for inbound calls to local numbers, $0.014/min for outbound calls to U.S. numbers, and $0.022/min for toll-free inbound. Call recording adds $0.0025/min. Transcription — if you use Twilio's built-in service rather than a third-party STT — adds $0.05–$0.10/min. These rates are before carrier surcharges, regulatory fees, and taxes, which add 5–15% depending on your state and call routing.
Layer 2: Speech-to-Text (STT). The caller's speech must be transcribed before the LLM can process it. Deepgram charges approximately $0.006/min, ElevenLabs STT is $0.007/min, and Azure is $0.017/min. STT runs for the entire duration of the call, not just the portions where the caller is speaking — silence is still metered.
Layer 3: LLM inference. Your language model processes the transcript and generates a response. Costs vary dramatically by model: GPT-4.1 runs $0.06 per million input tokens, Claude 3 Opus at $0.09/M tokens. A 2-minute call with 4 conversational turns might consume 2,000–4,000 tokens total, costing $0.01–$0.04 in LLM inference. But context window accumulation means later turns in longer calls carry the full conversation history, making the per-turn cost increase as the call progresses.
Layer 4: Text-to-Speech (TTS). The LLM's text response is synthesized into audio. ElevenLabs conversational AI pricing starts at $0.10/min on Creator plans and drops to $0.08/min on Business plans, with overage rates of $0.06–$0.15/min. Vapi's built-in TTS is $0.022/min; Deepgram TTS is $0.011/min. The TTS cost is directly proportional to the length of the AI's response — a verbose agent that generates 300-character responses instead of 80-character responses costs 3.75x more in TTS per turn.
Layer 5: Orchestration platform. If you use Vapi, there is a flat $0.05/min platform fee on top of everything else. This fee applies to the full call duration regardless of how much of that time involved active AI processing.
Stack these layers for a concrete example. A 2-minute outbound call through Vapi with Twilio transport, Deepgram STT, GPT-4.1, and ElevenLabs TTS:
- Twilio: 2 x $0.014 = $0.028
- Deepgram STT: 2 x $0.006 = $0.012
- GPT-4.1: ~$0.02 (4 turns, accumulating context)
- ElevenLabs TTS: 2 x $0.10 = $0.20
- Vapi platform: 2 x $0.05 = $0.10
- Total: $0.36 per call
At 5,000 calls per month, that is $1,800/month — and we have not yet accounted for failed calls, retries, or overages.
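The stacked math above can be sketched as a small cost model. The rates are the article's example figures, not authoritative pricing; the LLM cost is passed in separately because it is token-based rather than per-minute:

```python
# Per-call cost stack for the example configuration above.
# Rates are illustrative figures from this article, not live vendor pricing.
RATES = {
    "twilio_outbound_per_min": 0.014,
    "deepgram_stt_per_min": 0.006,
    "elevenlabs_tts_per_min": 0.10,
    "vapi_platform_per_min": 0.05,
}

def per_call_cost(minutes: float, llm_cost: float) -> float:
    """Sum the per-minute meters, then add the token-based LLM cost."""
    per_min = sum(RATES.values())
    return round(minutes * per_min + llm_cost, 4)

if __name__ == "__main__":
    cost = per_call_cost(minutes=2.0, llm_cost=0.02)
    print(f"per call: ${cost}")                    # $0.36
    print(f"monthly at 5,000 calls: ${cost * 5000:,.0f}")
```

Swapping any single rate in `RATES` shows immediately how sensitive the total is to the TTS and platform lines, which dominate the stack.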
Hidden costs most teams miss: the silent multipliers
The per-call math above assumes every call is clean — it connects, runs, and ends without incident. In production, 8–15% of calls are not clean. Here is where the real cost inflation happens.
Failed calls that still bill. Twilio bills any call that reaches the 'in-progress' state, even if it drops 2 seconds later due to a webhook timeout or TTS latency spike. The minimum billing increment on many routes is 1 minute, meaning a 3-second connected call that fails costs you the same as a 58-second successful call. If your failure rate is 10% on 5,000 monthly calls, that is 500 calls billed at minimum increment with zero business value — roughly $7/month in pure waste on Twilio alone, before counting the STT, TTS, and platform minutes metered on those same failed calls.
TTS retries on latency spikes. When ElevenLabs generation latency exceeds the orchestration layer's timeout threshold (typically 3–5 seconds), many implementations retry the TTS request automatically. The retry consumes the same characters again. ElevenLabs bills for both the original and the retry — there is no credit for timed-out generations that were never delivered. If 5% of your TTS generations trigger a retry, your effective TTS cost is 5% higher than your call volume would suggest. On an ElevenLabs Pro plan with 500,000 characters/month, that is 25,000 wasted characters — enough to push you into overage territory a week early.
LLM verbosity tax. This is the most insidious hidden cost because it compounds across two layers simultaneously. When your LLM generates a 280-character response instead of a 90-character response, two things happen: the TTS cost triples (more characters to synthesize), and the call duration increases by 3–5 seconds (more audio to play), which means the per-minute telephony and platform fees also increase. A model that averages 2.5x expected verbosity inflates your total per-call cost by 30–40%, not 2.5x, because the verbosity cascades through dependent billing layers.
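The cascade can be illustrated with a toy model. The per-minute fees and TTS rate are the article's figures; the speech rate and the caller-talk share per turn are assumptions for illustration, and the exact multiplier depends heavily on how much of the call is caller speech rather than AI speech:

```python
# Toy model of the verbosity cascade. PER_MIN_FEES is metered on total call
# duration; TTS is metered only on AI speech. Speech rate (chars/sec) and
# caller talk time are assumptions, not measured values.
PER_MIN_FEES = 0.014 + 0.006 + 0.05   # telephony + STT + platform
TTS_PER_MIN = 0.10

def call_cost(ai_chars_per_turn: int, turns: int = 4,
              caller_sec_per_turn: float = 20.0,
              chars_per_sec: float = 15.0) -> float:
    ai_sec = turns * ai_chars_per_turn / chars_per_sec
    total_min = (turns * caller_sec_per_turn + ai_sec) / 60
    return total_min * PER_MIN_FEES + (ai_sec / 60) * TTS_PER_MIN

concise, verbose = call_cost(90), call_cost(280)
print(f"{verbose / concise:.2f}x total cost for 3.1x the characters")
```

The point the model makes: total cost inflation is always less than the raw character ratio, because the caller-speech portion of the call does not grow with verbosity, but it is far more than the TTS line alone suggests.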
Webhook retry storms. Twilio retries failed status callback webhooks up to 3 times with exponential backoff. Each retry is a separate HTTP request to your server. If your webhook endpoint is slow (database writes, CRM syncs), the retries can stack up and create load on your infrastructure. The Twilio cost is minimal (webhook retries are not billed as call minutes), but the downstream cost — extra CRM API calls, duplicate database writes, and the engineering time to debug the resulting data inconsistencies — adds up. Teams that do not implement idempotency on their webhook handlers frequently discover they are processing 10–20% more webhook events than they have actual calls.
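An idempotent handler is a small amount of code. A minimal sketch: key each event on CallSid plus CallStatus and drop duplicates. The in-memory set below stands in for a persistent store (Redis SETNX, a unique database index); the field names match Twilio's status callback parameters:

```python
# Minimal idempotent webhook handler. Twilio retries failed status callbacks,
# so duplicate (CallSid, CallStatus) pairs must be dropped, not reprocessed.
processed: set[str] = set()   # stand-in for Redis SETNX / a unique DB index

def handle_status_callback(form: dict) -> str:
    key = f"{form['CallSid']}:{form['CallStatus']}"
    if key in processed:
        return "duplicate-ignored"   # a retry of an event we already handled
    processed.add(key)
    # ... CRM sync, database writes, etc. run exactly once per event ...
    return "processed"
```

With this guard in place, a retry storm costs you a few cheap set lookups instead of duplicate CRM API calls and inconsistent records.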
Carrier surcharges and regulatory fees. Twilio's advertised per-minute rates exclude Universal Service Fund (USF) contributions, state telecommunications taxes, E911 fees, and carrier recovery charges. These add 5–15% to your telephony bill depending on your volume and the jurisdictions you are calling. On a $500/month Twilio voice bill, that is $25–$75 in surcharges that do not appear in any per-call cost calculation.
How to audit your current spend: the API calls that reveal the truth
You cannot optimize what you have not measured. Here is how to pull your actual cost data from each provider and build a per-call cost model.
Twilio: Usage Records API. The endpoint that gives you aggregate cost data:
GET /2010-04-01/Accounts/{AccountSid}/Usage/Records.json?Category=calls&StartDate=2026-02-01&EndDate=2026-02-28
This returns total billed minutes, call count, and total price for the period. For per-call granularity, use the Call Resource:
GET /2010-04-01/Accounts/{AccountSid}/Calls.json?StartTime>=2026-02-01&StartTime<=2026-02-28&Status=completed
Each call record includes duration (actual seconds) and price (billed amount). Filter for calls where duration < 15 and price > 0 to find your micro-duration billed calls — these are your failed-call waste.
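That filter might look like the following sketch. It operates on the `calls` array returned by the Calls query above; `duration` is a string of seconds and `price` a signed string, per the Call resource:

```python
# Flag micro-duration billed calls in a Twilio Calls list response.
# `calls` is the "calls" array from GET /Accounts/{Sid}/Calls.json pages.
def micro_duration_waste(calls: list[dict],
                         max_seconds: int = 15) -> tuple[list[dict], float]:
    """Return the failed-but-billed calls and their total billed amount."""
    wasted = [
        c for c in calls
        if int(c["duration"]) < max_seconds and abs(float(c["price"] or 0)) > 0
    ]
    total = sum(abs(float(c["price"])) for c in wasted)
    return wasted, round(total, 4)
```

Run it across a month of pages and the total is your failed-call waste figure; the individual records tell you which routes and times of day are producing the drops.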
ElevenLabs: Usage endpoint. Pull your character consumption via the history endpoint:
GET https://api.elevenlabs.io/v1/history?page_size=100
Each history item includes character_count_change_from and character_count_change_to, giving you the exact character consumption per generation. Sum these to get your actual consumption and compare against your plan's included quota. If your daily average character consumption multiplied by remaining billing days exceeds your remaining quota, you are heading for overage pricing.
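The summation and the overage projection are a few lines each. A sketch, assuming the two `character_count_change_*` fields described above and a 30-day billing cycle:

```python
# Sum per-generation character deltas from /v1/history items, then project
# whether the current burn rate exhausts the plan quota before cycle end.
def chars_consumed(history_items: list[dict]) -> int:
    return sum(i["character_count_change_to"] - i["character_count_change_from"]
               for i in history_items)

def projected_usage(used_so_far: int, days_elapsed: int,
                    days_in_cycle: int = 30) -> int:
    daily = used_so_far / max(days_elapsed, 1)
    return round(daily * days_in_cycle)

def heading_for_overage(used_so_far: int, days_elapsed: int, quota: int,
                        days_in_cycle: int = 30) -> bool:
    return projected_usage(used_so_far, days_elapsed, days_in_cycle) > quota
```

Paginate through the history endpoint, feed the items to `chars_consumed`, and check `heading_for_overage` daily — it gives you a week or more of warning before overage pricing kicks in.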
For conversational AI minutes specifically, check your usage dashboard or query the conversations endpoint to see per-session minute consumption.
Vapi: Call logs. Vapi exposes call-level cost data in its dashboard and API. Each call record includes the per-component cost breakdown: transport, STT, LLM, TTS, and platform fee. Export these to a spreadsheet and sort by total cost descending — the top 10% most expensive calls will reveal your cost outliers and point directly to the optimization opportunities.
Building the per-call cost model. Join these datasets by timestamp (plus or minus 2 seconds for Twilio-to-Vapi correlation, plus or minus 1 second for Vapi-to-ElevenLabs). For each call, you now have: Twilio billed amount + ElevenLabs character cost + LLM token cost + Vapi platform fee = true per-call cost. Calculate the median, the 90th percentile, and the 99th percentile. The gap between median and P99 is your optimization surface — the expensive tail calls are where most of the waste lives.
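The join and percentile steps can be sketched with the standard library. This assumes each dataset has been reduced to `(epoch_seconds, cost)` pairs; the tolerance argument carries the 1–2 second correlation windows described above:

```python
# Nearest-timestamp join plus percentile analysis for the per-call cost model.
# Each input is a list of (epoch_seconds, cost) pairs from one provider.
import statistics

def join_by_timestamp(a, b, tol):
    """Pair each record in `a` with the nearest record in `b` within `tol` s."""
    joined = []
    for ts, cost in a:
        match = min(b, key=lambda r: abs(r[0] - ts), default=None)
        if match and abs(match[0] - ts) <= tol:
            joined.append((ts, cost + match[1]))
    return joined

def cost_percentiles(costs):
    qs = statistics.quantiles(costs, n=100)
    return {"median": statistics.median(costs), "p90": qs[89], "p99": qs[98]}
```

Chain `join_by_timestamp` across providers (Twilio-to-Vapi at tol=2, Vapi-to-ElevenLabs at tol=1), then run `cost_percentiles` on the summed costs; the median-to-P99 gap it reports is the optimization surface.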
Optimization strategies that actually move the bill
Once you have per-call cost visibility, here are the highest-ROI optimizations in order of impact.
1. Control LLM verbosity at the prompt level. Add explicit length constraints to your system prompt: 'Respond in 1-2 sentences. Never exceed 120 characters per response.' Test the resulting response lengths — measure the average character count before and after the prompt change. A well-constrained prompt reduces average TTS cost by 40–60% with no degradation in call completion rate. This is consistently the single highest-ROI change because it reduces both TTS cost and call duration simultaneously.
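Measuring the before/after is trivial once you have the response logs. A sketch, where the two lists are whatever your logging gives you for the old and new prompt:

```python
# Compare average response length before and after a prompt length constraint.
def avg_chars(responses: list[str]) -> float:
    return sum(len(r) for r in responses) / len(responses)

def verbosity_reduction(before: list[str], after: list[str]) -> float:
    """Fractional reduction in average length, e.g. 0.6 means 60% shorter."""
    return 1 - avg_chars(after) / avg_chars(before)
```

Track this alongside call completion rate after every prompt change — a reduction that also drops completions is a regression, not a saving.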
2. Cache TTS for repeated responses. If 30–40% of your agent's responses are FAQ-type answers (greetings, business hours, hold messages, common objections), pre-generate the audio and serve it from cache instead of calling TTS in real time. This eliminates TTS cost entirely for cached responses and reduces latency to near-zero for those turns. Implementation: hash the response text, check a Redis or file cache for the hash, serve cached audio if it exists, generate and cache if it does not. ElevenLabs charges nothing for serving pre-generated audio — the cost is only on the initial generation.
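The hash-and-check flow described above is a short function. In this sketch the cache is an in-memory dict and `synthesize` is a stub standing in for your real TTS client; substitute Redis and an actual provider call in production:

```python
# TTS response cache: hash the text, serve cached audio on a hit, synthesize
# and store on a miss. Only misses cost TTS characters.
import hashlib

_cache: dict[str, bytes] = {}   # stand-in for Redis or a file cache

def synthesize(text: str) -> bytes:
    """Stub for the real TTS request (ElevenLabs, Deepgram, etc.)."""
    return f"<audio:{text}>".encode()

def tts_with_cache(text: str) -> tuple[bytes, bool]:
    """Return (audio, cache_hit)."""
    key = hashlib.sha256(text.encode()).hexdigest()
    if key in _cache:
        return _cache[key], True
    audio = synthesize(text)
    _cache[key] = audio
    return audio, False
```

Note the hash is over the exact response text, so cache hit rate depends on the agent producing byte-identical responses — another reason to constrain verbosity, since tightly constrained responses repeat more often.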
3. Switch to a cheaper TTS model for non-critical turns. ElevenLabs Flash v2.5 costs 0.5 credits per character compared to 1 credit for the standard model. Use Flash for greetings, confirmations, and short procedural responses; reserve the higher-quality model for the critical persuasion or empathy moments of the call. This requires your orchestration layer to support model switching mid-call — Vapi supports this via the model parameter on the response object.
4. Implement call routing to reduce telephony cost. Twilio's BYOC (Bring Your Own Carrier) feature lets you route calls through a cheaper SIP trunk while keeping Twilio's call logic. Telnyx SIP trunking at $0.005/min versus Twilio's $0.014/min saves $0.009 per outbound minute — $450/month on 50,000 minutes. The tradeoff is operational complexity: you are now managing two vendor relationships for telephony.
5. Set up provider fallbacks to prevent retry waste. Configure your orchestration layer to fall back to a secondary TTS provider (e.g., Deepgram at $0.011/min) when the primary (ElevenLabs) latency exceeds 2 seconds, rather than retrying the same provider. This prevents double-billing on ElevenLabs and reduces call drop rate from latency-induced silence detection. The fallback voice will sound different, so configure it only for cases where delivering any audio is better than delivering silence.
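If your orchestration layer does not offer this natively, the pattern is straightforward to sketch. The provider functions here are stubs; the key property is that a timeout triggers the secondary rather than a retry of the primary, so the primary's characters are billed at most once:

```python
# Deadline-based TTS fallback: run the primary with a hard timeout; on
# timeout, call the secondary instead of retrying (a retry double-bills).
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def tts_with_fallback(text, primary, secondary, deadline_s=2.0):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, text)
    try:
        return future.result(timeout=deadline_s), "primary"
    except FutureTimeout:
        return secondary(text), "fallback"
    finally:
        pool.shutdown(wait=False)

if __name__ == "__main__":
    slow_primary = lambda t: (time.sleep(0.3), b"<primary audio>")[1]
    cheap_secondary = lambda t: b"<secondary audio>"
    audio, source = tts_with_fallback("One moment.", slow_primary,
                                      cheap_secondary, deadline_s=0.1)
    print(source)   # "fallback" when the primary misses the deadline
```

One caveat: depending on the primary client, the timed-out request may still complete (and bill) in the background; the saving is in avoiding a second billed request on top of it.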
6. Set Twilio usage triggers for budget alerts. Use the Twilio Usage Triggers API to fire a webhook when your account hits a spend threshold:
POST /2010-04-01/Accounts/{AccountSid}/Usage/Triggers.json with parameters UsageCategory=calls&TriggerValue=500&CallbackUrl=https://your-app.com/alerts/twilio-budget
This fires a webhook when your call spend hits $500, giving you a programmatic circuit-breaker before a runaway agent drains your budget overnight.
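A sketch of building that request with the standard library, no Twilio SDK required. Credentials and the callback URL are placeholders; note that `TriggerBy=price` makes the threshold a dollar amount rather than usage units:

```python
# Build the Usage Trigger creation request for the Twilio REST API.
# Account SID, token, and callback URL below are placeholders.
import base64
import urllib.parse
import urllib.request

def build_trigger_request(account_sid, auth_token, trigger_value, callback_url):
    url = (f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}"
           "/Usage/Triggers.json")
    body = urllib.parse.urlencode({
        "UsageCategory": "calls",
        "TriggerValue": str(trigger_value),
        "TriggerBy": "price",        # fire on spend, not minutes
        "CallbackUrl": callback_url,
    }).encode()
    auth = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Basic {auth}")
    return req

if __name__ == "__main__":
    req = build_trigger_request("ACxxxx", "your_auth_token", 500,
                                "https://your-app.com/alerts/twilio-budget")
    # urllib.request.urlopen(req)   # uncomment with real credentials
```

Pair the trigger with a webhook handler that pauses your outbound campaigns — an alert no one acts on at 3 a.m. is not a circuit-breaker.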
How Sherlock Calls surfaces cost anomalies across your entire stack
The audit process described above works, but it requires pulling data from 3–5 provider APIs, correlating by timestamp, and running the analysis on a regular cadence. Most teams do it once after a billing surprise and then never again — until the next surprise.
Sherlock Calls automates the cross-provider cost correlation continuously. Connect your Twilio, ElevenLabs, and Vapi accounts, and Sherlock builds the per-call cost model automatically. When a cost anomaly appears — a spike in micro-duration billed calls, a TTS retry loop burning through your ElevenLabs quota, or an agent whose average response length doubled after a prompt change — Sherlock posts a case file to Slack with the specific calls, the cost breakdown, and the probable root cause.
The weekly cost digest breaks down your spend by provider layer, flags the top 10 most expensive calls with per-component attribution, and calculates your effective cost per converted call (not just cost per call). Teams using the cost monitoring feature typically identify 15–30% in recoverable waste within the first billing cycle — waste that was previously invisible because it was spread across multiple provider dashboards that no one was cross-referencing.
The free tier includes cost anomaly detection across all connected providers. Connect your accounts at usesherlock.ai to see your true per-call cost within 5 minutes of setup.
Real numbers: a before-and-after cost breakdown
Here is a real-world example of the cost impact from a team running 8,000 outbound AI voice calls per month through Vapi + Twilio + ElevenLabs + GPT-4.1.
Before optimization (monthly):
- Twilio voice: 8,000 calls x avg 2.4 min x $0.014/min = $269
- Twilio micro-duration waste: ~900 failed-but-billed calls x $0.014 = $13
- Twilio surcharges (8%): $23
- ElevenLabs TTS: 8,000 calls x avg 780 chars/call = 6.24M chars, Scale plan ($330) + $180 overage = $510
- ElevenLabs retry waste (~6%): $31
- Deepgram STT: 19,200 min x $0.006 = $115
- GPT-4.1 LLM: ~28M tokens x $0.06/M = $1.68 (negligible)
- Vapi platform: 19,200 min x $0.05 = $960
- Total: $1,923/month — effective $0.24 per call
After optimization:
- Prompt verbosity reduction (avg response: 780 to 210 chars): ElevenLabs drops to 1.68M chars/month, fits within Scale plan quota, no overage. Saves $180 in overages + $31 in retry waste. Also reduces avg call duration from 2.4 min to 1.9 min.
- TTS caching for greetings and FAQ responses (~35% of turns): further reduces ElevenLabs consumption by 25%. Plan downgrade from Scale to Pro ($99/month) now viable.
- Flash v2.5 for non-critical turns: 0.5x credit rate on 40% of remaining generations.
- Provider fallback on ElevenLabs latency > 2s: eliminates TTS retry double-billing entirely.
- Twilio BYOC via Telnyx for outbound: $0.005/min instead of $0.014/min.
After optimization (monthly):
- Telnyx voice: 8,000 calls x 1.9 min x $0.005/min = $76
- Telnyx surcharges (5%): $4
- ElevenLabs TTS: Pro plan $99 (within quota after caching + verbosity reduction)
- Deepgram STT: 15,200 min x $0.006 = $91
- GPT-4.1 LLM: ~20M tokens x $0.06/M = $1.20
- Vapi platform: 15,200 min x $0.05 = $760
- Total: $1,031/month — effective $0.13 per call
That is a 46% reduction in monthly spend — $892/month, $10,704/year — from optimizations that required no change to the AI agent's conversational logic or business outcomes. The call completion rate actually improved by 3% because reduced TTS latency from caching and Flash models meant fewer silence-detection drops.
The first step is always the same: know your true per-call cost across every provider. Everything else follows from that visibility.