Taxonomy first: which of the five suspects are you investigating?
A dropped call report is the voice AI equivalent of a crime with no witnesses — you know something went wrong, but the evidence is distributed across three or four systems that were not designed to speak to each other. Before any investigation begins, classify the failure type. The classification determines which evidence to gather first and prevents you from spending two hours investigating TTS latency when the problem is actually a CRM timeout.
Telephony layer timeouts present as calls that end at precisely the provider's configured silence or inactivity threshold. Check the call duration against your Twilio silence-timeout configuration: if calls are dropping at exactly second 5 or second 10, you have a telephony timeout, and the investigation now focuses on what was happening in the TTS layer at that moment. TTS generation failures leave a gap in the audio stream — no audio where audio was expected. These sometimes, but not always, appear as errors in voice AI provider logs.
AI orchestration timeouts occur when the underlying LLM takes longer than expected and Vapi, Retell, or your custom wrapper fails gracefully but silently — no outward error, but the conversation reaches a dead end. CRM write failures cause drops specifically when your agent is configured to require a successful CRM event before proceeding. Network routing issues produce intermittent, geographically correlated failures that only become visible when you segment failure rates by caller region rather than viewing them in aggregate.
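The telephony-timeout signature (failed calls clustering at exactly the provider's configured silence threshold) lends itself to a quick triage check. A minimal Python sketch; the 50% share and 0.25-second tolerance are illustrative defaults, not values from any provider:

```python
def telephony_timeout_suspected(durations_s, silence_timeout_s,
                                tol=0.25, share=0.5):
    """Heuristic: if at least `share` of failed calls ended within
    `tol` seconds of the provider's silence timeout, suspect a
    telephony-layer timeout rather than a downstream failure."""
    if not durations_s:
        return False
    near = [d for d in durations_s if abs(d - silence_timeout_s) <= tol]
    return len(near) / len(durations_s) >= share

# Calls dropping at ~5 s against a 5-second Twilio silence timeout
print(telephony_timeout_suspected([5.0, 4.9, 5.1, 5.0, 12.3], 5.0))  # True
```

If the check fires, the investigation moves to what the TTS layer was doing at the moment the silence window opened, as described above.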
Stage 1 — Timestamp alignment across all providers
Once you have a hypothesis about which failure type you are investigating, the first stage is timestamp alignment. Pull the call ID from your telephony provider. Using that call ID, locate the corresponding events in every other system — TTS engine, CRM, AI agent logs — within a ±5 second window of the failure event. Some providers use ISO 8601 UTC, others use Unix milliseconds, others use local time in an unspecified timezone. Normalise everything to UTC milliseconds before comparison.
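A minimal normaliser along these lines; treating naive timestamps as UTC is an explicit assumption in this sketch, and in practice you must confirm each provider's timezone before trusting the comparison:

```python
from datetime import datetime, timezone

def to_utc_ms(ts):
    """Normalise one provider timestamp to UTC milliseconds.
    Accepts ISO 8601 strings or numeric Unix epochs in seconds or
    milliseconds. Naive ISO strings are ASSUMED to be UTC here."""
    if isinstance(ts, (int, float)):
        # Heuristic: millisecond epochs for recent dates exceed 1e12
        return int(ts if ts > 1e12 else ts * 1000)
    dt = datetime.fromisoformat(str(ts).replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive == UTC
    return int(dt.timestamp() * 1000)

print(to_utc_ms("2024-03-01T12:00:00Z"))  # 1709294400000
print(to_utc_ms(1709294400))              # 1709294400000
print(to_utc_ms(1709294400000))           # 1709294400000
```

The three calls above show the same instant arriving in three provider formats and landing on one comparable value.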
Map the event sequence chronologically and look for the first gap, delay, or anomaly in the expected flow. Expected flow for a voice AI call looks like: call_initiated → speech_detected → transcription_complete → LLM_response_generated → TTS_generation_start → TTS_generation_complete → audio_streamed → speech_detected (next turn). Any deviation from this sequence at any timestamp is a candidate for the originating failure point.
The gap between TTS_generation_start and TTS_generation_complete is almost always where ElevenLabs latency failures originate. A gap between LLM_response_generated and TTS_generation_start indicates an orchestration layer delay, not a TTS provider issue. A missing CRM event immediately after call_initiated — when the agent is configured to log the call start — points to a CRM integration failure. The timestamp map tells you which gap to investigate next.
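The gap search over the expected flow can be mechanised. The event names below follow the flow above; the 800 ms gap threshold is an illustrative default tied to the TTS latency figure used elsewhere in this piece, and a real multi-turn call (where speech_detected repeats) would need per-turn handling:

```python
EXPECTED_FLOW = [
    "call_initiated", "speech_detected", "transcription_complete",
    "LLM_response_generated", "TTS_generation_start",
    "TTS_generation_complete", "audio_streamed",
]

def first_anomaly(events, max_gap_ms=800):
    """events: list of (name, utc_ms) pairs, sorted by timestamp.
    Returns ('missing', name) for the first expected event that never
    arrived, ('gap', name, delta_ms) for the first inter-event gap
    above max_gap_ms, or None when the flow looks healthy."""
    seen = dict(events)
    prev_ts = None
    for name in EXPECTED_FLOW:
        if name not in seen:
            return ("missing", name)
        ts = seen[name]
        if prev_ts is not None and ts - prev_ts > max_gap_ms:
            return ("gap", name, ts - prev_ts)
        prev_ts = ts
    return None

events = [("call_initiated", 0), ("speech_detected", 300),
          ("transcription_complete", 700), ("LLM_response_generated", 1200),
          ("TTS_generation_start", 1400), ("TTS_generation_complete", 3800),
          ("audio_streamed", 4000)]
print(first_anomaly(events))  # ('gap', 'TTS_generation_complete', 2400)
```

In the example, the 2.4-second gap between TTS_generation_start and TTS_generation_complete is exactly the signature that points the next stage at the TTS provider.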
Stage 2 — Provider isolation and hypothesis testing
Once you have a leading hypothesis from timestamp alignment, Stage 2 is provider isolation: reproduce the failure conditions in a single provider's test environment to confirm or refute the hypothesis. This stage is what most ad-hoc debugging skips — the immediate pressure to deploy a fix is higher than the patience for controlled hypothesis testing.
Skipping this step is expensive. A configuration change deployed to fix a symptom while the actual root cause continues operating produces a false signal of resolution: the symptom disappears for a few days (or doesn't, if you got lucky with the provider's load conditions), the incident is closed, and the root cause continues producing silent failures at a lower rate. The next escalation — three weeks later — finds you debugging the same failure from scratch.
Provider isolation for a TTS latency hypothesis means: call the ElevenLabs API directly with the exact text inputs from the failed calls and measure generation time. If you reproduce latencies above 800ms, the hypothesis is confirmed and the fix is in the TTS configuration. If you cannot reproduce it, the issue is either load-dependent (meaning you need to test under concurrent conditions) or the hypothesis is wrong and Stage 1 needs re-examination.
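A reproduction harness for that test might look like the following. The actual ElevenLabs request is deliberately left as a callable you supply, so the harness assumes nothing about the provider SDK; the 800 ms budget comes from the hypothesis above:

```python
import time

def measure_tts_latency(synthesize, texts, budget_ms=800):
    """Replay the exact texts from failed calls through `synthesize`
    (a callable performing one TTS request) and return the texts whose
    wall-clock generation time exceeded the budget, with latencies."""
    over_budget = []
    for text in texts:
        start = time.perf_counter()
        synthesize(text)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > budget_ms:
            over_budget.append((text, round(elapsed_ms)))
    return over_budget
```

Run it once against the direct API, then again under concurrent load (e.g. a thread pool issuing requests in parallel) to separate a configuration problem from a load-dependent one.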
Stage 3 — Configuration change and post-deployment verification
The configuration change required to fix a properly-isolated dropped-call root cause is almost always smaller than the investigation suggests it will be. Setting a response length cap in the agent system prompt. Adding an explicit ElevenLabs region parameter. Increasing a CRM write timeout from 3 to 8 seconds. Adjusting a Twilio silence-detection threshold from 5 to 8 seconds as a temporary measure while the TTS latency fix is validated.
The important step that most teams omit: post-deployment verification. Deploy the configuration change, wait 24 hours, and pull the same timestamp alignment data you used in Stage 1. Confirm that the gap you identified is no longer present in the post-deployment data. Confirm that the failure rate for the affected call type has returned to baseline. Then — and this is the step that prevents recurrence — document the root cause, the configuration change, and the verification evidence in a searchable incident log.
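The rate comparison in that verification step is simple enough to pin down in code; the 10% relative tolerance is an illustrative choice, not a standard:

```python
def fix_verified(post_failures, post_calls, baseline_rate, tolerance=0.1):
    """True when the post-deployment failure rate is within `tolerance`
    (relative) of baseline — the metric, not the anecdote, says the
    fix held."""
    post_rate = post_failures / post_calls
    return post_rate <= baseline_rate * (1 + tolerance)

# 24 hours after deploy: 12 drops across 2,000 calls vs a 0.6% baseline
print(fix_verified(12, 2000, 0.006))  # True
```

A True here closes the loop only once the root cause, change, and this evidence are written into the incident log.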
The incident log is the compounding return on your debugging investment. A team that resolves 50 dropped-call incidents per month and documents all of them has, within 90 days, an internal reference library covering the failure patterns that account for 85% of their incident volume. New incidents are matched against the library first, dramatically reducing investigation time. Teams that skip the documentation step investigate every incident type from scratch, forever.
The business case for framework versus ad-hoc debugging
The difference between ad-hoc debugging and a consistent framework is not just speed — though the time saving (3.5 hours to 47 minutes per incident) is significant and measurable. It is the quality of what you learn. Ad-hoc debugging ends when the immediate symptom disappears. Structured investigation ends when the root cause is documented and the failure mode is classified well enough that the next occurrence can be identified in under five minutes.
For a voice AI team handling 50 incidents per month — a typical figure for teams in production with meaningful call volume — the difference is approximately 130 hours of engineering time recovered monthly. That is three weeks of senior engineer capacity available for building rather than firefighting. And the compounding effect of the incident library means that time saving grows each month as more patterns are documented and the library becomes more useful.
The framework does not require new infrastructure. It requires three commitments: always classify before investigating, always align timestamps before hypothesising, and always document root cause before closing. Those three steps, applied consistently, are the difference between a team that fights the same fires repeatedly and one that methodically eliminates failure patterns from its production environment.
Frequently asked questions
What are the most common causes of dropped calls in voice AI systems?
The five primary causes are: telephony layer silence/inactivity timeouts (most common), TTS generation latency spikes exceeding the timeout threshold, LLM response timeouts in the voice AI orchestration layer, CRM write failures that break the call flow, and network routing issues producing geographically-correlated failures. In production deployments, telephony timeouts caused by TTS latency account for approximately 40% of unexplained call drops.
How do I find the Twilio call SID for a specific dropped call?
In Twilio Console, navigate to Monitor > Logs > Call Logs and filter by date range and 'completed' status (dropped calls often still show as completed). For programmable voice, the CallSid is included in the status callback payload. For Twilio Studio flows, you can find it in the Execution History. Once you have the CallSid, use the Twilio REST API to retrieve the full call resource including all child events.
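If you prefer the raw REST API over the Console, the call resource URL is straightforward to construct; the SIDs below are placeholders for illustration:

```python
def call_resource_url(account_sid, call_sid):
    """Twilio REST URL for one call record; GET it with HTTP Basic
    auth (Account SID as username, Auth Token as password)."""
    return (f"https://api.twilio.com/2010-04-01/Accounts/"
            f"{account_sid}/Calls/{call_sid}.json")

# Placeholder SIDs for illustration only
print(call_resource_url("ACxxxxxxxx", "CAxxxxxxxx"))
```

The same fetch is one line with the official helper library (`client.calls(call_sid).fetch()` in the Python SDK).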
How long should it take to debug a dropped call in production?
With a consistent framework and cross-provider tooling, diagnosing a dropped call to root cause should take under 15 minutes. Without cross-provider tooling, teams average 3–4 hours for the same incident. The difference is almost entirely the time spent manually downloading logs from each provider and aligning timestamps — a process that takes minutes when automated and hours when done by hand.
Ready to investigate your own calls?
Connect Sherlock to your voice providers in under 2 minutes. Free to start — 100 credits, no credit card.