Navigator uses AI to call companies, listen to their IVR phone menus in real time, and navigate them automatically. You get a callback already connected to the right department.
“I need to cancel my internet subscription with Comcast”
Navigator replaces the entire IVR experience. Instead of listening to endless menus and pressing buttons, you tell us what you need and we handle the rest.
Enter the company phone number and describe what you need in plain language. "I need to dispute a charge on my credit card" is all it takes.
Navigator places the call, transcribes every menu option in real time using speech-to-text, and uses Claude AI to choose the right path through the menu tree.
Once the right department is reached, Navigator bridges you into the call via a conference room. You pick up already connected to a live agent.
Each navigation follows a deterministic pipeline powered by real-time audio processing, LLM-based menu understanding, and intelligent signal detection. Here's what happens inside every call.
Twilio streams live IVR audio via WebSocket. Deepgram Nova-3 performs real-time speech-to-text with keyword boosting optimized for IVR menus. Transcripts accumulate until a completion trigger fires: silence (8.5s), loop detection, or content threshold.
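The three completion triggers can be pictured with a small buffer. This is only a sketch under stated assumptions: `TranscriptBuffer`, `CONTENT_WORD_LIMIT`, and the word-level similarity check are hypothetical names and values chosen for illustration; only the 8.5s silence window comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SILENCE_SECONDS = 8.5      # silence window stated above
CONTENT_WORD_LIMIT = 120   # hypothetical content threshold
LOOP_SIMILARITY = 0.7      # assumed to match the Jaccard threshold used for loops


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


@dataclass
class TranscriptBuffer:
    segments: List[str] = field(default_factory=list)
    last_speech_at: float = 0.0

    def add(self, text: str, now: float) -> None:
        """Store a finalized transcript segment and note when it arrived."""
        self.segments.append(text)
        self.last_speech_at = now

    def completion_trigger(self, now: float) -> Optional[str]:
        """Return the name of the trigger that fired, or None to keep listening."""
        if not self.segments:
            return None
        if now - self.last_speech_at >= SILENCE_SECONDS:
            return "silence"
        latest = self.segments[-1]
        if any(jaccard(latest, earlier) > LOOP_SIMILARITY
               for earlier in self.segments[:-1]):
            return "loop"
        if sum(len(s.split()) for s in self.segments) >= CONTENT_WORD_LIMIT:
            return "content_threshold"
        return None
```

Whichever trigger fires first hands the accumulated text to the parsing step; the order in which the checks run here is a design choice of the sketch, not a documented behavior.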
The accumulated transcript is sent to Claude Haiku with tool_use for structured output. The LLM extracts DTMF options (digit + label) and detects voice-based IVRs, transfer endpoints, language gates, and input requests such as account numbers or PINs.
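To make "structured output" concrete, here is roughly what a tool definition for this step could look like. The tool name `report_menu`, the `MenuParse` type, and every field except `is_hybrid_ivr` are assumptions made for illustration; the dictionary follows the name / description / `input_schema` shape that tool definitions use, and no API call is made.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical tool definition: a name, a description, and a JSON Schema.
PARSE_MENU_TOOL = {
    "name": "report_menu",
    "description": "Report the structure of one IVR menu transcript.",
    "input_schema": {
        "type": "object",
        "properties": {
            "options": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "digit": {"type": "string"},
                        "label": {"type": "string"},
                    },
                    "required": ["digit", "label"],
                },
            },
            "is_voice_ivr": {"type": "boolean"},
            "is_hybrid_ivr": {"type": "boolean"},
            "is_transfer": {"type": "boolean"},
            "is_language_gate": {"type": "boolean"},
            "input_required": {"type": "boolean"},
        },
        "required": ["options"],
    },
}

FLAG_NAMES = ("is_voice_ivr", "is_hybrid_ivr", "is_transfer",
              "is_language_gate", "input_required")


@dataclass
class MenuParse:
    options: Dict[str, str]  # digit -> label
    is_voice_ivr: bool = False
    is_hybrid_ivr: bool = False
    is_transfer: bool = False
    is_language_gate: bool = False
    input_required: bool = False


def menu_parse_from_tool_input(tool_input: dict) -> MenuParse:
    """Convert the tool's input payload into a typed result."""
    options = {o["digit"]: o["label"] for o in tool_input.get("options", [])}
    flags = {name: bool(tool_input.get(name, False)) for name in FLAG_NAMES}
    return MenuParse(options=options, **flags)
```

Forcing the model to answer through a schema like this means downstream code reads fields instead of parsing free text.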
Parse results are analyzed for actionable signals in priority order: language gates (send English digit), transfer endpoints (bridge immediately), voice-based menus (generate spoken response), input required (fail with refund), DTMF rejection (retry with delay).
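The priority order above amounts to a first-match lookup. The sketch below assumes the parse step exposes boolean flags; the flag names and the `Action` enum are hypothetical, while the ordering mirrors the list above.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    SEND_ENGLISH_DIGIT = "send_english_digit"
    BRIDGE = "bridge"
    SPEAK_RESPONSE = "speak_response"
    FAIL_WITH_REFUND = "fail_with_refund"
    RETRY_WITH_DELAY = "retry_with_delay"
    CHOOSE_DIGIT = "choose_digit"  # default: ordinary DTMF menu


# Highest priority first, as listed above.
SIGNAL_PRIORITY = [
    ("is_language_gate", Action.SEND_ENGLISH_DIGIT),
    ("is_transfer", Action.BRIDGE),
    ("is_voice_ivr", Action.SPEAK_RESPONSE),
    ("input_required", Action.FAIL_WITH_REFUND),
    ("dtmf_rejected", Action.RETRY_WITH_DELAY),
]


def first_signal(flags: Dict[str, bool]) -> Action:
    """Return the action for the highest-priority flag that is set."""
    for name, action in SIGNAL_PRIORITY:
        if flags.get(name):
            return action
    return Action.CHOOSE_DIGIT
```

If a transcript raises two signals at once, for example a language gate that also mentions a transfer, the earlier entry wins.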
For standard DTMF menus, extracted options + user intent go to Claude. The LLM returns which digit to press, a confidence score (0.0-1.0), and whether the target is reached. If not reached, the STT pipeline resets for the next menu level.
When the target department is reached, the IVR call moves into a Twilio Conference room. Navigator places an outbound call to your phone and adds you to the same conference. Both parties are connected — you're talking to a live agent.
After successful navigation, the discovered menu tree is saved to a two-tier cache (Redis 24h + PostgreSQL 30d). Each navigation's paths are cumulatively merged with existing cache, building a richer map for future calls.
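A two-tier lookup like this can be sketched with in-memory stand-ins for Redis and PostgreSQL. The class names are hypothetical, and refilling the fast tier on a slow-tier hit is an assumption about how such a cache would typically behave, not something the description above confirms.

```python
from typing import Any, Dict, Optional, Tuple

FAST_TTL_SECONDS = 24 * 3600        # Redis tier: 24h
SLOW_TTL_SECONDS = 30 * 24 * 3600   # PostgreSQL tier: 30d


class TtlStore:
    """In-memory stand-in for one cache tier with a fixed time-to-live."""

    def __init__(self, ttl_seconds: int) -> None:
        self.ttl = ttl_seconds
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str, now: float) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at >= self.ttl:
            del self._data[key]  # expired
            return None
        return value

    def set(self, key: str, value: Any, now: float) -> None:
        self._data[key] = (value, now)


class TwoTierCache:
    def __init__(self) -> None:
        self.fast = TtlStore(FAST_TTL_SECONDS)
        self.slow = TtlStore(SLOW_TTL_SECONDS)

    def set(self, phone: str, tree: Any, now: float) -> None:
        self.fast.set(phone, tree, now)
        self.slow.set(phone, tree, now)

    def get(self, phone: str, now: float) -> Optional[Any]:
        tree = self.fast.get(phone, now)
        if tree is not None:
            return tree
        tree = self.slow.get(phone, now)
        if tree is not None:
            self.fast.set(phone, tree, now)  # assumed: refill the fast tier
        return tree
```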
DTMF menus receive digit presses. Voice-based IVRs receive a short spoken phrase generated by Claude and delivered via Twilio TTS.
Navigator handles real-world IVR complexity with pattern matching, LLM-based parsing, and a suite of detection algorithms refined against hundreds of real IVR transcriptions.
When User A navigates root → 1 → 3 (Billing → Cancel) and User B later navigates root → 2 → 1 (Support → Internet), the merged cache contains every option discovered along both paths. User C asking to “check my balance” benefits from those earlier navigations: the cache already records Billing as option 1 and, from the Billing menu heard during User A's call, Balance as sub-option 1, so Navigator sends ["1", "1"] instantly without waiting for audio.
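The merge in this example can be sketched as a recursive union of menu trees. The tree shape (digit → label plus children) and both function names are assumptions for illustration, and the exact-label lookup is a stand-in for the LLM-based intent matching described earlier.

```python
import copy
from typing import List, Optional, Tuple


def merge_trees(cached: dict, discovered: dict) -> dict:
    """Return a new tree containing every option from both inputs."""
    merged = copy.deepcopy(cached)
    for digit, node in discovered.items():
        if digit not in merged:
            merged[digit] = copy.deepcopy(node)
        else:
            merged[digit]["label"] = node["label"]  # newest label wins (assumed)
            merged[digit]["children"] = merge_trees(
                merged[digit]["children"], node["children"])
    return merged


def find_path(tree: dict, target: str,
              prefix: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Depth-first search for a label; returns the digits to press."""
    for digit, node in tree.items():
        path = prefix + (digit,)
        if node["label"].lower() == target.lower():
            return list(path)
        found = find_path(node["children"], target, path)
        if found:
            return found
    return None


# User A heard the root menu and the Billing sub-menu.
user_a = {
    "1": {"label": "Billing", "children": {
        "1": {"label": "Balance", "children": {}},
        "3": {"label": "Cancel", "children": {}},
    }},
    "2": {"label": "Support", "children": {}},
}
# User B heard the root menu and the Support sub-menu.
user_b = {
    "1": {"label": "Billing", "children": {}},
    "2": {"label": "Support", "children": {
        "1": {"label": "Internet", "children": {}},
    }},
}

merged = merge_trees(user_a, user_b)
balance_path = find_path(merged, "Balance")
```

In this sketch `balance_path` comes out as `["1", "1"]` even though neither earlier caller asked for a balance, because the Billing sub-menu was recorded in full during User A's call.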
Each navigation holds its own asyncio lock. Concurrent transcript processing can't corrupt session state.
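A per-navigation lock can be sketched with `asyncio.Lock`. The `SessionRegistry` class and its fields are hypothetical; the point is only that the read-modify-write on session state happens while the lock for that navigation is held.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List


class SessionRegistry:
    """Hypothetical holder of per-navigation state and locks."""

    def __init__(self) -> None:
        # Locks are created lazily, one per navigation id.
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.transcripts: Dict[str, List[str]] = defaultdict(list)

    async def append_transcript(self, navigation_id: str, text: str) -> None:
        async with self._locks[navigation_id]:
            current = list(self.transcripts[navigation_id])
            await asyncio.sleep(0)  # yield control, standing in for real async work
            current.append(text)
            self.transcripts[navigation_id] = current


async def demo() -> List[str]:
    registry = SessionRegistry()
    await asyncio.gather(
        *(registry.append_transcript("nav-1", f"chunk {i}") for i in range(5))
    )
    return registry.transcripts["nav-1"]
```

Without the lock, each task would copy the list before any other task wrote back, and updates would be lost. Navigations with different ids never wait on each other.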
DTMF rejections trigger retries with increasing delays (20s → 25s → 30s). Max 3 retries per path.
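The retry schedule fits in a few lines. In this sketch `send_digit` and `sleep` are hypothetical callables injected by the caller, so the logic can be exercised without placing a real call; the delays are the ones listed above.

```python
from typing import Callable

RETRY_DELAYS_SECONDS = (20, 25, 30)  # progressive delays, max 3 retries


def press_with_retries(send_digit: Callable[[], bool],
                       sleep: Callable[[float], None]) -> bool:
    """Press once, then retry after each delay until the IVR accepts.

    send_digit returns True when the IVR accepted the digit.
    Returns False once all three retries are used up.
    """
    if send_digit():
        return True
    for delay in RETRY_DELAYS_SECONDS:
        sleep(delay)
        if send_digit():
            return True
    return False
```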
SQL UPDATE...WHERE credits > 0 RETURNING prevents race conditions. Auto-refund on failure or abort.
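The guard works because the check and the decrement happen in one statement. The sketch below uses the standard-library `sqlite3` module and checks `rowcount` instead of `RETURNING` so that it runs anywhere; the `users` table, its columns, and the function names are assumptions for illustration.

```python
import sqlite3


def try_spend_credit(conn: sqlite3.Connection, user_id: int) -> bool:
    """Atomically deduct one credit; False if the user has none left."""
    cur = conn.execute(
        "UPDATE users SET credits = credits - 1 WHERE id = ? AND credits > 0",
        (user_id,),
    )
    conn.commit()
    # One modified row means the guard passed and the credit was taken.
    return cur.rowcount == 1


def refund_credit(conn: sqlite3.Connection, user_id: int) -> None:
    """Give the credit back after a failed or aborted navigation."""
    conn.execute("UPDATE users SET credits = credits + 1 WHERE id = ?", (user_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL)")
conn.execute("INSERT INTO users (id, credits) VALUES (1, 1)")
conn.commit()
```

Two simultaneous requests for the last credit cannot both succeed, since the database applies the `credits > 0` condition to whichever update runs second.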
Overlapping Deepgram chunks are deduplicated using Jaccard similarity at 0.7 threshold before transcript assembly.
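Jaccard similarity is the size of the overlap between two word sets divided by the size of their union. A minimal sketch of chunk deduplication follows, assuming word-level tokens and treating a score of 0.7 or higher as a duplicate; both choices, and the function names, are assumptions.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def dedupe_chunks(chunks: List[str], threshold: float = 0.7) -> List[str]:
    """Keep a chunk only if it is not too similar to one already kept."""
    kept: List[str] = []
    for chunk in chunks:
        if any(jaccard(chunk, prev) >= threshold for prev in kept):
            continue  # overlapping repeat of an earlier chunk
        kept.append(chunk)
    return kept


chunks = [
    "for billing press one",
    "for billing press one now",   # shares 4 of 5 distinct words with the first
    "for support press two",
]
unique = dedupe_chunks(chunks)
```

The second chunk scores 4/5 = 0.8 against the first and is dropped; the third scores 2/6 and is kept.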
Navigator has been refined against hundreds of real IVR systems. Here's how it handles every scenario it encounters.
| Scenario | Detection | Action |
|---|---|---|
| Language gate | LLM parse + flag | Send English digit, reset pipeline |
| Voice-based IVR | LLM parse + 18 regex patterns | Generate short phrase, speak via TTS |
| Hybrid IVR | LLM parse (is_hybrid_ivr) | Use DTMF options, ignore voice |
| DTMF rejected | 6 regex patterns | Retry with progressive delay |
| Transfer detected | LLM + 16 regex patterns | Bridge to user immediately |
| Input required | LLM parse (SSN, account #) | Fail gracefully with refund |
| Menu loop | Jaccard similarity > 0.7 | Break loop, process transcript |
| Voicemail | 9 early trigger patterns | Treat as endpoint |
| Call failed | Twilio status webhook | Full cleanup, refund credit |
| Extended silence | 8.5s threshold | Trigger transcript processing |
┌─────────────┐     ┌──────────────────────────────────────────────────────────────┐
│   Client    │     │                      Navigator Backend                       │
│  (Next.js)  │────▸│  FastAPI Router ──▸ Auth ──▸ Rate Limit ──▸ Credits Guard    │
└─────────────┘     │          │                                                   │
                    │          ▼                                                   │
                    │  ┌─────────────────────────────────────────────────────┐     │
                    │  │                  Navigation Engine                  │     │
                    │  │                                                     │     │
                    │  │  ┌──────────┐   ┌──────────┐   ┌───────────────┐    │     │
                    │  │  │  Twilio  │──▸│ Deepgram │──▸│ Claude Haiku  │    │     │
                    │  │  │  Voice   │   │  Nova-3  │   │ (LLM Parser)  │    │     │
                    │  │  │  Stream  │   │   STT    │   │               │    │     │
                    │  │  └──────────┘   └──────────┘   └───────┬───────┘    │     │
                    │  │                                        │            │     │
                    │  │  ┌──────────┐   ┌──────────┐   ┌───────▼───────┐    │     │
                    │  │  │  Signal  │◂──│ Navigate │◂──│  Menu Parse   │    │     │
                    │  │  │ Detector │   │ Decision │   │    Result     │    │     │
                    │  │  └────┬─────┘   └──────────┘   └───────────────┘    │     │
                    │  │       │                                             │     │
                    │  │       ▼ Bridge / DTMF / Voice / Retry               │     │
                    │  └─────────────────────────────────────────────────────┘     │
                    │             │                  │                             │
                    │        ┌────▼────┐           ┌─▼───────────┐                 │
                    │        │  Redis  │           │ PostgreSQL  │                 │
                    │        │  Cache  │           │ Persistent  │                 │
                    │        │  (24h)  │           │ Cache (30d) │                 │
                    │        └─────────┘           └─────────────┘                 │
                    └──────────────────────────────────────────────────────────────┘

Stop wasting time navigating phone menus. Let AI do it for you.