AI-Powered IVR Navigation

Skip the phone menu.
Talk to a human.

Navigator uses AI to call companies, listen to their IVR phone menus in real-time, and navigate them automatically. You get a callback already connected to the right department.


navigator.pgdev.com.br

You say

“I need to cancel my internet subscription with Comcast”

+1 (800) 266-2278

Navigator does

Calls Comcast

Hears: "Press 1 for Billing..."

AI decides: Press 1, then 3

Bridges you to Cancellations

How It Works

From intent to connection in under 30 seconds

Navigator replaces the entire IVR experience. Instead of listening to endless menus and pressing buttons, you tell us what you need and we handle the rest.

You tell us who to call

Enter the company phone number and describe what you need in plain language. "I need to dispute a charge on my credit card" is all it takes.

AI navigates the IVR

Navigator places the call, listens to every menu option in real-time using speech-to-text, and uses Claude AI to choose the right path through the menu tree.

You get a callback

Once the right department is reached, Navigator bridges you into the call via a conference room. You pick up already connected to a live agent.

The Pipeline

6-phase navigation engine

Each navigation follows the same fixed sequence of phases, combining real-time audio processing, LLM-based menu understanding, and signal detection. Here's what happens inside every call.

Phase 1

Audio Capture (STT)

Twilio streams live IVR audio via WebSocket. Deepgram Nova-3 performs real-time speech-to-text with keyword boosting optimized for IVR menus. Transcripts accumulate until a completion trigger fires: silence (8.5s), loop detection, or content threshold.

Phase 2

Menu Parsing (LLM)

The accumulated transcript is sent to Claude Haiku, which uses tool_use to return structured output. The LLM extracts DTMF options (digit + label) and detects voice-based IVRs, transfer endpoints, language gates, and input requests such as account numbers or PINs.

Phase 3

Signal Detection

Parse results are analyzed for actionable signals in priority order: language gates (send English digit), transfer endpoints (bridge immediately), voice-based menus (generate spoken response), input required (fail with refund), DTMF rejection (retry with delay).

Phase 4

Navigation Decision

For standard DTMF menus, the extracted options and the user's intent are sent to Claude. The LLM returns which digit to press, a confidence score (0.0-1.0), and whether the target department has been reached. If it has not, the STT pipeline resets for the next menu level.

Phase 5

Conference Bridge

When the target department is reached, the IVR call moves into a Twilio Conference room. Navigator places an outbound call to your phone and adds you to the same conference. Both parties are connected — you're talking to a live agent.
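As a rough illustration of the bridge, both call legs can be pointed at TwiML that dials the same named conference. The sketch below builds that TwiML with the standard library only. The room name and the choice to end the conference when the user hangs up are assumptions, not details confirmed by Navigator.

```python
import xml.etree.ElementTree as ET


def conference_twiml(room_name: str, end_on_exit: bool = False) -> str:
    """Build TwiML that joins the current call leg to a named conference."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(
        dial,
        "Conference",
        startConferenceOnEnter="true",
        endConferenceOnExit="true" if end_on_exit else "false",
        beep="false",
    )
    conference.text = room_name
    return ET.tostring(response, encoding="unicode")


# Hypothetical room name: one conference per navigation session.
ivr_leg = conference_twiml("nav-session-123")
# Assumption: the conference ends when the user leaves.
user_leg = conference_twiml("nav-session-123", end_on_exit=True)
print(ivr_leg)
```

Using one room name for both legs is what connects the two parties; everything else here is configurable.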

Phase 6

Cache & Learn

After successful navigation, the discovered menu tree is saved to a two-tier cache (Redis 24h + PostgreSQL 30d). Each navigation's paths are cumulatively merged with existing cache, building a richer map for future calls.
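The two-tier lookup can be sketched with plain dictionaries standing in for Redis and PostgreSQL. The TTLs come from the text; the class name, the key format, and the step that repopulates the hot tier after a cold hit are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

HOT_TTL_S = 24 * 3600        # Redis tier: 24h
COLD_TTL_S = 30 * 24 * 3600  # PostgreSQL tier: 30d


class TwoTierCache:
    """In-memory stand-in: two dicts play the roles of Redis and PostgreSQL."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._hot: Dict[str, Tuple[float, Any]] = {}
        self._cold: Dict[str, Tuple[float, Any]] = {}

    def put(self, phone: str, tree: Any) -> None:
        now = self._clock()
        self._hot[phone] = (now + HOT_TTL_S, tree)
        self._cold[phone] = (now + COLD_TTL_S, tree)

    def get(self, phone: str) -> Optional[Any]:
        now = self._clock()
        hot = self._hot.get(phone)
        if hot and hot[0] > now:
            return hot[1]
        cold = self._cold.get(phone)
        if cold and cold[0] > now:
            # Assumption: a cold hit repopulates the hot tier.
            self._hot[phone] = (now + HOT_TTL_S, cold[1])
            return cold[1]
        return None


fake_now = [0.0]  # controllable clock so expiry can be shown without waiting
cache = TwoTierCache(clock=lambda: fake_now[0])
cache.put("+18002662278", {"label": "root", "children": {}})
fake_now[0] = 25 * 3600           # hot entry expired, cold entry still valid
after_25h = cache.get("+18002662278")
fake_now[0] = 31 * 24 * 3600      # both tiers expired
after_31d = cache.get("+18002662278")
```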

Per-level navigation loop

Twilio Audio → Deepgram STT → Claude Parse → Signal Detect → Claude Decide → Send DTMF / Voice

The loop resets the STT pipeline for each new menu level and repeats until target_reached = true.

DTMF menus receive digit presses. Voice-based IVRs receive a short spoken phrase generated by Claude and delivered via Twilio TTS.
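The per-level loop can be sketched as follows. The three callables stand in for the Twilio/Deepgram capture, the Claude decision, and DTMF sending; their names, the Decision fields beyond those listed in Phase 4, and the max_levels guard are assumptions. Signal detection is left out to keep the loop itself visible.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    digit: str
    confidence: float      # 0.0-1.0, as described in Phase 4
    target_reached: bool


def navigate(
    listen: Callable[[], str],
    decide: Callable[[str], Decision],
    send_dtmf: Callable[[str], None],
    max_levels: int = 10,  # hypothetical guard against endless menus
) -> List[str]:
    """Run one menu level per iteration until the target is reached."""
    path: List[str] = []
    for _ in range(max_levels):
        transcript = listen()          # fresh STT pipeline for this level
        decision = decide(transcript)
        send_dtmf(decision.digit)
        path.append(decision.digit)
        if decision.target_reached:
            return path
    raise RuntimeError("target not reached within max_levels")


# Scripted stand-ins for a two-level menu.
menus = iter(["Press 1 for billing, 2 for support", "Press 3 to cancel service"])
script = {
    "Press 1 for billing, 2 for support": Decision("1", 0.9, False),
    "Press 3 to cancel service": Decision("3", 0.95, True),
}
sent: List[str] = []
path = navigate(lambda: next(menus), script.__getitem__, sent.append)
print(path)
```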

The Algorithm

Intelligent decisions at every level

Navigator handles real-world IVR complexity with pattern matching, LLM-based parsing, and a suite of detection algorithms refined against hundreds of real IVR transcriptions.

Transcript Completion Detection

Silence threshold: 8.5 seconds of no speech triggers processing
Loop detection: Jaccard similarity > 0.7 across 3+ consecutive segments detects IVR repeating itself
Early endpoint: Regex matches on "transferring", "voicemail", etc.
Content threshold: >30s elapsed + 5 segments + 5s delay
Safety timeout: 180 seconds per menu level maximum
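The triggers above can be combined into one check, sketched below. The thresholds are taken from the list; the order in which triggers are evaluated, the regex, and the reading of loop detection (the last three segments compared against earlier three-segment windows) are assumptions. The 5s delay on the content threshold is omitted because its exact role is not specified.

```python
import re
from typing import List, Optional

SILENCE_S = 8.5
LOOP_SIMILARITY = 0.7
LOOP_WINDOW = 3
CONTENT_MIN_ELAPSED_S = 30.0
CONTENT_MIN_SEGMENTS = 5
SAFETY_TIMEOUT_S = 180.0
# Hypothetical pattern; the text only names "transferring" and "voicemail".
ENDPOINT_RE = re.compile(r"\b(transferring|voicemail)\b", re.IGNORECASE)


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A & B| / |A | B|."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def is_looping(segments: List[str], window: int = LOOP_WINDOW) -> bool:
    """True if the latest window closely repeats an earlier, non-overlapping one."""
    if len(segments) < 2 * window:
        return False
    tail = " ".join(segments[-window:])
    for start in range(len(segments) - 2 * window + 1):
        earlier = " ".join(segments[start:start + window])
        if jaccard(earlier, tail) > LOOP_SIMILARITY:
            return True
    return False


def completion_trigger(
    segments: List[str], elapsed_s: float, silence_s: float
) -> Optional[str]:
    """Return the name of the trigger that fired, or None to keep listening."""
    if elapsed_s >= SAFETY_TIMEOUT_S:
        return "timeout"
    if segments and ENDPOINT_RE.search(segments[-1]):
        return "endpoint"
    if silence_s >= SILENCE_S:
        return "silence"
    if is_looping(segments):
        return "loop"
    if elapsed_s > CONTENT_MIN_ELAPSED_S and len(segments) >= CONTENT_MIN_SEGMENTS:
        return "content"
    return None


menu = ["press 1 for billing", "press 2 for support", "press 3 for sales"]
print(completion_trigger(menu * 2, elapsed_s=20.0, silence_s=1.0))
```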

LLM Menu Parsing

Claude Haiku with tool_use extracts structured DTMF options (digit + description)
Detects voice-based IVRs (expects spoken input, not button presses)
Identifies language gates ("press 1 for English")
Recognizes hybrid IVR systems (voice + DTMF fallback)
Handles edge cases: Deepgram transcribing "star" as "start", partial menus
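A tool definition for this kind of structured extraction might look like the sketch below, using the name / description / input_schema shape that Claude tool definitions take. The tool name and all field names except is_hybrid_ivr are hypothetical. The small normalization table shows one way the "star" vs "start" mistranscription could be cleaned up after the model responds; Navigator may handle it inside the prompt instead.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical tool definition; only is_hybrid_ivr is named in the source.
PARSE_MENU_TOOL: Dict[str, Any] = {
    "name": "parse_ivr_menu",
    "description": "Extract the structure of an IVR menu from its transcript.",
    "input_schema": {
        "type": "object",
        "properties": {
            "options": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "digit": {"type": "string"},
                        "label": {"type": "string"},
                    },
                    "required": ["digit", "label"],
                },
            },
            "is_voice_based": {"type": "boolean"},
            "is_hybrid_ivr": {"type": "boolean"},
            "is_language_gate": {"type": "boolean"},
        },
        "required": ["options"],
    },
}

# Spoken key names (including the "start" mistranscription) mapped to DTMF keys.
KEY_WORDS = {"star": "*", "start": "*", "pound": "#", "hash": "#"}


@dataclass
class MenuOption:
    digit: str
    label: str


def parse_tool_input(tool_input: Dict[str, Any]) -> List[MenuOption]:
    """Turn the model's tool input into typed options with normalized keys."""
    options = []
    for raw in tool_input.get("options", []):
        key = str(raw["digit"]).strip().lower()
        options.append(MenuOption(KEY_WORDS.get(key, key), raw["label"].strip()))
    return options


sample = {
    "options": [
        {"digit": "1", "label": "Billing"},
        {"digit": "start", "label": "Repeat menu"},
    ]
}
parsed = parse_tool_input(sample)
```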

Signal Detection Priority

1. Language gate → Send English digit, reset pipeline
2. Transfer endpoint → Bridge to user immediately
3. Voice-based menu → Generate 2-8 word phrase via Claude
4. Input required (SSN, account#) → Fail gracefully with refund
5. DTMF rejection → Retry with progressive delay (20s → 25s → 30s)
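The priority order above amounts to a first-match dispatch. In the sketch below, the ParseResult fields and the action names are hypothetical; only the ordering is taken from the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseResult:
    language_gate_digit: Optional[str] = None  # digit that selects English
    is_transfer: bool = False
    is_voice_based: bool = False
    input_required: Optional[str] = None       # e.g. "SSN", "account number"
    dtmf_rejected: bool = False


def detect_signal(result: ParseResult) -> str:
    """Return the first matching action, checked in priority order."""
    if result.language_gate_digit is not None:
        return "send_language_digit"
    if result.is_transfer:
        return "bridge"
    if result.is_voice_based:
        return "speak_phrase"
    if result.input_required:
        return "fail_with_refund"
    if result.dtmf_rejected:
        return "retry_with_delay"
    return "decide_dtmf"  # no signal: fall through to the navigation decision


# A transfer announcement outranks a voice prompt heard in the same transcript.
print(detect_signal(ParseResult(is_transfer=True, is_voice_based=True)))
```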

Fast Path (Cached Navigation)

Cached menu trees are walked level-by-level with LLM selecting the best option
If cache covers full path (target_reached=true), entire DTMF sequence sent at once
Reduces navigation from ~30s (live audio) to ~5-10s (cached)
Cumulative merge: each navigation enriches the cache for future calls
Two-tier cache: Redis hot (24h TTL) + PostgreSQL persistent (30d TTL)
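A minimal sketch of the cached walk follows. The tree layout (label, endpoint, children) is an assumed shape, and keyword_choose is a simple stand-in for the LLM selection step so the example runs without any API call.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Node = Dict[str, Any]


def fast_path(
    tree: Node, intent: str, choose: Callable[[str, Dict[str, str]], Optional[str]]
) -> Tuple[List[str], bool]:
    """Walk the cached tree; return the digits found and whether the path is complete."""
    digits: List[str] = []
    node = tree
    while node["children"]:
        labels = {d: child["label"] for d, child in node["children"].items()}
        digit = choose(intent, labels)
        if digit is None:
            return digits, False  # cache cannot decide: continue with live audio
        digits.append(digit)
        node = node["children"][digit]
    return digits, bool(node["endpoint"])


def keyword_choose(intent: str, labels: Dict[str, str]) -> Optional[str]:
    """Stand-in for the LLM: pick the first option sharing a word with the intent."""
    words = set(intent.lower().split())
    for digit, label in labels.items():
        if words & set(label.lower().split()):
            return digit
    return None


def leaf(label: str) -> Node:
    return {"label": label, "endpoint": True, "children": {}}


tree: Node = {
    "label": "root",
    "endpoint": False,
    "children": {
        "1": {
            "label": "Billing",
            "endpoint": False,
            "children": {"1": leaf("Balance"), "3": leaf("Cancel service")},
        },
        "2": {"label": "Support", "endpoint": False, "children": {"1": leaf("Internet")}},
    },
}

digits, complete = fast_path(tree, "check my billing balance", keyword_choose)
print(digits, complete)
```

When complete is true, the whole digit sequence can be sent in one go; otherwise the digits found so far give the live pipeline a head start.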

Cumulative Cache Merge

When User A navigates root → 1 → 3 (Billing → Cancel) and User B later navigates root → 2 → 1 (Support → Internet), the merged cache contains every path and every menu option heard along the way. User C asking to “check my balance” benefits from those earlier navigations: User A's call already recorded that Billing is option 1 and that Balance is sub-option 1 of the Billing menu, so Navigator sends ["1", "1"] immediately without waiting for audio.
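The merge can be sketched as a recursive union of two menu trees. The node shape (label plus children keyed by digit) is an assumption; the example reuses the User A and User B paths from the paragraph above.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def merge_trees(cached: Node, discovered: Node) -> Node:
    """Return a new tree holding every option from both inputs."""
    merged: Node = {
        "label": discovered.get("label") or cached.get("label"),
        "children": dict(cached.get("children", {})),
    }
    for digit, child in discovered.get("children", {}).items():
        if digit in merged["children"]:
            # Same option seen before: merge the sub-menus below it.
            merged["children"][digit] = merge_trees(merged["children"][digit], child)
        else:
            merged["children"][digit] = child
    return merged


def node(label: str, **children: Node) -> Node:
    # Digits are passed as keyword names prefixed with "d" (d1 -> "1").
    return {"label": label, "children": {k[1:]: v for k, v in children.items()}}


# User A: root -> 1 (Billing) -> 3 (Cancel); the Billing menu also offered Balance.
user_a = node("root", d1=node("Billing", d1=node("Balance"), d3=node("Cancel")))
# User B: root -> 2 (Support) -> 1 (Internet); Billing was heard but not entered.
user_b = node("root", d1=node("Billing"), d2=node("Support", d1=node("Internet")))

merged = merge_trees(user_a, user_b)
print(sorted(merged["children"]["1"]["children"]))
```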

Reliability

Built for real-world edge cases

Per-Session Locks

Each navigation holds its own asyncio lock. Concurrent transcript processing can't corrupt session state.
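A per-session lock registry might look like this sketch. The class and function names are hypothetical, and asyncio.sleep(0) stands in for a real await point such as an LLM call.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List


class SessionLocks:
    """One asyncio.Lock per session id, created on first use."""

    def __init__(self) -> None:
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    def for_session(self, session_id: str) -> asyncio.Lock:
        return self._locks[session_id]


async def handle_transcript(
    locks: SessionLocks, state: Dict[str, List[str]], session_id: str, text: str
) -> None:
    # Only one coroutine per session may touch that session's state at a time;
    # other sessions are not blocked.
    async with locks.for_session(session_id):
        segments = state.setdefault(session_id, [])
        await asyncio.sleep(0)  # simulated await point inside the critical section
        segments.append(text)


async def main() -> Dict[str, List[str]]:
    locks, state = SessionLocks(), {}
    await asyncio.gather(
        *(handle_transcript(locks, state, "s1", f"seg{i}") for i in range(3)),
        handle_transcript(locks, state, "s2", "other"),
    )
    return state


state = asyncio.run(main())
print(state)
```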

Progressive Retry

DTMF rejections trigger retries with increasing delays (20s → 25s → 30s). Max 3 retries per path.
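The retry schedule can be sketched as below. The delays and the limit of three retries come from the text. The press callable, and the assumption that each delay is a wait before re-sending the digit, are illustrative. The sleep function is injectable so the schedule can be exercised without actually waiting.

```python
import time
from typing import Callable, List

RETRY_DELAYS_S = (20, 25, 30)  # one entry per retry, so at most 3 retries


def press_with_retry(
    press: Callable[[], bool], sleep: Callable[[float], None] = time.sleep
) -> bool:
    """Try once, then retry after each delay until press() reports acceptance."""
    if press():
        return True
    for delay in RETRY_DELAYS_S:
        sleep(delay)
        if press():
            return True
    return False  # retries exhausted: the caller can fail the navigation


# Simulated IVR that rejects the digit twice, then accepts it.
outcomes = iter([False, False, True])
waits: List[float] = []
accepted = press_with_retry(lambda: next(outcomes), sleep=waits.append)
print(accepted, waits)
```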

Atomic Credits

SQL UPDATE...WHERE credits > 0 RETURNING prevents race conditions. Auto-refund on failure or abort.
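The idea behind the conditional update is that the check and the decrement happen in one statement, so two concurrent requests cannot both spend the last credit. The sketch below uses an in-memory SQLite database as a stand-in and checks the affected row count; in PostgreSQL the same statement would end with RETURNING, as described above. The table and column names are hypothetical.

```python
import sqlite3


def try_debit(conn: sqlite3.Connection, user_id: int) -> bool:
    """Spend one credit only if one is available; True on success."""
    cur = conn.execute(
        # PostgreSQL equivalent would append: RETURNING credits
        "UPDATE users SET credits = credits - 1 WHERE id = ? AND credits > 0",
        (user_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def refund(conn: sqlite3.Connection, user_id: int) -> None:
    """Give the credit back after a failed or aborted navigation."""
    conn.execute("UPDATE users SET credits = credits + 1 WHERE id = ?", (user_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, credits INTEGER NOT NULL)")
conn.execute("INSERT INTO users (id, credits) VALUES (1, 1)")

first = try_debit(conn, 1)    # spends the only credit
second = try_debit(conn, 1)   # no credit left, nothing is updated
refund(conn, 1)               # e.g. the navigation failed
balance = conn.execute("SELECT credits FROM users WHERE id = 1").fetchone()[0]
```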

Jaccard Dedup

Overlapping Deepgram chunks are deduplicated using Jaccard similarity at 0.7 threshold before transcript assembly.
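A minimal sketch of the dedup step, assuming word-level Jaccard similarity and a comparison against the most recently kept chunk only. Keeping the longer of two near-duplicates is an additional assumption.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A & B| / |A | B|."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def dedup_chunks(chunks: List[str], threshold: float = 0.7) -> List[str]:
    """Drop chunks that nearly repeat the previously kept chunk."""
    kept: List[str] = []
    for chunk in chunks:
        if kept and jaccard(kept[-1], chunk) >= threshold:
            # Near-duplicate: keep whichever version carries more text.
            if len(chunk) > len(kept[-1]):
                kept[-1] = chunk
            continue
        kept.append(chunk)
    return kept


chunks = [
    "press 1 for billing",
    "press 1 for billing questions",  # overlapping re-transcription (similarity 0.8)
    "press 2 for support",
]
cleaned = dedup_chunks(chunks)
print(cleaned)
```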

Edge Cases

Every IVR scenario handled

Navigator has been refined against hundreds of real IVR systems. Here's how it handles every scenario it encounters.

Scenario	Detection	Action
Language gate	LLM parse + flag	Send English digit, reset pipeline
Voice-based IVR	LLM parse + 18 regex patterns	Generate short phrase, speak via TTS
Hybrid IVR	LLM parse (is_hybrid_ivr)	Use DTMF options, ignore voice
DTMF rejected	6 regex patterns	Retry with progressive delay
Transfer detected	LLM + 16 regex patterns	Bridge to user immediately
Input required	LLM parse (SSN, account #)	Fail gracefully with refund
Menu loop	Jaccard similarity > 0.7	Break loop, process transcript
Voicemail	9 early trigger patterns	Detected as endpoint
Call failed	Twilio status webhook	Full cleanup, refund credit
Extended silence	8.5s threshold	Trigger transcript processing

Tech Stack

Production-grade infrastructure

Backend

Python 3.12, FastAPI (async), SQLAlchemy 2.0, PostgreSQL 16, Redis 7, Alembic

AI / ML

Claude Haiku 4.5, Deepgram Nova-3, tool_use structured output, Keyword boosting, Jaccard similarity

Telephony

Twilio Voice API, WebSocket Media Streams, DTMF tone sending, Conference bridging, SMS OTP auth

Frontend

Next.js 16, Tailwind CSS 4, shadcn/ui, WebSocket real-time, Geist font family

Architecture

System overview

┌─────────────┐     ┌──────────────────────────────────────────────────────────────┐
│   Client     │     │                    Navigator Backend                         │
│   (Next.js)  │────▸│  FastAPI Router ──▸ Auth ──▸ Rate Limit ──▸ Credits Guard   │
└─────────────┘     │         │                                                     │
                    │         ▼                                                     │
                    │  ┌─────────────────────────────────────────────────────┐      │
                    │  │              Navigation Engine                      │      │
                    │  │                                                     │      │
                    │  │  ┌──────────┐   ┌──────────┐   ┌───────────────┐  │      │
                    │  │  │ Twilio   │──▸│ Deepgram │──▸│ Claude Haiku  │  │      │
                    │  │  │ Voice    │   │ Nova-3   │   │ (LLM Parser)  │  │      │
                    │  │  │ Stream   │   │ STT      │   │               │  │      │
                    │  │  └──────────┘   └──────────┘   └───────┬───────┘  │      │
                    │  │                                        │          │      │
                    │  │  ┌──────────┐   ┌──────────┐   ┌──────▼────────┐ │      │
                    │  │  │ Signal   │◂──│ Navigate │◂──│ Menu Parse    │ │      │
                    │  │  │ Detector │   │ Decision │   │ Result        │ │      │
                    │  │  └────┬─────┘   └──────────┘   └───────────────┘ │      │
                    │  │       │                                           │      │
                    │  │       ▼  Bridge / DTMF / Voice / Retry           │      │
                    │  └─────────────────────────────────────────────────────┘      │
                    │         │         │                                            │
                    │    ┌────▼────┐  ┌─▼──────────┐                               │
                    │    │ Redis   │  │ PostgreSQL  │                               │
                    │    │ Cache   │  │ Persistent  │                               │
                    │    │ (24h)   │  │ Cache (30d) │                               │
                    │    └─────────┘  └─────────────┘                               │
                    └──────────────────────────────────────────────────────────────┘

Ready to skip the wait?

Stop wasting time navigating phone menus. Let AI do it for you.

