We Built a Voice AI That Books Appointments From Our Website — Here Is Exactly How

Tahir Sheikh
Founder & CEO, HyperScale Ai · March 19, 2026
Last Updated: March 19, 2026 | Author: Tahir Sheikh, Founder & CEO, HyperScale Ai | Reading time: 8 minutes
Quick Answer
Aria is a voice AI agent on hyperscaleai.io that answers visitor questions, qualifies them as leads, and books demo appointments — entirely by voice, 24/7, with no human in the loop. She runs on xAI Grok 3 Fast for reasoning, Deepgram Nova-2 for speech-to-text, OpenAI TTS for voice output, and a pgvector knowledge base with 18 embedded documents about our product and pricing.
Introduction
Every service business has the same problem at 11 PM on a Tuesday: a potential client is on your website, ready to talk, and nobody is there to answer. By morning, they have moved on.
We built HyperScale Ai to solve operational problems like this one. But when it came to our own website, we realized we were making the same mistake everyone else does — relying on a contact form and hoping people would fill it out.
So we built Aria. She is a voice AI agent that lives on our homepage, answers questions about what we do, figures out if the visitor is a good fit, and books a call — all without a human touching anything.
This is the full technical story of how we did it, and what we learned along the way.
What Is a Voice AI Agent?
A voice AI agent is software that holds real conversations using speech — not a chatbot with text bubbles, and not an IVR phone tree. It listens to what you say (speech-to-text), reasons about the best response (large language model), and talks back to you (text-to-speech) in real time.
For example, when a visitor asks Aria "What do you charge?", she does not read a script. She retrieves our pricing from a knowledge base, contextualizes it based on the conversation so far, and explains the plans conversationally — then offers to book a call if the visitor sounds interested.
The Architecture Behind Aria
Building a voice AI that actually works in production required five integrated systems. Here is how they fit together.
The Processing Pipeline
Every conversation follows this path:
- Speech-to-Text (Deepgram Nova-2): The visitor speaks into their microphone. The browser captures the audio and sends it to our STT endpoint, which calls Deepgram's Nova-2 model. Deepgram returns a transcript in under 300ms.
- Knowledge Retrieval (pgvector RAG): Before calling the LLM, we embed the visitor's message using OpenAI's text-embedding-3-small model and run a cosine similarity search against our knowledge base in PostgreSQL with pgvector. The top 5 most relevant chunks are injected into the system prompt.
- Reasoning (xAI Grok 3 Fast): The transcript, conversation history, knowledge context, and Aria's system prompt all go to xAI's Grok 3 Fast model via the OpenAI-compatible API. Grok reasons about the response and can also invoke tools, such as book_appointment, through function calling.
- Text-to-Speech (OpenAI TTS-1): Grok's text response is sent to OpenAI's TTS-1 API with the shimmer voice. The audio stream is returned to the browser and played back.
- Session Management (Valkey/Redis): Every conversation is stored in Valkey with a 10-minute TTL. History is capped at 20 messages. Rate limits are enforced per IP (8 requests/minute for public visitors).
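The session rules in the last bullet (10-minute TTL, 20-message history cap, 8 requests per minute per IP) fit in a few lines. This is a minimal in-memory stand-in, not our Valkey code: the class and method names are hypothetical, the clock is injectable so the behavior can be tested, and it assumes the TTL is refreshed on every write, which the rules above do not specify. The limiter is a fixed one-minute window, the in-memory analogue of a per-IP INCR plus EXPIRE in Valkey.

```python
import time
from typing import Callable

SESSION_TTL_S = 600   # 10-minute session TTL
MAX_HISTORY = 20      # keep only the 20 most recent messages
RATE_LIMIT = 8        # public requests allowed per IP...
RATE_WINDOW_S = 60    # ...per 60-second window


class SessionStore:
    """Hypothetical in-memory stand-in for Valkey, for illustration only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._sessions: dict[str, tuple[float, list[dict]]] = {}  # id -> (expires_at, history)
        self._windows: dict[str, tuple[float, int]] = {}          # ip -> (window_start, count)

    def append(self, session_id: str, message: dict) -> list[dict]:
        now = self._clock()
        expires_at, history = self._sessions.get(session_id, (0.0, []))
        if now >= expires_at:
            history = []  # new or expired session: start fresh
        history = (history + [message])[-MAX_HISTORY:]
        # Assumption: every write pushes the TTL forward (sliding expiry).
        self._sessions[session_id] = (now + SESSION_TTL_S, history)
        return history

    def history(self, session_id: str) -> list[dict]:
        expires_at, history = self._sessions.get(session_id, (0.0, []))
        return history if self._clock() < expires_at else []

    def allow_request(self, ip: str) -> bool:
        """Fixed-window rate limiter keyed by client IP."""
        now = self._clock()
        start, count = self._windows.get(ip, (now, 0))
        if now - start >= RATE_WINDOW_S:
            start, count = now, 0  # window elapsed: reset the counter
        if count >= RATE_LIMIT:
            return False
        self._windows[ip] = (start, count + 1)
        return True
```

A fixed window can let a client send up to 16 requests across a window boundary; a sliding window (for example, a sorted set of timestamps) is stricter at the cost of more bookkeeping.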
Why xAI Grok Instead of OpenAI GPT?
We tested GPT-4o, Claude, and Grok 3 Fast. Grok won on three counts: response latency (consistently under 800ms for short responses), function calling reliability (near-100% tool invocation accuracy), and cost (roughly 40% cheaper than GPT-4o at our volume). Because xAI's API is OpenAI-compatible, switching from GPT-4o required no code changes beyond the base URL, API key, and model name.
The Knowledge Base That Makes Aria Useful
A voice AI without domain knowledge is just a fancy echo chamber. Aria's responses are grounded in 18 carefully written documents covering:
- Product overview — what HyperScale Ai is, who it is for, what it replaces
- Pricing — all four plans with team/client/project limits
- Objection handling — seven word-for-word scripts for common pushbacks
- Competitive positioning — how we compare to HubSpot, Monday.com, Dubsado, Salesforce
- Proof stories — scenario-based ROI examples with real numbers
- Booking protocol — the 4-step sequence Aria follows to get a demo booked
These documents are embedded using OpenAI's text-embedding-3-small (1536 dimensions) and stored in PostgreSQL with a pgvector ivfflat index. When a visitor asks a question, the RAG pipeline retrieves the five most relevant chunks and injects them into the prompt, so Aria's answers are grounded in our actual product data rather than in whatever the model might guess.
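The retrieval step can be sketched without a database to show what the pgvector query computes. This is a minimal pure-Python illustration: toy 3-dimensional vectors stand in for 1536-dimensional embeddings, and the table and column names in the SQL are hypothetical. In pgvector, `<=>` is the cosine-distance operator and `vector_cosine_ops` is the matching ivfflat operator class; ivfflat returns an approximation of the exact ranking computed below.

```python
import math

# Hypothetical schema; pgvector's `<=>` returns cosine distance (1 - cosine similarity).
# The %s placeholder assumes a psycopg-style driver.
TOP_K_SQL = """
SELECT content FROM kb_chunks
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""
# Matching index (hypothetical table; `lists` should be tuned to the row count):
# CREATE INDEX ON kb_chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 5):
    """Exact top-k (score, text) pairs by cosine similarity, best first."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


def build_system_prompt(base_prompt: str, hits: list[tuple[float, str]]) -> str:
    """Append the retrieved chunks to the system prompt as grounding context."""
    context = "\n\n".join(text for _, text in hits)
    return f"{base_prompt}\n\nKnowledge base context:\n{context}"
```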
The Booking Flow: From Conversation to Calendar
When Aria detects buying intent — the visitor asks about pricing, mentions a specific pain point, or says something like "Can we set up a call?" — she activates the book_appointment tool.
Step 1: Aria asks for the visitor's name and email (required fields).
Step 2: She optionally captures company name, phone, service interest, and preferred time.
Step 3: The tool fires a POST request to our /api/v1/bookings endpoint, which creates a booking record and sends a confirmation email via Resend.
Step 4: Aria confirms the booking conversationally and provides next steps.
The entire flow happens within the voice conversation. No form. No redirect. No friction.
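A tool definition for this flow might look like the following. It is a sketch in the OpenAI-compatible function-calling format (`{"type": "function", "function": {...}}` with a JSON Schema `parameters` object); the field names are illustrative renderings of the fields listed in Steps 1 and 2, and the hypothetical handler only validates the model's arguments and builds the body for /api/v1/bookings rather than sending it.

```python
import json
import re

BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book a demo call once the visitor has given their name and email.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                # Optional fields; names are illustrative, not from our codebase.
                "company": {"type": "string"},
                "phone": {"type": "string"},
                "service_interest": {"type": "string"},
                "preferred_time": {"type": "string", "description": "Free-text preferred time"},
            },
            "required": ["name", "email"],
        },
    },
}

OPTIONAL_FIELDS = ("company", "phone", "service_interest", "preferred_time")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose sanity check only


def handle_book_appointment(arguments_json: str) -> dict:
    """Validate tool-call arguments (a JSON string) and build the POST body.

    Errors are returned, not raised, so they can be sent back to the model as
    the tool result and Aria can ask the visitor again.
    """
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return {"ok": False, "error": "arguments were not valid JSON"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    name = str(args.get("name", "")).strip()
    email = str(args.get("email", "")).strip()
    if not name:
        return {"ok": False, "error": "missing name"}
    if not EMAIL_RE.match(email):
        return {"ok": False, "error": "missing or invalid email"}
    payload = {"name": name, "email": email}
    payload.update({k: args[k] for k in OPTIONAL_FIELDS if args.get(k)})
    return {"ok": True, "method": "POST", "path": "/api/v1/bookings", "json": payload}
```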
Common Mistakes to Avoid When Building Voice AI
- Using a chatbot framework and calling it "voice AI": Text chatbots with TTS bolted on feel robotic. True voice AI needs purpose-built conversation management, not chat widget middleware.
- Skipping the knowledge base: Without RAG, the LLM will hallucinate product details. Every voice agent needs a curated, embedded knowledge base.
- Ignoring latency: If the total round trip (STT + LLM + TTS) exceeds 2 seconds, the conversation feels broken. Optimize every hop: use Grok 3 Fast rather than standard Grok 3, and Deepgram Nova-2 rather than Whisper.
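One way to keep the 2-second budget honest is to time every hop on every turn. A minimal sketch, assuming each stage is a plain callable; the stage functions in the usage are placeholders, not real Deepgram, xAI, or OpenAI clients, and retrieval is timed as its own hop because the embedding call adds latency of its own.

```python
import time
from typing import Callable

LATENCY_BUDGET_S = 2.0  # round-trip threshold from the text above


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    retrieve: Callable[[str], str],
    llm: Callable[[str, str], str],
    tts: Callable[[str], bytes],
) -> tuple[bytes, dict[str, float]]:
    """Run one voice turn and record each hop's duration in seconds."""
    timings: dict[str, float] = {}

    def timed(label, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        timings[label] = time.perf_counter() - start
        return result

    transcript = timed("stt", stt, audio)
    context = timed("retrieve", retrieve, transcript)
    reply_text = timed("llm", llm, transcript, context)
    reply_audio = timed("tts", tts, reply_text)
    timings["total"] = sum(timings.values())  # sum of the four hops
    return reply_audio, timings


def over_budget(timings: dict[str, float], budget: float = LATENCY_BUDGET_S) -> bool:
    return timings["total"] > budget
```

Logging the per-hop numbers, not just the total, shows which hop to optimize first when a turn runs long.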
How HyperScale Ai Can Build This for You
Aria is not a one-off experiment — she is the template. We build voice AI agents for client websites using the same architecture: Deepgram STT, xAI Grok, OpenAI TTS, and a pgvector knowledge base customized to your business.
If you run a service business and your website visitors leave without talking to anyone, a voice AI agent changes that equation entirely.
Book a free Automation Audit →
Frequently Asked Questions
How much does it cost to build a voice AI agent like Aria?
Custom voice AI agent builds are project-based, typically scoped during a free Automation Audit. Factors include knowledge base size, number of tools (booking, CRM lookup, etc.), and integration complexity. Most single-purpose agents take 4-8 weeks to build.
Can Aria handle multiple languages?
Currently Aria operates in English (en-US). Multi-language support is on the roadmap, and the architecture already allows for it: Deepgram Nova-2 handles 30+ languages, and Grok responds in whatever language it is prompted in.
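On the speech side, language is a request parameter rather than an architectural change. A sketch of the two HTTP requests, assuming Deepgram's pre-recorded `/v1/listen` endpoint (with `model` and `language` query parameters and `Authorization: Token ...` auth) and OpenAI's `/v1/audio/speech` endpoint. The helper names are hypothetical, the requests are only built here, not sent, and Aria's own STT endpoint may use Deepgram's streaming API instead.

```python
import json
import urllib.parse
import urllib.request


def deepgram_request(audio: bytes, api_key: str, language: str = "en-US",
                     mimetype: str = "audio/webm") -> urllib.request.Request:
    """Build (but do not send) a Deepgram Nova-2 pre-recorded transcription request."""
    query = urllib.parse.urlencode({"model": "nova-2", "language": language})
    return urllib.request.Request(
        f"https://api.deepgram.com/v1/listen?{query}",
        data=audio,
        method="POST",
        headers={"Authorization": f"Token {api_key}", "Content-Type": mimetype},
    )


def openai_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI TTS request with tts-1 and the shimmer voice."""
    body = {"model": "tts-1", "voice": "shimmer", "input": text}
    return urllib.request.Request(
        "https://api.openai.com/v1/audio/speech",
        data=json.dumps(body).encode(),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen` would send it; in production an async HTTP client is the more likely choice.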
What happens if Aria cannot answer a question?
Aria is instructed to be transparent. If a question falls outside her knowledge base, she says so and offers to connect the visitor with a human or book a call to discuss further, rather than fabricating an answer.
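How Aria decides that a question is out of scope is not described here. One common pattern, shown as a hypothetical sketch rather than Aria's actual logic, is to check the retrieval scores before calling the LLM and return a fixed hand-off reply when nothing in the knowledge base is close enough. The 0.3 threshold is an arbitrary placeholder that would need tuning against real visitor questions.

```python
FALLBACK_REPLY = (
    "I don't have that information in what I know about HyperScale Ai. "
    "I can connect you with someone on the team, or book a call to discuss it."
)
MIN_SIMILARITY = 0.3  # placeholder threshold; tune on real queries


def answer_or_handoff(hits: list[tuple[float, str]], generate) -> str:
    """Hypothetical gate in front of the LLM.

    hits: (cosine similarity, chunk text) pairs from retrieval.
    generate: callable that takes the grounding chunks and returns the LLM reply.
    """
    grounded = [text for score, text in hits if score >= MIN_SIMILARITY]
    if not grounded:
        return FALLBACK_REPLY  # nothing relevant retrieved: hand off instead of guessing
    return generate(grounded)
```

A score gate is coarse: it catches clearly unrelated questions but not a relevant chunk that lacks the specific detail asked for, which is why the system prompt instruction still matters.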
Does the voice AI work on mobile devices?
Yes. The VoiceAgentWidget is responsive and works on iOS and Android browsers. On devices without microphone access, visitors can type instead — Aria switches to text mode automatically.
How is conversation data handled for privacy?
Conversations are stored in Valkey with a 10-minute TTL and are not persisted to disk. Audit logs record metadata (session ID, response time, tool calls) but not conversation content. No visitor data is shared with third parties.
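Keeping content out of the audit log is easiest to enforce where the record is built. A minimal sketch, assuming a per-turn dict with hypothetical keys: the audit entry copies an explicit allow-list of metadata fields, so transcripts and replies cannot leak in by accident. Logging only tool names, not their arguments (which can contain a visitor's name or email), is an added assumption of this sketch.

```python
import json
import time

AUDIT_FIELDS = ("session_id", "response_time_ms", "tool_calls")  # hypothetical keys, metadata only


def audit_record(turn: dict) -> str:
    """Build a JSON audit line from an allow-list; transcript and reply text are never copied."""
    record = {key: turn[key] for key in AUDIT_FIELDS if key in turn}
    # Keep tool names only; arguments may contain personal data.
    record["tool_calls"] = [call["name"] for call in record.get("tool_calls", [])]
    record["logged_at"] = int(time.time())
    return json.dumps(record)
```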
Conclusion
Building Aria taught us that voice AI is not about the technology — it is about the experience. The visitor does not care that we use pgvector or Grok 3 Fast. They care that someone (something) answered their question at 11 PM and helped them book a call.
If your website's best lead capture mechanism is a contact form, you are leaving revenue on the table every night.
Book your free Automation Audit →
Related Reading:
- What We Build — Voice AI Agents Case Study
- HyperScale Ai CRM — AI-Native Platform
- Data Migration Services
HyperScale Ai is an AI-native agency management platform combining CRM, project management, client portals, payments, and Voice AI agents in one platform. Start your free trial →

Tahir Sheikh
Founder & CEO, HyperScale Ai
Builder of AI-native platforms and voice agents. Sharing what we learn as we build the system we wished existed when we ran our own agency.