We Built a Voice AI That Books Appointments From Our Website — Here Is Exactly How

Last Updated: March 19, 2026 | Author: Tahir Sheikh, Founder & CEO, HyperScale Ai Reading time: 8 minutes

Quick Answer

Aria is a voice AI agent on hyperscaleai.io that answers visitor questions, qualifies them as leads, and books demo appointments — entirely by voice, 24/7, with no human in the loop. She runs on xAI Grok 3 Fast for reasoning, Deepgram Nova-2 for speech-to-text, OpenAI TTS for voice output, and a pgvector knowledge base with 18 embedded documents about our product and pricing.

Introduction

Every service business has the same problem at 11 PM on a Tuesday: a potential client is on your website, ready to talk, and nobody is there to answer. By morning, they have moved on.

We built HyperScale Ai to solve operational problems like this one. But when it came to our own website, we realized we were making the same mistake everyone else does — relying on a contact form and hoping people would fill it out.

So we built Aria. She is a voice AI agent that lives on our homepage, answers questions about what we do, figures out if the visitor is a good fit, and books a call — all without a human touching anything.

This is the full technical story of how we did it, and what we learned along the way.

What Is a Voice AI Agent?

A voice AI agent is software that holds real conversations using speech — not a chatbot with text bubbles, and not an IVR phone tree. It listens to what you say (speech-to-text), reasons about the best response (large language model), and talks back to you (text-to-speech) in real time.

For example, when a visitor asks Aria "What do you charge?", she does not read a script. She retrieves our pricing from a knowledge base, contextualizes it based on the conversation so far, and explains the plans conversationally — then offers to book a call if the visitor sounds interested.

The Architecture Behind Aria

Building a voice AI that actually works in production required five integrated systems. Here is how they fit together.

The Processing Pipeline

Every conversation follows this path:

Speech-to-Text (Deepgram Nova-2): The visitor speaks into their microphone. The browser captures the audio and sends it to our STT endpoint, which calls Deepgram's Nova-2 model. Deepgram returns a transcript in under 300ms.
Knowledge Retrieval (pgvector RAG): Before calling the LLM, we embed the visitor's message using OpenAI's text-embedding-3-small model and run a cosine similarity search against our knowledge base in PostgreSQL with pgvector. The top 5 most relevant chunks are injected into the system prompt.
Reasoning (xAI Grok 3 Fast): The transcript, conversation history, knowledge context, and Aria's system prompt all go to xAI's Grok 3 Fast model via the OpenAI-compatible API. Grok reasons about the response and can also invoke tools — like book_appointment — using function calling.
Text-to-Speech (OpenAI TTS-1): Grok's text response is sent to OpenAI's TTS-1 API with the shimmer voice. The audio stream is returned to the browser and played back.
Session Management (Valkey/Redis): Every conversation is stored in Valkey with a 10-minute TTL. History is capped at 20 messages. Rate limits are enforced per IP (8 requests/minute for public visitors).

Why xAI Grok Instead of OpenAI GPT?

We tested GPT-4o, Claude, and Grok 3 Fast. Grok won on three counts: response latency (consistently under 800ms for short responses), function calling reliability (near-100% tool invocation accuracy), and cost (roughly 40% cheaper than GPT-4o at our volume). The OpenAI-compatible API also meant zero code changes when switching from GPT-4o.

The Knowledge Base That Makes Aria Useful

A voice AI without domain knowledge is just a fancy echo chamber. Aria's responses are grounded in 18 carefully written documents covering:

Product overview — what HyperScale Ai is, who it is for, what it replaces
Pricing — all four plans with team/client/project limits
Objection handling — seven word-for-word scripts for common pushbacks
Competitive positioning — how we compare to HubSpot, Monday.com, Dubsado, Salesforce
Proof stories — scenario-based ROI examples with real numbers
Booking protocol — the 4-step sequence Aria follows to get a demo booked

These documents are embedded using OpenAI's text-embedding-3-small (1536 dimensions) and stored in PostgreSQL with a pgvector ivfflat index. When a visitor asks a question, the RAG pipeline retrieves the most relevant chunks and injects them into the prompt — so Aria's answers are always grounded in our actual product data, not hallucinated.

The Booking Flow: From Conversation to Calendar

When Aria detects buying intent — the visitor asks about pricing, mentions a specific pain point, or says something like "Can we set up a call?" — she activates the book_appointment tool.

Step 1: Aria asks for the visitor's name and email (required fields).

Step 2: She optionally captures company name, phone, service interest, and preferred time.

Step 3: The tool fires a POST request to our /api/v1/bookings endpoint, which creates a booking record and sends a confirmation email via Resend.

Step 4: Aria confirms the booking conversationally and provides next steps.

The entire flow happens within the voice conversation. No form. No redirect. No friction.

Common Mistakes to Avoid When Building Voice AI

Using a chatbot framework and calling it "voice AI": Text chatbots with TTS bolted on feel robotic. True voice AI needs purpose-built conversation management, not chat widget middleware.
Skipping the knowledge base: Without RAG, the LLM will hallucinate product details. Every voice agent needs a curated, embedded knowledge base.
Ignoring latency: If total round-trip (STT + LLM + TTS) exceeds 2 seconds, the conversation feels broken. Optimize every hop — use Grok Fast, not Grok, and Nova-2, not Whisper.

How HyperScale Ai Can Build This for You

Aria is not a one-off experiment — she is the template. We build voice AI agents for client websites using the same architecture: Deepgram STT, xAI Grok, OpenAI TTS, and a pgvector knowledge base customized to your business.

If you run a service business and your website visitors leave without talking to anyone, a voice AI agent changes that equation entirely.

Book a free Automation Audit →

Frequently Asked Questions

How much does it cost to build a voice AI agent like Aria?

Custom voice AI agent builds are project-based, typically scoped during a free Automation Audit. Factors include knowledge base size, number of tools (booking, CRM lookup, etc.), and integration complexity. Most single-purpose agents take 4-8 weeks to build.

Can Aria handle multiple languages?

Currently Aria operates in English (en-US). Multi-language support is on the roadmap — the architecture supports it since Deepgram Nova-2 handles 30+ languages and Grok responds in any language prompted.

What happens if Aria cannot answer a question?

Aria is trained to be transparent. If a question falls outside her knowledge base, she says so and offers to connect the visitor with a human or book a call to discuss further. She never fabricates answers.

Does the voice AI work on mobile devices?

Yes. The VoiceAgentWidget is responsive and works on iOS and Android browsers. On devices without microphone access, visitors can type instead — Aria switches to text mode automatically.

How is conversation data handled for privacy?

Conversations are stored in Valkey with a 10-minute TTL and are not persisted to disk. Audit logs record metadata (session ID, response time, tool calls) but not conversation content. No visitor data is shared with third parties.

Conclusion

Building Aria taught us that voice AI is not about the technology — it is about the experience. The visitor does not care that we use pgvector or Grok 3 Fast. They care that someone (something) answered their question at 11 PM and helped them book a call.

If your website's best lead capture mechanism is a contact form, you are leaving revenue on the table every night.

Book your free Automation Audit →

Related Reading:

HyperScale Ai is an AI-native agency management platform combining CRM, project management, client portals, payments, and Voice AI agents in one platform. Start your free trial →

We Built a Voice AI That Books Appointments From Our Website — Here Is Exactly How

We Built a Voice AI That Books Appointments From Our Website — Here Is Exactly How

Quick Answer

Introduction

What Is a Voice AI Agent?

The Architecture Behind Aria

The Processing Pipeline

Why xAI Grok Instead of OpenAI GPT?

The Knowledge Base That Makes Aria Useful

The Booking Flow: From Conversation to Calendar

Common Mistakes to Avoid When Building Voice AI

How HyperScale Ai Can Build This for You

Frequently Asked Questions

How much does it cost to build a voice AI agent like Aria?

Can Aria handle multiple languages?

What happens if Aria cannot answer a question?

Does the voice AI work on mobile devices?

How is conversation data handled for privacy?

Conclusion

Related Articles

What Happens When Your CRM's AI Can Query Its Own Database: A Guide to Live Data Tools in AI-Native Software (2026)

We Ingested 69 Documents Into Our CRM's AI Knowledge Base — Here's What We Learned About RAG for Business (2026)

AI-Native vs AI-Powered: The Real Difference (2026)

Ready to automate your operations?