Sotto
voice AI that answers UK restaurant phones.
.NET 10. Groq Llama 4 Scout. Whisper Large v3 Turbo. Deepgram Aura 2. Engineered to a sub-500 millisecond time-to-first-audio budget, with the UK Big 14 allergens enforced as a mandatory conversation state.
A UK restaurant voice AI, documented stage by stage.
The Sotto case study shows how a phone call placed to a UK restaurant becomes a confirmed POS order without a human taking the call. The architecture is a 29-project .NET 10 monorepo with Clean Architecture, 609 unit tests, and a published sub-500-millisecond time-to-first-audio budget.
The vertical is UK restaurants specifically. That choice determines the regulatory shape (Big 14 allergens enforced as a conversation state, VAT as integer pence, GDPR per-tenant retention, HMRC 7-year financial retention) and the integrations (Square UK, Toast, Clover, Stripe UK, Uber Direct, Stuart).
The Challenge
Restaurants miss calls and lose orders.
Phone orders still drive a meaningful share of restaurant revenue. During peak hours those calls go unanswered. Staff are stretched thin, language barriers frustrate customers, and every missed call is a lost order.
Missed calls during peak hours
Staff cannot answer every call when the kitchen is slammed. Callers hang up and order elsewhere.
Allergen liability on a busy line
Since Natasha's Law, allergen diligence is existential for UK food businesses. A rushed human can skip the question; a state machine cannot.
Front-of-house labour scarcity
Hiring and retaining phone-capable staff is harder than ever. Wages rise while margins shrink.
Latency budget
Sub-500 milliseconds, stage by stage.
Voice AI has zero tolerance for delay. A one-second pause feels like an eternity on a phone call. Sotto's budget is broken down per stage rather than quoted as a round-number marketing figure. The sum across all ten stages is 470 milliseconds.
| Stage | Kind | ms |
|---|---|---|
| Twilio PSTN ingress | Carrier | 50 |
| WebSocket ingress (VoiceGateway) | Internal | 5 |
| VAD plus buffer drain | Internal | 5 |
| Mu-law to PCM (8 to 16 kHz) | Internal | 5 |
| Groq Whisper Large v3 Turbo | Model | 150 |
| gRPC to orchestrator | Internal | 5 |
| pgvector RAG search | Internal | 15 |
| Groq Llama 4 Scout 17B TTFT | Model | 50 |
| Deepgram Aura 2 TTFA | Model | 130 |
| WebSocket egress and PSTN | Carrier | 55 |
| Sum | | 470 |
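The arithmetic behind the table can be checked with a short sketch. The stage names, kinds, and millisecond values are copied from the table; the 500 ms target constant and the helper names are illustrative and not taken from Sotto's codebase.

```python
# Stage budgets in milliseconds, copied from the latency table.
BUDGET_MS = [
    ("Twilio PSTN ingress", "Carrier", 50),
    ("WebSocket ingress (VoiceGateway)", "Internal", 5),
    ("VAD plus buffer drain", "Internal", 5),
    ("Mu-law to PCM (8 to 16 kHz)", "Internal", 5),
    ("Groq Whisper Large v3 Turbo", "Model", 150),
    ("gRPC to orchestrator", "Internal", 5),
    ("pgvector RAG search", "Internal", 15),
    ("Groq Llama 4 Scout 17B TTFT", "Model", 50),
    ("Deepgram Aura 2 TTFA", "Model", 130),
    ("WebSocket egress and PSTN", "Carrier", 55),
]

TARGET_MS = 500  # the sub-500 ms time-to-first-audio target


def total_ms(stages):
    """Sum the per-stage budgets."""
    return sum(ms for _, _, ms in stages)


def by_kind(stages):
    """Group the budget by stage kind (Carrier, Internal, Model)."""
    totals = {}
    for _, kind, ms in stages:
        totals[kind] = totals.get(kind, 0) + ms
    return totals


if __name__ == "__main__":
    total = total_ms(BUDGET_MS)
    print(f"total: {total} ms, headroom: {TARGET_MS - total} ms")
    print(by_kind(BUDGET_MS))
```

Grouping by kind makes it visible that the three model calls account for most of the budget, while the internal hops contribute comparatively little.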
Integration surface
Self-contained by design, POS-native by default.
Sotto is the deliberately self-contained product in the KaritKarma catalog: it sells standalone to UK restaurants, so it integrates the market's native stack directly and runs its own auth. Domain code focuses on voice and ordering.
Twilio
Voice and SMS: Media Streams WebSocket ingress for live call audio, plus a 2-way SMS conversation pipeline and SMS payment links through the same conversation engine.
Groq + Deepgram
AI pipeline: Whisper Large v3 Turbo transcription and Llama 4 Scout reasoning on Groq; Deepgram Aura 2 speech with a streaming token bridge and barge-in support.
Square, Toast, Clover
POS write-back: Square UK, Toast (REST v2.5), and Clover (v3) connectors behind one common interface. The confirmed order lands on the POS the kitchen already runs.
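One way to picture "connectors behind one common interface" is the sketch below. It is written in Python for brevity, although Sotto itself is .NET, and every name in it (`PosConnector`, `submit_order`, `ConfirmedOrder`, `InMemoryConnector`) is hypothetical. No real Square, Toast, or Clover API call is shown.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OrderLine:
    name: str
    quantity: int
    unit_price_pence: int  # money as integer pence


@dataclass(frozen=True)
class ConfirmedOrder:
    reference: str
    lines: tuple[OrderLine, ...]


class PosConnector(Protocol):
    """Hypothetical common interface that each POS connector would satisfy."""

    def submit_order(self, order: ConfirmedOrder) -> str:
        """Write the order to the POS and return the POS-side identifier."""
        ...


class InMemoryConnector:
    """Stand-in connector; a real one would call the vendor's API here."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor
        self.submitted: list[ConfirmedOrder] = []

    def submit_order(self, order: ConfirmedOrder) -> str:
        self.submitted.append(order)
        return f"{self.vendor}:{order.reference}"


def write_back(connector: PosConnector, order: ConfirmedOrder) -> str:
    """Ordering code depends only on the interface, never on a vendor."""
    return connector.submit_order(order)
```

The point of the shape is that the ordering flow calls `write_back` the same way regardless of which POS the restaurant runs.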
Stripe UK
PaymentsCheckout sessions, payment links, and SMS payment links in integer pence, with Uber Direct and Stuart dispatchers for delivery. Dashboard auth is Sotto's own NextAuth plus JWT.
UK regulatory shape
Big 14, integer pence, GDPR on a schedule.
UK rules shape the conversation machine, the money math, and the retention policy. None of this is bolted on; each one is a first-class concern in the codebase.
UK Big 14 allergens, enforced
AllergenCheck is a mandatory state in the conversation machine, not a checkbox. The AI enumerates allergens per item and asks about caller allergies before any order can confirm.
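A minimal sketch of what "mandatory state" means in practice follows, assuming a transition table in which the only edge into order review comes from the allergen check. Only AllergenCheck and order review are named in this case study; the other states, the reduced state count, and all identifiers are illustrative.

```python
from enum import Enum, auto


class State(Enum):
    GREETING = auto()        # hypothetical
    TAKING_ORDER = auto()    # hypothetical
    ALLERGEN_CHECK = auto()  # the mandatory state described above
    ORDER_REVIEW = auto()
    CONFIRMED = auto()       # hypothetical


# Allowed transitions. ORDER_REVIEW is reachable only from ALLERGEN_CHECK,
# and editing the order after review forces the check to run again.
ALLOWED = {
    State.GREETING: {State.TAKING_ORDER},
    State.TAKING_ORDER: {State.TAKING_ORDER, State.ALLERGEN_CHECK},
    State.ALLERGEN_CHECK: {State.TAKING_ORDER, State.ORDER_REVIEW},
    State.ORDER_REVIEW: {State.TAKING_ORDER, State.CONFIRMED},
    State.CONFIRMED: set(),
}


class Conversation:
    def __init__(self) -> None:
        self.state = State.GREETING

    def advance(self, target: State) -> None:
        """Move to target, or raise if the transition is not allowed."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {target.name} is not allowed")
        self.state = target
```

Because the rule lives in the transition table rather than in a prompt, skipping the allergen question is a rejected transition, not a behaviour the model can drift into.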
VAT in integer pence
Standard 20 percent, Zero 0 percent, Reduced 5 percent. Rates are stored as basis points and money values as integer pence, so VAT never drifts by a rounding penny.
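The integer-only arithmetic can be sketched as follows. The three rates come from the text above, expressed in basis points (1 percent = 100 basis points). The assumptions here are that prices are held net of VAT and that amounts round half up to the nearest penny; Sotto's actual pricing basis and rounding rule are not stated in this case study.

```python
# VAT rates in basis points: 2000 bp = 20 percent.
VAT_RATE_BP = {"standard": 2000, "reduced": 500, "zero": 0}

BP_DENOMINATOR = 10_000


def vat_pence(net_pence: int, rate_bp: int) -> int:
    """VAT on a net amount, using integers only.

    Adding half the denominator before the floor division rounds half up
    (an assumed rounding rule) without ever touching floating point.
    """
    return (net_pence * rate_bp + BP_DENOMINATOR // 2) // BP_DENOMINATOR


def gross_pence(net_pence: int, rate_bp: int) -> int:
    """Net amount plus VAT, still in integer pence."""
    return net_pence + vat_pence(net_pence, rate_bp)


if __name__ == "__main__":
    for band, rate in VAT_RATE_BP.items():
        print(band, vat_pence(1099, rate), gross_pence(1099, rate))
```

Since every intermediate value is an integer, repeating the calculation on any machine gives the same penny, which is the property the paragraph above is describing.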
GDPR on a daily schedule
Per-tenant retention (default 365 days). Daily 02:00 UTC purge. Right-to-erasure anonymises calls, customers, and transcripts in one transaction. HMRC 7-year financial retention is preserved.
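The scheduling side of this policy can be sketched with the standard library alone. The 365-day default and the 02:00 UTC run time come from the text; the function names and the record field names are hypothetical, and the real anonymiser runs inside a database transaction that is not modelled here.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical names for the personal fields that erasure would blank.
PERSONAL_FIELDS = ("caller_name", "phone_number", "transcript")


def purge_cutoff(now: datetime, retention_days: int = 365) -> datetime:
    """Records older than this instant fall outside the tenant's retention."""
    return now - timedelta(days=retention_days)


def next_purge_run(now: datetime, hour_utc: int = 2) -> datetime:
    """Next daily run at hour_utc:00 UTC; now must be a UTC-aware datetime."""
    run = datetime.combine(now.date(), time(hour_utc), tzinfo=timezone.utc)
    return run if run > now else run + timedelta(days=1)


def anonymise(record: dict) -> dict:
    """Blank personal fields and leave financial fields untouched."""
    return {k: (None if k in PERSONAL_FIELDS else v) for k, v in record.items()}
```

Keeping the financial fields in place while blanking the personal ones is what allows an erasure request and the HMRC 7-year rule to be satisfied by the same record.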
Sotto vs the alternatives
Phone call. POS write. Done.
Versus a human server with a notepad, a touch-tone IVR, or a chatbot on the website, here is what the architecture does differently.
| Capability | Sotto | Human staff | Touch-tone IVR | Web chatbot |
|---|---|---|---|---|
| Sub-500ms first-audio budget | | Variable | | Web only |
| Available 24/7 | | | | |
| Natural phone conversation | | | | Text |
| Big 14 allergen enforcement | Mandatory state | Hopefully | ||
| Writes to your POS | Square / Toast / Clover | Manual | Limited | |
| Reads full menu accurately | pgvector RAG | Memory | Tone tree | |
| GDPR retention per tenant | Varies |
What ships today
Production-ready, in active UK pilots.
Answers every call, 24/7
The AI picks up around the clock, with caller spam protection (hourly and daily rate limits plus risk scoring) built in.
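The hourly and daily limits can be illustrated with a sliding-window counter per caller. This is a sketch under assumptions: the limit values, the class name, and the use of caller number as the key are all hypothetical, and the risk-scoring component mentioned above is not modelled.

```python
from collections import defaultdict, deque

HOUR_S = 3600
DAY_S = 86_400


class CallerRateLimiter:
    """Sliding-window limiter with an hourly and a daily cap per caller."""

    def __init__(self, hourly_limit: int = 5, daily_limit: int = 20) -> None:
        # Limit values are placeholders, not Sotto's configuration.
        self.hourly_limit = hourly_limit
        self.daily_limit = daily_limit
        self._calls = defaultdict(deque)  # caller -> accepted call timestamps

    def allow(self, caller: str, now: float) -> bool:
        """Return True and record the call if both limits still have room."""
        calls = self._calls[caller]
        # Drop timestamps that have aged out of the 24-hour window.
        while calls and now - calls[0] >= DAY_S:
            calls.popleft()
        in_last_hour = sum(1 for t in calls if now - t < HOUR_S)
        if len(calls) >= self.daily_limit or in_last_hour >= self.hourly_limit:
            return False
        calls.append(now)
        return True
```

Rejected calls are not recorded, so a blocked caller does not extend their own lockout by retrying.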
2-way SMS ordering
The same conversation engine answers text messages, and Stripe payment links are delivered by SMS.
Front-of-house freed for service
Staff focus on hospitality and table service instead of phones during peak.
Sub-500 millisecond voice budget
A published stage-by-stage engineering budget, made realistic by the streaming token bridge to Aura 2 with barge-in.
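The FAQ further down describes the bridge as batching LLM tokens at sentence boundaries before handing them to TTS. A minimal sketch of that batching step is shown here, assuming tokens arrive as plain strings. The function name and the punctuation rule are illustrative, and no Groq or Deepgram API call is shown.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence as soon as its closing punctuation arrives.

    Naive rule: any buffer ending in '.', '!' or '?' counts as a sentence,
    so a price like 4.50 split across tokens would need extra handling.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever remains when the token stream ends.
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Sure", ",", " one", " margherita", ".", " Any", " allergies", "?"]
    for sentence in sentence_chunks(stream):
        print(sentence)  # each sentence could be sent to TTS at this point
```

Because the generator yields as soon as a sentence closes, speech synthesis for the first sentence can start while later tokens are still being produced.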
GDPR on a schedule
Per-tenant retention with a scheduled purge (02:00 UTC default, configurable) and a right-to-erasure anonymiser that keeps HMRC records intact.
Direct market-native integrations
Stripe UK, Square / Toast / Clover, Twilio, Groq, Deepgram, Uber Direct, Stuart. Auth is Sotto's own NextAuth plus JWT.
Frequently asked
Sotto, asked plainly.
- What is the Sotto case study?
- The Sotto case study documents how a UK-focused voice AI takes restaurant phone orders end-to-end. Sotto is built as a .NET 10 monorepo of 29 projects with Clean Architecture and 609 unit tests plus 29 Testcontainers integration tests. The voice path runs on Groq Llama 4 Scout 17B for reasoning, Whisper Large v3 Turbo for transcription, and Deepgram Aura 2 for speech, engineered to a sub-500 millisecond time-to-first-audio budget. The case study covers the latency budget, the eight-state conversation machine (with mandatory AllergenCheck for the UK Big 14), POS integrations (Square UK, Toast, Clover), payments (Stripe UK in integer pence), and per-tenant GDPR retention.
- Is Sotto live in production?
- Sotto is built and tested as a production-ready platform with 609 unit tests covering the voice pipeline, order flow, POS connectors, and payment links, all re-verified green at the date of this revision. It is in the UK restaurant pilot stage. We do not promote pilots to general-availability claims, so we label Sotto as production-ready in pilots rather than as a multi-customer SaaS roster. Named-customer references are added only with written permission.
- How is the sub-500 millisecond voice budget engineered?
- The figure is a published stage-by-stage engineering budget, not a measured production percentile, and we label it that way. The design that makes the budget realistic is a streaming token bridge: it batches LLM tokens at sentence boundaries and ships each sentence to Deepgram Aura 2 TTS before the full Llama 4 Scout response is generated, so the rest of the generation overlaps with speech synthesis instead of delaying first audio. TTS barge-in is supported. Voice activity detection runs at a 50 RMS energy threshold with a 700 millisecond silence cutoff, and Whisper Large v3 Turbo on Groq is budgeted at 150 milliseconds for transcription, leaving headroom for the carrier legs.
- Does Sotto use the KaritKarma platform services?
- No, and we say so plainly: Sotto is the deliberately self-contained product in the catalog. As a UK restaurant platform sold standalone, it runs its own NextAuth plus JWT authentication in the merchant dashboard and integrates its market's native stack directly: Twilio Media Streams for voice and 2-way SMS, Groq for Whisper transcription and Llama 4 Scout reasoning, Deepgram Aura 2 for speech, Square UK, Toast, and Clover POS connectors behind a common connector interface, Stripe UK for checkout sessions, payment links, and SMS payment links, and Uber Direct plus Stuart for delivery dispatch. KaritKarma platform services are a fit where customers share our ecosystem; Sotto's UK buyers do not, so it does not pretend otherwise.
- Is Sotto compliant with UK food and data regulations?
- Yes. AllergenCheck is a mandatory state in the conversation machine, enforced in code so the only path to order review runs through it. VAT is calculated per item in basis points (Standard 20 percent, Zero 0 percent, Reduced 5 percent) and stored as integer pence, so totals never drift by a rounding penny. GDPR retention is per-tenant configurable (default 365 days, conversations purged after 90 days by default) with a scheduled purge at a configurable hour (02:00 UTC default) and a right-to-erasure anonymiser that preserves orders and payments for the HMRC 7-year retention rule.
- Where does Sotto run and what is the deployment model?
- Sotto deploys in a UK-region envelope for data-residency reasons. Telephony runs on Twilio Media Streams with webhook routing. The voice plane is five .NET 10 services in 4-layer clean architecture, with RabbitMQ 4 over MassTransit for messaging. The application database is PostgreSQL 18 with pgvector for menu RAG (bge-large-en-v1.5 embeddings over an HNSW index), Redis 8 handles ephemeral state, and the stack ships with OpenTelemetry, Jaeger, Prometheus, and Grafana wired in. Production routing is Traefik with Let's Encrypt TLS per host.
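The voice-budget answer above cites a 50 RMS energy threshold and a 700 millisecond silence cutoff for voice activity detection. The sketch below shows how those two numbers could combine into an end-of-utterance decision. It assumes 16-bit PCM samples and 20 ms frames, neither of which is stated in this case study, and all names are illustrative.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def utterance_end_ms(
    frames: Iterable[Sequence[int]],
    frame_ms: int = 20,        # assumed frame length
    threshold: float = 50.0,   # RMS energy threshold from the FAQ
    silence_ms: int = 700,     # silence cutoff from the FAQ
) -> Optional[int]:
    """Return the elapsed time at which the utterance is judged finished.

    Silence only counts once speech has been heard, so leading silence
    never ends an utterance. Returns None if the cutoff is never reached.
    """
    heard_speech = False
    silent_for = 0
    elapsed = 0
    for frame in frames:
        elapsed += frame_ms
        if rms(frame) >= threshold:
            heard_speech = True
            silent_for = 0
        elif heard_speech:
            silent_for += frame_ms
            if silent_for >= silence_ms:
                return elapsed
    return None
```

A longer cutoff reduces the chance of cutting a caller off mid-thought, at the cost of waiting longer before transcription starts.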
Explore Sotto
Voice AI that never misses a call.
See how Sotto answers, takes the order, enforces allergens, and writes to Square, Toast, or Clover before the caller hangs up.