CityVoice is a German-language voice assistant for municipal services. Citizens click “Anrufen” ("Call") in their browser, ask questions about city services in natural German, and get accurate spoken answers in real time. The system combines ElevenLabs Conversational AI with a RAG knowledge base of municipal data: opening hours, service locations, required documents, and procedures.
The architecture is event-driven: WebRTC carries the voice stream, ElevenLabs runs ASR (Whisper) → LLM (GPT-4o) → TTS in sequence, and whenever the LLM needs factual data it calls FastAPI webhook endpoints that query ChromaDB. The full round trip from question to spoken answer takes under 2 seconds.
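As a rough illustration of the webhook step, the sketch below shows how one tool endpoint might forward the LLM's question to a ChromaDB collection and return passages with their sources. It is not the project's actual code: the route `/tools/search`, the request field `query`, the `source` metadata key, and the function names are all assumptions made for this example. Only `collection.query(query_texts=..., n_results=...)` and its list-per-query result shape come from ChromaDB itself.

```python
from typing import Any, Protocol


class Collection(Protocol):
    """The single method this sketch needs from a ChromaDB collection."""

    def query(self, query_texts: list[str], n_results: int) -> dict[str, Any]: ...


def search_knowledge_base(collection: Collection, question: str, k: int = 3) -> dict[str, Any]:
    """Return the top-k passages and their sources for the LLM to answer from."""
    result = collection.query(query_texts=[question], n_results=k)
    # ChromaDB returns one inner list per query text; exactly one was sent.
    documents = result["documents"][0]
    metadatas = result["metadatas"][0]
    passages = [
        # "source" is an assumed metadata key, set when the documents were indexed.
        {"text": doc, "source": (meta or {}).get("source", "unknown")}
        for doc, meta in zip(documents, metadatas)
    ]
    return {"query": question, "passages": passages}


def create_app(collection: Collection):
    """Build the FastAPI app. Imports are local so the helper above runs without FastAPI."""
    from fastapi import FastAPI
    from pydantic import BaseModel

    class SearchRequest(BaseModel):
        query: str  # assumed body shape, defined in the agent's tool configuration

    app = FastAPI()

    @app.post("/tools/search")  # hypothetical route name
    def search(request: SearchRequest) -> dict[str, Any]:
        return search_knowledge_base(collection, request.query)

    return app
```

Keeping the retrieval logic in a plain function makes it testable with a stub collection. In a deployment, `create_app` would presumably receive a real collection, for example one obtained from `chromadb.PersistentClient(path=...).get_or_create_collection(...)`.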
Technical Highlights
- Real-time WebRTC voice calls directly in the browser
- German-language ASR, LLM reasoning, and TTS in a single pipeline
- RAG-powered knowledge base via ChromaDB for accurate municipal data
- FastAPI webhook endpoints as LLM tools for structured data retrieval
- Opening hours lookup and service information with source attribution
- Next.js frontend with ElevenLabs client SDK integration
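The opening hours lookup with source attribution could be sketched roughly as follows. This is an illustrative example only: the `OfficeHours` structure, the function name, the sample schedule, and the `example.org` URL are invented here and do not reflect the project's real data model or municipal data.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class OfficeHours:
    office: str
    source: str  # where the data came from, returned with every answer
    weekly: dict[int, list[tuple[time, time]]]  # weekday (0 = Monday) -> open intervals


# Made-up sample data, for illustration only.
BUERGERAMT = OfficeHours(
    office="Bürgeramt",
    source="https://example.org/buergeramt",
    weekly={
        0: [(time(8, 0), time(12, 0))],
        3: [(time(8, 0), time(12, 0)), (time(14, 0), time(18, 0))],
    },
)


def lookup_opening_hours(hours: OfficeHours, at: datetime) -> dict:
    """Report whether the office is open at the given moment and cite the source."""
    intervals = hours.weekly.get(at.weekday(), [])
    # Intervals are half-open: open at the start time, closed at the end time.
    is_open = any(start <= at.time() < end for start, end in intervals)
    return {
        "office": hours.office,
        "open": is_open,
        "hours_today": [f"{start:%H:%M}-{end:%H:%M}" for start, end in intervals],
        "source": hours.source,
    }
```

Returning the source alongside the answer lets the LLM mention where the information comes from, and returning the full list of today's intervals lets it say when the office opens next instead of answering only "closed".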
Stack
- Voice: ElevenLabs Conversational AI (Whisper + GPT-4o + TTS)
- Backend: FastAPI, Python
- Frontend: Next.js, ElevenLabs SDK
- Knowledge: ChromaDB vector database
- Protocol: WebRTC


