Multi-provider AI voice agent and live supervision platform for a Tokyo-based used-vehicle exporter.
Sales calls were a bottleneck. A small team couldn't reach enough prospects, the quality of each call depended on which agent picked up, and there was no systematic way to learn from what went well or badly. They needed AI that could make outbound calls at scale, with human agents supervising in real time and a complete audit trail of every interaction.
Built a pluggable voice agent architecture where any voice model can be swapped in. It currently runs two pipelines: Gemini Live, and a custom pipeline I built that combines ElevenLabs TTS with separate transcription models. Users pick the model and the voice and bring their own API keys, so when a better model launches, we can A/B test it in minutes instead of rewriting the system. On top of that, human agents can listen to live calls, see the streaming transcript, inject context from a sidebar to steer the AI mid-call, or take over entirely. Campaigns import contacts from Excel and launch outbound calls at scale through the company's SIP server.
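A minimal sketch of how a swappable pipeline layer like this could look, assuming a simple registry keyed by provider id. Every name here (VoicePipeline, EchoPipeline, register, build_pipeline, the provider ids) is hypothetical and for illustration only; the real pipelines stream audio, while this stand-in only echoes text so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class VoicePipeline(Protocol):
    """Interface every pipeline is assumed to satisfy (hypothetical)."""

    name: str

    def respond(self, caller_text: str) -> str: ...


@dataclass
class EchoPipeline:
    """Stand-in for a real provider; it echoes text instead of calling an API."""

    name: str
    voice: str
    api_key: str  # supplied by the user, never hard-coded

    def respond(self, caller_text: str) -> str:
        return f"[{self.name}/{self.voice}] {caller_text}"


# A factory takes (voice, api_key) and returns a ready pipeline.
PipelineFactory = Callable[[str, str], VoicePipeline]
REGISTRY: Dict[str, PipelineFactory] = {}


def register(provider_id: str, factory: PipelineFactory) -> None:
    """Make a provider selectable without touching the call-handling code."""
    REGISTRY[provider_id] = factory


def build_pipeline(provider_id: str, voice: str, api_key: str) -> VoicePipeline:
    """Look up the chosen provider and build it with the user's settings."""
    try:
        factory = REGISTRY[provider_id]
    except KeyError:
        raise ValueError(f"unknown provider: {provider_id}") from None
    return factory(voice, api_key)


# Adding a new model is one registration line; the ids below are illustrative.
register("gemini-live", lambda voice, key: EchoPipeline("gemini-live", voice, key))
register("elevenlabs-custom", lambda voice, key: EchoPipeline("elevenlabs-custom", voice, key))

pipeline = build_pipeline("gemini-live", voice="aoi", api_key="user-supplied-key")
print(pipeline.respond("hello"))
```

Because the call-handling code depends only on the interface, an A/B test under this design amounts to building two pipelines with different provider ids and routing calls between them.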
The design constraint that mattered most: keeping the supervisor UI responsive while the heavy AI work runs behind a WebSocket + microservice boundary.
Demo platform handling outbound calling, live supervision, and full audit trails for every interaction. Every call's audio, transcript, exact prompt used, supervisor actions, and QA reviews are stored, so mistakes get flagged and fed back into prompt improvements. The provider-agnostic design means the platform stays competitive as the voice AI landscape shifts every few months.
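To make the audit trail concrete, here is a rough sketch of what one stored call record might contain, assuming a flat JSON-serializable shape. The field names, the SupervisorAction kinds, and the flagged_prompts helper are all hypothetical; the actual storage schema is not described here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SupervisorAction:
    """One intervention by a human agent during a call (hypothetical shape)."""

    at_seconds: float
    kind: str  # assumed values: "inject_context" or "takeover"
    detail: str = ""


@dataclass
class CallRecord:
    """Everything kept for one call: audio, transcript, prompt, actions, QA."""

    call_id: str
    audio_uri: str
    prompt: str  # the exact prompt used for this call
    transcript: List[str] = field(default_factory=list)
    supervisor_actions: List[SupervisorAction] = field(default_factory=list)
    qa_flags: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, so actions serialize too.
        return json.dumps(asdict(self), ensure_ascii=False)


def flagged_prompts(records: List[CallRecord]) -> List[str]:
    """Return prompts from calls that QA flagged, as candidates for revision."""
    return [r.prompt for r in records if r.qa_flags]


record = CallRecord(
    call_id="call-001",
    audio_uri="file:///recordings/call-001.wav",  # placeholder location
    prompt="You are a sales agent for a vehicle exporter.",
    transcript=["AI: Hello.", "Customer: Hi."],
    supervisor_actions=[SupervisorAction(12.5, "inject_context", "mention shipping time")],
    qa_flags=["quoted wrong price"],
)
print(record.to_json())
print(flagged_prompts([record]))
```

Storing the exact prompt next to the QA flags is what lets a flagged mistake be traced back to the wording that produced it.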