Shipped · Mar 2026 to Present · AI Voice Agent Developer & Full-Stack Engineer

Nobuko Japan

Multi-provider AI voice agent and live supervision platform for a Tokyo-based used-vehicle exporter.

Voice pipelines: Gemini Live + custom ElevenLabs + STT pipeline

Engineer: owning AI, automation, frontend & backend

Audit coverage: 100% (audio, transcript, prompt, supervisor actions)

Export markets: UK, Ireland, Cyprus, Pakistan

The problem

Sales calls were a bottleneck. A small team couldn't reach enough prospects, the quality of every call depended on which agent picked up, and there was no systematic way to learn from what went well or badly. They needed AI that could make outbound calls at scale, with human agents supervising in real time and a complete audit trail of every interaction.

The approach

Built a pluggable voice agent architecture in which any voice model can be swapped in. It currently runs two pipelines: Gemini Live, and a custom pipeline I built that combines ElevenLabs TTS with separate transcription models. Users pick the model and the voice and bring their own API keys, so when a better model launches we can A/B test it in minutes instead of rewriting the system. On top of that, human agents can listen to live calls, watch the streaming transcript, inject context from a sidebar to steer the AI mid-call, or take over entirely. Campaigns import contacts from Excel and launch outbound calls at scale through the company's SIP server.
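A minimal sketch of what a pluggable pipeline layer could look like, assuming each provider sits behind one shared interface and is looked up by id. All names here (VoicePipeline, PipelineConfig, PipelineRegistry, makeStubPipeline, the ids "gemini-live" and "elevenlabs-custom") are hypothetical and chosen for illustration; the stub adapters make no network calls and do not reflect any provider's real SDK.

```typescript
// Hypothetical names throughout; this is an illustration, not the project's actual code.

interface PipelineConfig {
  model: string;  // model identifier picked by the user
  voice: string;  // voice preset picked by the user
  apiKey: string; // user-supplied key (bring your own key)
}

interface VoiceSession {
  provider: string;
  config: PipelineConfig;
  contextLog: string[];
  // Passes extra context to the running agent (what a supervisor would do mid-call).
  injectContext(text: string): void;
}

interface VoicePipeline {
  readonly id: string;
  start(config: PipelineConfig): VoiceSession;
}

class PipelineRegistry {
  private pipelines = new Map<string, VoicePipeline>();

  register(pipeline: VoicePipeline): void {
    if (this.pipelines.has(pipeline.id)) {
      throw new Error(`Pipeline already registered: ${pipeline.id}`);
    }
    this.pipelines.set(pipeline.id, pipeline);
  }

  // Starts a session on the chosen pipeline; callers never touch provider code.
  start(id: string, config: PipelineConfig): VoiceSession {
    const pipeline = this.pipelines.get(id);
    if (!pipeline) throw new Error(`Unknown pipeline: ${id}`);
    if (!config.apiKey) throw new Error("An API key is required");
    return pipeline.start(config);
  }

  list(): string[] {
    return Array.from(this.pipelines.keys());
  }
}

// Stub adapter standing in for a real provider integration.
function makeStubPipeline(id: string): VoicePipeline {
  return {
    id,
    start(config: PipelineConfig): VoiceSession {
      const contextLog: string[] = [];
      return {
        provider: id,
        config,
        contextLog,
        injectContext: (text: string): void => {
          contextLog.push(text);
        },
      };
    },
  };
}

const registry = new PipelineRegistry();
registry.register(makeStubPipeline("gemini-live"));
registry.register(makeStubPipeline("elevenlabs-custom"));
```

Under this assumption, trying a new model means registering one more adapter and routing a share of calls to its id, which is what keeps an A/B test cheap.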

Architecture

How the system is wired.

The boundary that mattered: keeping the supervisor UI responsive while heavy AI work happens behind a WebSocket + microservice boundary.
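One way to picture that boundary is a small typed message protocol over the WebSocket, with the UI applying lightweight updates and never running model code itself. The sketch below is an assumption for illustration only: the message names ("transcript.delta", "call.ended", "supervisor.inject_context", "supervisor.takeover") and the CallView shape are hypothetical, not the platform's actual wire format.

```typescript
// Hypothetical message shapes for the UI <-> service WebSocket channel.
type ServerMessage =
  | { type: "transcript.delta"; callId: string; speaker: "agent" | "customer"; text: string }
  | { type: "call.ended"; callId: string };

type SupervisorMessage =
  | { type: "supervisor.inject_context"; callId: string; text: string }
  | { type: "supervisor.takeover"; callId: string };

interface CallView {
  callId: string;
  transcript: string[];
  status: "live" | "ended";
}

// Validates a raw text frame; returns null for anything malformed or unknown.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, callId, speaker, text } = data as Record<string, unknown>;
  if (typeof callId !== "string") return null;
  if (type === "call.ended") return { type: "call.ended", callId };
  if (type !== "transcript.delta") return null;
  if (speaker !== "agent" && speaker !== "customer") return null;
  if (typeof text !== "string") return null;
  return { type: "transcript.delta", callId, speaker, text };
}

// Pure state update: the UI only appends small deltas, so rendering stays cheap.
function applyServerMessage(view: CallView, msg: ServerMessage): CallView {
  if (msg.callId !== view.callId) return view;
  if (msg.type === "transcript.delta") {
    return { ...view, transcript: [...view.transcript, `${msg.speaker}: ${msg.text}`] };
  }
  return { ...view, status: "ended" };
}

// Serializes a supervisor action for sending to the service.
function encodeSupervisorMessage(msg: SupervisorMessage): string {
  return JSON.stringify(msg);
}
```

In a browser client, parseServerMessage would typically be called from the socket's message handler, and encodeSupervisorMessage from the sidebar's submit action.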

[Architecture diagram: components grouped as Client, Service, Data, and External]

The outcome

Demo platform handling outbound calling, live supervision, and full audit trails for every interaction. Every call's audio, transcript, exact prompt, supervisor actions, and QA reviews are stored, so mistakes can be flagged and fed back into prompt improvements. Because the design is provider-agnostic, new voice models can be adopted as they are released without rewriting the platform.
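As a rough sketch of the stored record and the checks it enables, assuming one document per call: the field names, the missingArtifacts and auditCoverage helpers, and the rule that audio, transcript, and prompt are the required artifacts are all assumptions made for this example, not the actual schema.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface SupervisorAction {
  at: string; // ISO timestamp
  kind: "inject_context" | "takeover";
  detail?: string;
}

interface QaReview {
  reviewer: string;
  flagged: boolean;
  note: string;
}

interface CallAuditRecord {
  callId: string;
  audioUrl: string | null;
  transcript: string[];
  prompt: string; // the exact prompt used for this call
  supervisorActions: SupervisorAction[]; // may be empty if nobody intervened
  qaReviews: QaReview[];
}

// Lists the required artifacts a record is missing (assumed: audio, transcript, prompt).
function missingArtifacts(record: CallAuditRecord): string[] {
  const missing: string[] = [];
  if (!record.audioUrl) missing.push("audio");
  if (record.transcript.length === 0) missing.push("transcript");
  if (record.prompt.trim() === "") missing.push("prompt");
  return missing;
}

// Percentage of records with no missing artifacts; an empty set counts as fully covered.
function auditCoverage(records: CallAuditRecord[]): number {
  if (records.length === 0) return 100;
  const complete = records.filter((r) => missingArtifacts(r).length === 0).length;
  return (complete / records.length) * 100;
}

// Collects flagged QA notes so they can be reviewed when revising prompts.
function flaggedNotes(records: CallAuditRecord[]): string[] {
  const notes: string[] = [];
  for (const record of records) {
    for (const review of record.qaReviews) {
      if (review.flagged) notes.push(`${record.callId}: ${review.note}`);
    }
  }
  return notes;
}
```

Keeping the exact prompt on each record is what would let a flagged call be traced back to the prompt version that produced it.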

What I owned

01 Designed a pluggable multi-provider voice agent: any voice model can be swapped in; users pick model, voice, and bring their own API keys.
02 Built a custom voice pipeline combining ElevenLabs TTS with separate transcription models, alongside the Gemini Live pipeline.
03 Implemented live agent supervision: human agents hear AI calls in real time, see live transcripts, inject context to steer the AI mid-call, or take over.
04 Built the campaign system: sales agents import contacts from Excel, build outbound campaigns, and launch via the company's SIP server.
05 Designed the audit-trail data model so every call is reproducible: audio, transcript, exact prompt, supervisor actions, and QA review.
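For the campaign import step, a sketch of the cleanup that might run after the spreadsheet has been read into rows and before any calls are queued. It assumes the Excel parsing itself happens elsewhere, that rows arrive as plain objects with hypothetical "name" and "phone" columns, that the sheet has one header row, and that a number needs at least 7 digits; none of these details come from the actual system.

```typescript
// Rows as they might come out of a spreadsheet parser (parsing itself is out of scope here).
type RawRow = Record<string, unknown>;

interface Contact {
  name: string;
  phone: string;
}

interface ImportResult {
  contacts: Contact[];
  rejected: { row: number; reason: string }[];
}

// Keeps a leading "+" and strips everything that is not a digit.
function normalizePhone(value: string): string {
  const trimmed = value.trim();
  const digits = trimmed.replace(/\D/g, "");
  return trimmed.charAt(0) === "+" ? `+${digits}` : digits;
}

// Validates and de-duplicates rows so each number is dialed at most once per campaign.
function importContacts(rows: RawRow[]): ImportResult {
  const seen = new Set<string>();
  const result: ImportResult = { contacts: [], rejected: [] };

  rows.forEach((row, index) => {
    const rowNumber = index + 2; // 1-based, plus one assumed header row
    const name = String(row["name"] ?? "").trim();
    const phone = normalizePhone(String(row["phone"] ?? ""));

    if (name === "") {
      result.rejected.push({ row: rowNumber, reason: "missing name" });
      return;
    }
    // Assumed minimum length; real validation would depend on the target markets.
    if (phone.replace("+", "").length < 7) {
      result.rejected.push({ row: rowNumber, reason: "invalid phone" });
      return;
    }
    if (seen.has(phone)) {
      result.rejected.push({ row: rowNumber, reason: "duplicate phone" });
      return;
    }
    seen.add(phone);
    result.contacts.push({ name, phone });
  });

  return result;
}
```

Reporting rejected rows with their spreadsheet row number lets a sales agent fix the source file instead of guessing which entries were skipped.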

Stack

React, Node.js, Express, MongoDB, WebSockets, SIP, Gemini Live, ElevenLabs, STT / TTS, Prompt engineering
