
Mojiokoshi is a self-hosted audio transcription tool with a real-time progress UI, powered by faster-whisper and NVIDIA CUDA. It runs entirely on your own machine — your audio never leaves it.


Highlights

Quick Start

Requires Docker and, for GPU acceleration, the NVIDIA Container Toolkit.

git clone https://github.com/P4suta/mojiokoshi.git
cd mojiokoshi
docker compose up

Then open http://localhost:8000 in your browser. The first run downloads the Whisper model (~3 GB for large-v3) into a persistent volume, so later starts are fast. No host-side Python or Node is needed — everything runs in containers.
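Because the first start can take a while, it may help to wait for the web UI from a script rather than refreshing the browser. The following is a minimal sketch, not part of Mojiokoshi: the helper name `wait_for_server` and its defaults are hypothetical, and it assumes only that http://localhost:8000 answers plain HTTP once the app is up. Whether the UI starts answering before or after the model download completes is not stated here, so a generous timeout is assumed.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8000", timeout=600.0, interval=2.0):
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass.

    Hypothetical helper for illustration; returns True if the server
    answered, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status.
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet.
            # (urllib.error.URLError is a subclass of OSError.)
            time.sleep(interval)
    return False
```

Calling `wait_for_server()` with no arguments polls the default address for up to ten minutes and returns a boolean, so it can gate a follow-up step in a setup script.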

At a glance

   
Models: tiny · base · small · medium · large-v3 (default)
Languages: Japanese (default) · English · Chinese · Korean · auto-detect
Audio formats: MP3 · WAV · M4A · OGG · FLAC · WebM · WMA · AAC (≤ 500 MB)
Backend: Python 3.12 · FastAPI · faster-whisper · uvicorn
Frontend: Svelte 5 · SvelteKit · Tailwind CSS 4 · Vite 8
Container: Docker multi-stage build on NVIDIA CUDA 12.9.1
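The format and size limits listed above can be checked locally before uploading a file. This is a rough sketch rather than Mojiokoshi's own validation: the function name `check_upload` is hypothetical, matching is done on the file extension only, and "500 MB" is assumed to mean 500 × 1024 × 1024 bytes (the server may count differently).

```python
from pathlib import Path

# Extensions derived from the audio formats listed above.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm", ".wma", ".aac"}

# Assumption: "500 MB" is read as 500 * 1024 * 1024 bytes.
MAX_BYTES = 500 * 1024 * 1024


def check_upload(path, max_bytes=MAX_BYTES):
    """Return a list of problems with `path`; an empty list means it looks OK.

    Hypothetical pre-upload check; it inspects only the extension and size,
    not the actual audio encoding.
    """
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("not a regular file")
    elif p.stat().st_size > max_bytes:
        problems.append(f"file is larger than {max_bytes} bytes")
    return problems
```

For example, `check_upload("interview.mp3")` returns an empty list when the file exists and is within the assumed limit; the file name here is only an illustration.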

Learn more